Community Inventory of EarthCube Resources for Geosciences Interoperability
Finding appropriate data is the difficulty most often articulated by geoscientists during EarthCube end-user workshops. It becomes especially challenging when researchers work on interdisciplinary problems. Researchers finding and interpreting data across domain boundaries have to deal with unfamiliar terminology and research designs, implicit measurement assumptions, and disparate metadata and data formats. Despite the wealth of geoscience information available in digital form, and a plethora of databases, services and data portals already developed, there is no single inventory of available information across domains. The goal of CINERGI is to compile such an inventory, developing mechanisms to ensure that different resources have consistent and easy-to-interpret descriptions, traceable origins, and documentation that is as complete as possible. The scope includes datasets commonly catalogued by many organizations, as well as documentation for catalogs, vocabularies, data services, process models, repositories, etc. This inventory will help researchers answer both relatively simple and complex queries. Complex queries may require several iterations and a link to a domain data catalog for additional search options.
Compiling and curating a large inventory of geoscience information resources requires integrating metadata records from standards-compliant catalogs maintained by domain data facilities and large projects with information about data sources that are used or generated by a multitude of smaller research projects, typically referred to as the “long tail of science”. While the number of protocols for harvesting metadata from such catalogs is relatively limited, we found little consistency in metadata content. To address this challenge, we are developing a CINERGI metadata processing pipeline: metadata documents are harvested using a number of adapters, loaded into a staging database, validated against content standards, and then processed to improve metadata content before being republished via a standard interface. Metadata enhancements include checking and validating spatial extent, and assigning an extent based on available information if applicable; analyzing and adding keywords to make the metadata easier to discover across domains; making the dataset title more descriptive; correcting temporal extent as needed; validating organization names against standard vocabularies; and adding standard thematic category and resource type classification terms. Whenever an enhancer changes the content of a record, a corresponding provenance record is created and made accessible via the CINERGI search interface.
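The enhancement stage described above can be illustrated with a minimal sketch. This is not the CINERGI implementation: the record layout (a plain dictionary with `title`, `bbox`, and `keywords` fields), the function names, and the toy vocabulary are all assumptions made for illustration. The sketch shows only the general pattern of running a record through a chain of enhancers and writing a provenance entry each time an enhancer changes something.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: a flat dict of metadata fields.
Record = Dict[str, object]
Enhancer = Callable[[Record], Record]

# Toy term-to-keyword mapping, invented for this example only.
TOY_VOCABULARY = {"streamflow": "hydrology", "basalt": "geochemistry"}


@dataclass
class ProvenanceEntry:
    enhancer: str              # name of the enhancer that changed the record
    changed_fields: List[str]  # fields whose values differ after the step


def validate_bbox(record: Record) -> Record:
    """Clear a spatial extent whose coordinates are out of range."""
    out = dict(record)
    bbox = out.get("bbox")  # assumed order: (west, south, east, north)
    if bbox is not None:
        west, south, east, north = bbox
        in_range = (
            -180 <= west <= 180
            and -180 <= east <= 180
            and -90 <= south <= north <= 90
        )
        if not in_range:
            out["bbox"] = None
    return out


def add_keywords(record: Record) -> Record:
    """Add keywords for vocabulary terms that appear in the title."""
    out = dict(record)
    title = str(out.get("title", "")).lower()
    existing = list(out.get("keywords", []))
    added = [
        keyword
        for term, keyword in TOY_VOCABULARY.items()
        if term in title and keyword not in existing
    ]
    if added:
        out["keywords"] = existing + added
    return out


def run_pipeline(
    record: Record, enhancers: List[Enhancer]
) -> Tuple[Record, List[ProvenanceEntry]]:
    """Apply enhancers in order; log provenance only for steps that change the record."""
    provenance: List[ProvenanceEntry] = []
    current = dict(record)
    for enhancer in enhancers:
        updated = enhancer(current)
        changed = sorted(
            key
            for key in set(current) | set(updated)
            if current.get(key) != updated.get(key)
        )
        if changed:
            provenance.append(ProvenanceEntry(enhancer.__name__, changed))
        current = updated
    return current, provenance
```

Calling `run_pipeline(record, [validate_bbox, add_keywords])` returns the enhanced record together with the list of provenance entries, leaving the input record untouched. Keeping each enhancer a pure function that returns a new record makes it straightforward to compare the record before and after each step.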
Assembling and validating a large collection of geoscience metadata cannot be done without the direct involvement of many groups of geoscientists, who specify which data resources are important for members of their domain and which metadata elements are important to expose for cross-disciplinary search, and who validate the assembled metadata and query responses. This engagement comes in several forms: (1) working with EarthCube Research Coordination Network (RCN) projects to jointly assemble resources used in their domains and make them searchable through the CINERGI system; (2) describing and registering resources mentioned by geoscientists in the course of EarthCube end-user workshops, in responses to EarthCube member surveys, and appearing in similar inventories; (3) interacting with managers of domain data facilities; (4) registering resources developed by EarthCube partners; and (5) exploring more complex query scenarios through collaboration with several geoscience researchers in paleogeology, hydrology, and critical zone science.
Benefits to Scientists
CINERGI will reduce the burden of finding, interpreting and evaluating the fitness-for-use of different types of information resources across geoscience domains. A number of geoscience data facilities and projects in geochemistry, hydrology, ocean sciences, ecology and other fields have developed excellent data repositories and metadata catalogs. CINERGI will make them accessible via a single standards-based catalog interface, and improve metadata descriptions to make data discovery more uniform and less time-consuming.
There are multiple ways to access or contribute to the CINERGI inventory; choose the one that works best for you. The following is a selection of the newest interfaces. The application is in development, and changes are expected. To see all registry viewers that have been developed, including legacy Silverlight-based interfaces, see the CINERGI viewer page.
- CINERGI Resource Inventory: a large, broad inventory of resources harvested from catalogs and community contributions, where you can see metadata documents after enhancement through the CINERGI pipeline. The metadata search portal is under construction, and its content changes constantly as we refine the underlying ontology, update the metadata processing pipeline, and bring in more harvested data.
- CINERGI Viewers: see them on a separate page. The main resources are also listed below:
- High-level Resource Catalog: a continuously updated collection of information resources of different types suggested by geoscientists
- CINERGI Community Resource Viewers (community-built, domain-specific viewers for searching, updating and expanding community resource catalogs, a product of our joint work with several Research Coordination Network projects):
- Create a Resource Viewer for your community (this lets you download a template for assembling community resources)
- C4P resources: Template in Google spreadsheet, open for community contributions; associated HTML5 Viewer in a custom SuAVE (Survey Analysis via Visual Exploration) application
- SEN resources: Silverlight; HTML5 Viewer
- ECOGEO resources: Template; HTML5 Viewer
Additional Component Inventories
- EarthCube Member Connections: the EarthCube membership database, with editable research interests and other characteristics.
- CINERGI organization github: http://github.com/cinergi
- CINERGI APIs page
- Towards a conceptual Design of a Cross-Domain Integrative Information System for the Geosciences (2013 AGU) [pdf]
- Inventorying and Assessing Cyberinfrastructure Readiness for Cross-Domain Information Re-use in the Geosciences: a Perspective from Hydrology (2013 AGU) [pdf]
- CINERGI: Presentations at C4P webinars: Feb 4, 2014 [pdf], March 3, 2015 [pdf]
- ESIP meeting presentations on CINERGI: January 6, 2015 [pdf], July 16, 2015 [pdf]
- CINERGI poster at EGU, April 2014 [pdf]
- CINERGI - technical presentation (All-Hands, June 2014) [pdf]
- Domain Registries of Information Resources: a How-To for Your Community (All-Hands, June 2014) [pdf]
- Design of Community Resource Inventories as a Component of Scalable Earth Science Infrastructure: Experience of the EarthCube CINERGI Project (AGU'2014 presentation, IN24A-09, Dec. 2014) [pdf]
- CINERGI abstract for presentation at the RDA 5th plenary, March 2015 [pdf]
- Metadata validation in CINERGI, at OGC TC, March 2015 [pdf]
- CINERGI presentation at the Tech Hands Meeting (April 2015) [pdf]
- Highlights for EarthCube booklet, May 2015 [pdf]
- CINERGI Posters at the All-Hands Meeting (May 2015): project overview [pdf], geoscience use cases [pdf]
- CINERGI lightning talk (All-Hands, May 27, 2015) [pdf]
- Discovery use cases session (All-Hands, May 27, 2015): session presentation and video fragments [folder]
- CINERGI briefing for the CDF (All-Hands, May 27, 2015) [pdf]
- Building a registry of geoscience resources (All-Hands, May 28, 2015) [pdf]
- CINERGI presentation on the joint demo with BCube and GeoWS projects, Aleutian Volcano scenario (All-Hands, May 28, 2015) [pdf]
Student posters. In summer 2014 we hosted 6 high school students from San Diego who worked on various aspects of CINERGI metadata compilation: Anoushka Bose, Cole Pavelchek, Nick Lograsso, Erica Liu, Amar Haqqi, and Grace Chen. This work was supported by our undergraduate interns Raquel Calderon, Azfar Alam and Nick Nizhnikov. The REHS (Research Experience for High School) posters about these projects are [pdf], [pdf], [pdf]. In 2015, three new high school students worked on CINERGI during the summer: Divya Mohan, Ibrahim Ali and Edric Xiang. See their posters, focused on machine learning for better named entity recognition, and on a metadata crawler, here: [pdf], [pdf]. This work was supported by our undergraduate interns Alice Giliarini, Aaron Gong and Adam Schachne.
- Leslie Hsu, Lamont-Doherty Earth Observatory, Columbia University
- David Valentine, San Diego Supercomputer Center/UCSD
- Tom Whitenack, San Diego Supercomputer Center/UCSD
Additional project members
- Chris Condit, UCSD
- Burak Ozyurt, UCSD
- Leah Musil, AZGS
- Kai Lin, UCSD
- Raquel Calderon, UCSD
- Azfar Alam, UCSD
- Nick Nizhnikov, SDSU
- Alice Giliarini, UCSD
- Adam Schachne, UCSD
- Aaron Gong, UCSD
SDSC REHS (Research Experience for High School) interns, Summer 2014
- Anoushka Bose
- Cole Pavelchek
- Erica Liu
- Grace Chen
- Nick Lograsso
- Amar Haqqi
SDSC REHS interns, Summer 2015
- Edric Xiang
- Divya Mohan
- Ibrahim Ali