Many challenges hinder the seamless integration of models with data, and they compel scientists to perform the integration manually. The primary challenges stem from the knowledge latency between model and data resources; others stem from inadequate adoption and exploitation of information technologies. Knowledge latency challenges increase sharply when a user aims to integrate long-tail data (data collected by individual researchers or small research groups) and long-tail models (models developed by individuals or small modeling communities). We focus on these long-tail resources because, despite their often-narrow scope, they have significant impacts in scientific studies and present an opportunity for addressing critical gaps through automated integration. The goal of this research is to develop a framework rooted in semantic techniques and approaches to support “long-tail” model and data integration.
Incorporating semantics into the data and model life cycle will advance:
- Data-model integration: overcoming the semantic heterogeneity of the rapidly growing data and model collections will allow their seamless integration.
- Data discovery: semantics will narrow the data discovery gap on the web, which is widening rapidly and limits the reusability and interoperability of data.
- Data synthesis: linking data based on their information profile will minimize the complexity of data synthesis.
- Model-model coupling: ensuring the semantic consistency of quantities exchanged between models, and providing tools for aligning their information profiles, is essential for cross-disciplinary model coupling.
The GeoSemantics framework will directly augment the multidisciplinary interaction between different geoscience communities. We are building on two existing technologies: (1) SEAD (Sustainable Environmental Actionable Data) and (2) CSDMS (Community Surface Dynamics Modeling System). We are also collaborating with ongoing EarthCube initiatives, including GeoSoft, ESB (Earth System Bridge), SEN (Sediment Experimentalist Network), and eWELL (Workforce Education and Learning Library). We are building a flexible information system capable of increasing the interoperability of scientific data and models by:
- creating standard tools for associating descriptive information with data and models.
- allowing crosswalks between Controlled Vocabularies.
- providing a low-barrier technology that lets scientists who are not experts in information systems contribute their information or update existing information.
- Motivations: The driving motivations for advancing the interoperability of data and models are: (i) increasing the productivity of scientists and research groups, (ii) repurposing quality ed resources for new research objectives, (iii) providing flexibility for interdisciplinary research and collaboration, and (iv) enhancing the quality of available resources.
- Vision: Support semantic interoperability among the rapidly growing long-tail model and data resources by using Linked Data and micro-web-service approaches.
- Goals: Develop a decentralized, knowledge-based platform that allows semantically heterogeneous systems to interact with minimal human intervention.
- Design Concept: We are building a collaborative knowledge management system that ingests the available standards, supports the formalization of semantic definitions for physical processes across geoscience communities, and provides web services for semantic mediation and matching between resources, including data and models.
- Keywords: Data Discovery, Model-Data Integration, Semantic Interoperability, Linked Data, Micro-services, Ontologies, Metadata
The GeoSemantics framework is a decentralized framework that combines Linked Data and RESTful web services to annotate, connect, integrate, and reason about the integration of geoscience resources. The framework allows the semantic enrichment of web resources and semantic mediation among heterogeneous geoscience resources, such as models and data.
- It uses a micro-service architecture to close the semantic loop among data, models, and Controlled Vocabularies (CVs).
- It provides three sets of micro-services:
- Knowledge Integration Services (KIS), which ingest, register, and check in Controlled Vocabularies and W3C standards to the framework’s knowledge base;
- Semantic Annotation Services (SAS), which annotate resources with their spatiotemporal context, variables, and provenance relationships, either by running automatic extractors based on the data file’s MIME type (e.g., GeoTIFF and CSV types) or by providing an interactive interface for manual annotation;
- Resource Alignment Service (RAS), a scientific workflow that aligns the attributes associated with two geo-resources to ensure their semantic consistency before integration.
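The MIME-type-driven choice between automatic extraction and manual annotation described for SAS can be illustrated with a minimal Python sketch. This is not the project's implementation: the registry `EXTRACTORS`, the `extractor` decorator, the `annotate` function, the suffix table, and the shape of the returned annotation are all assumptions made for illustration. Only a CSV extractor is shown; a GeoTIFF extractor would need a raster library and is omitted, so `.tif` files fall back to manual annotation here.

```python
import csv
import io
from pathlib import Path
from typing import Callable, Dict

# Assumed suffix-to-MIME mapping, used only in this sketch.
SUFFIX_TO_MIME: Dict[str, str] = {".csv": "text/csv", ".tif": "image/tiff"}

# Hypothetical registry: MIME type -> extractor function.
EXTRACTORS: Dict[str, Callable[[bytes], dict]] = {}


def extractor(mime_type: str):
    """Register an extractor function for one MIME type."""
    def register(func):
        EXTRACTORS[mime_type] = func
        return func
    return register


@extractor("text/csv")
def extract_csv(payload: bytes) -> dict:
    """Use the CSV header row as the candidate variable names."""
    reader = csv.reader(io.StringIO(payload.decode("utf-8")))
    header = next(reader, [])
    return {"variables": [name.strip() for name in header]}


def annotate(filename: str, payload: bytes) -> dict:
    """Run the matching extractor, or flag the resource for manual annotation."""
    mime_type = SUFFIX_TO_MIME.get(Path(filename).suffix.lower())
    annotation = {"resource": filename, "mime_type": mime_type}
    handler = EXTRACTORS.get(mime_type)
    if handler is None:
        # No automatic extractor registered: hand off to a human annotator.
        annotation["mode"] = "manual"
    else:
        annotation["mode"] = "automatic"
        annotation.update(handler(payload))
    return annotation
```

Calling `annotate("gauge.csv", b"time,discharge\n2015-01-01,3.2\n")` takes the automatic path and lists the header names as variables, while a file type with no registered extractor is marked for manual annotation. A registry keyed by MIME type keeps each extractor independent, which fits the micro-service idea of adding new file types without touching existing ones.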
Benefits to Scientists
- Advances the interoperability of model and data resources using semantic annotations.
- Allows crosswalks between standard names using a collaborative knowledge management system.
- Augments semantic mediation and matching between models and data with minimal human intervention.
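To make the idea of crosswalks and semantic matching concrete, here is a minimal Python sketch of aligning a model's inputs with a dataset's variables through a shared concept. It is only an illustration under stated assumptions: the vocabularies `vocab_a` and `vocab_b`, every term and concept identifier in `CROSSWALK`, and the functions `to_concept` and `align` are hypothetical and do not come from the framework or from any real Controlled Vocabulary.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical crosswalk: each (vocabulary, term) pair points at a shared concept.
# All names below are made up for illustration.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("vocab_a", "discharge"): "concept:streamflow",
    ("vocab_b", "streamflow"): "concept:streamflow",
    ("vocab_a", "air_temp"): "concept:air_temperature",
    ("vocab_b", "precip"): "concept:precipitation",
}


def to_concept(vocabulary: str, term: str) -> Optional[str]:
    """Look up the shared concept for a term, or None if it is not mapped."""
    return CROSSWALK.get((vocabulary, term))


def align(
    model_inputs: List[str],
    model_vocab: str,
    data_vars: List[str],
    data_vocab: str,
) -> Tuple[Dict[str, str], List[str]]:
    """Pair each model input with a data variable that shares its concept."""
    data_by_concept: Dict[str, str] = {}
    for var in data_vars:
        concept = to_concept(data_vocab, var)
        if concept is not None:
            # Keep the first data variable found for each concept.
            data_by_concept.setdefault(concept, var)

    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for name in model_inputs:
        concept = to_concept(model_vocab, name)
        if concept is not None and concept in data_by_concept:
            matched[name] = data_by_concept[concept]
        else:
            unmatched.append(name)
    return matched, unmatched
```

With these made-up entries, `align(["discharge", "air_temp"], "vocab_a", ["streamflow", "precip"], "vocab_b")` pairs `discharge` with `streamflow` and reports `air_temp` as unmatched, which is the point where a human would need to step in. Routing both vocabularies through a shared concept means a new vocabulary needs one mapping to the concepts rather than one mapping per existing vocabulary.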
- Kumar, P. (2015) "Hydrocomplexity: Addressing water security and emergent environmental risks." Water Resources Research 51, no. 7: 5827-5838.
- Elag, M.M., P. Kumar, L. Marini, S.D. Peckham (2015) Semantic interoperability of long-tail geoscience resources over the Web, In: Large-Scale Machine Learning in the Earth Sciences, Eds. A.N. Srivastava, R. Nemani and K. Steinhaeuser, Taylor and Francis (book chapter, accepted)
- Peckham, S.D. (2014a) The CSDMS Standard Names: Cross-domain naming conventions for describing process models, data sets and their associated variables, Proceedings of the 7th Intl. Congress on Env. Modelling and Software, International Environmental Modelling and Software Society (iEMSs), San Diego, CA. (Eds. D.P. Ames, N.W.T. Quinn, A.E. Rizzoli), http://www.iemss.org/sites/iemss2014/proceedings.php.
- Peckham, S.D. (2014b) EMELI 1.0: An experimental smart modeling framework for automatic coupling of self-describing models, Proceedings of HIC 2014, 11th International Conf. on Hydroinformatics, New York, NY. http://academicworks.cuny.edu/cc_conf_hic/464/
- Kumar, P., and Elag, M. M. (2014) Geo-Semantic Framework for Integrating Long-Tail Data and Model Resources for Advancing Earth System Sciences. AGU Fall Meeting Abstract IN33A-3758, San Francisco, CA, 2014.
- Myers, J, Hedstrom M., Akmon D., Payette S., Plale, A. B., Kouper I., McCaulay, S., Kumar, P., Elag, M. M., et al. (2015) "Towards Sustainable Curation and Preservation: The SEAD Project’s Data Services Approach," Institute of Electrical and Electronics Engineers, IEEE 11th International e-Science Conference, Munich, Germany, doi: 10.1109/eScience.2015.56
- Kumar, P., Elag, M. M., Marini, L, Lui, R., Jiang, P., (2015) Envisioning a Future of Computational Geoscience in a Data Rich Semantic World, AGU Fall Meeting Abstract IN41C-1711, San Francisco, CA, 14-18 Dec., 2015.
- Elag, M. M., and Kumar, P., Marini, L., Myers, J. D., Hedstrom, M., and Plale, A. B. (2014) Characterization of Emergent Data-Networks in Long Tail Collections. AGU Fall Meeting Abstract IN33C-05, San Francisco, CA, 15-19 Dec., 2014.
- Elag, M. M., Kumar, P., Marini, L, Lui, R., Jiang, P., (2015) SAS: Semantic Annotation Service for Geoscience resources on the web, AGU Fall Meeting Abstract IN41C-1711, San Francisco, CA, 14-18 Dec., 2015.
- Elag, M. M., and Kumar, P., Marini, L., Myers, J. D., Hedstrom, M, and Plale, A. B. (2014) Characterization of Emergent Data Networks among Long-Tail Data, Abstract Vol. 16, EGU2014-7844-1, 2014, EGU General Assembly 2014.
- Elag, M. M., Kumar, P., Marini, L, Lui, R., Jiang, P., (2015) Geo-Semantic Framework for Integrating Long-Tail Data and Model Resources for Advancing Earth System Science, EarthCube All-hands meeting, Arlington, VA, 26-28 May, 2015
- Elag, M. M., Kumar, P. (2015) GeoSemantic Framework for Integrating Data and Model Resources For Advancing Earth System Science, 3rd CUAHSI Conference on Hydroinformatics meeting, Tuscaloosa, Alabama, 14-17 July, 2015
- Jiang, P., Elag, M. M., and Kumar, P. (2015) Geosemantic Resource Alignment Service. CSDMS annual meeting, CO, 26-28 May, 2015.
- Elag, M. M., and Kumar, P. (2015) Semantic Annotation Framework for Long-tail Resources, Research Data Alliance Fourth Plenary Meeting, San Diego, CA, 6-10 March, 2015.