Finalized: Monday, July 31, 2017
Author(s): Jiang, Yongyao, Yun Li, Chaowei Yang, Kai Liu, Edward M. Armstrong, Thomas Huang, David F. Moroni & Christopher J. Finch
It is challenging to find relevant data for research and development purposes in the geospatial big data era. One long-standing problem in data discovery is locating, assimilating and utilizing the semantic context for a given query. Most research in the geospatial domain has approached this problem in one of two ways: manually building a domain-specific ontology, or automatically discovering semantic relationships using metadata and machine learning techniques. The former relies on rich expert knowledge but is static, costly and labor-intensive, whereas the latter is automatic but prone to noise. An emerging trend in information science takes advantage of large-scale user search histories, which are dynamic but subject to user- and crawler-generated noise. Leveraging the strengths of these three approaches while avoiding their weaknesses, a novel methodology is proposed to (1) discover vocabulary-based semantic relationships from user search histories and clickstreams, (2) refine the similarity calculation methods from existing ontologies and (3) integrate the results of ontology, metadata, user search history and clickstream analysis to better determine the semantic relationships among vocabulary terms. An accuracy assessment of the similarity values by domain experts indicates an overall accuracy of 83% for the top 10 related terms over randomly selected sample queries. This research serves as an example of building vocabulary-based semantic relationships for different geographical domains to improve various aspects of data discovery, including the accuracy of the relationships among commonly used search terms.
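The abstract does not say how the four result sets are combined in step (3), nor exactly how "accuracy for the top 10 related terms" is computed. The sketch below is only an illustration under stated assumptions: it uses a simple weighted linear combination in which a source that does not report a term contributes zero (so terms corroborated by several sources rank higher), and it reads the accuracy figure as precision over the top k terms judged by experts. The source names, weights, terms and scores are all hypothetical, and this is not presented as the authors' method.

```python
from collections import defaultdict
from typing import Dict, List, Set


def integrate(sources: Dict[str, Dict[str, float]],
              weights: Dict[str, float]) -> Dict[str, float]:
    """Combine per-source similarities (each in [0, 1]) into one score per term.

    Assumption: weighted sum over sources divided by the total weight of all
    sources, so a source that does not report a term contributes 0 for it.
    """
    total = sum(weights[name] for name in sources)
    combined: Dict[str, float] = defaultdict(float)
    for name, scores in sources.items():
        for term, sim in scores.items():
            combined[term] += weights[name] * sim
    return {term: s / total for term, s in combined.items()}


def top_k(scores: Dict[str, float], k: int = 10) -> List[str]:
    """Rank terms by descending score (ties broken alphabetically), keep k."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k terms that experts judged relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for term in top if term in relevant) / len(top)


# Hypothetical similarities to the query "sea surface temperature",
# one dictionary per analysis source (values invented for illustration).
SOURCES = {
    "search_history": {"sst": 0.8, "ocean temperature": 0.6, "ocean wind": 0.3},
    "clickstream": {"sst": 0.9, "ghrsst": 0.7},
    "metadata": {"ghrsst": 0.6, "ocean temperature": 0.5},
    "ontology": {"ocean temperature": 0.8},
}
WEIGHTS = {name: 1.0 for name in SOURCES}  # equal weights, an assumption

if __name__ == "__main__":
    ranked = top_k(integrate(SOURCES, WEIGHTS))
    print(ranked)
    print(precision_at_k(ranked, {"sst", "ocean temperature", "ghrsst"}))
```

With equal weights, "ocean temperature" outranks "sst" here because three sources report it, even though "sst" has the higher individual scores. Different weights, or a normalization over only the reporting sources, would change that trade-off.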
Yongyao Jiang, Yun Li, Chaowei Yang, Kai Liu, Edward M. Armstrong, Thomas Huang, David F. Moroni & Christopher J. Finch (2017) A comprehensive methodology for discovering semantic relationships among geospatial vocabularies using oceanographic data discovery as an example, International Journal of Geographical Information Science, 31:11, 2310-2328, DOI: 10.1080/13658816.2017.1357819

This material is based upon work supported by the National Science Foundation under Grant No. 1540998. Opinions, findings, conclusions or recommendations expressed are those of the authors and do not reflect the views of the NSF.