Finalized: Thursday, June 4, 2015
Data Science for EarthCube 2015 Key Documents
Quote: "EarthCube’s short term objective is greater data availability to geoscientists; the long term objective is enhanced knowledge availability for society." (Source: 5. EarthCube: a bridge between data and knowledge). Both need Data Scientists doing Data Science Data Publications!
Objective: Data mine following the CRISP-DM standard with six steps to identify the data sets that can be analyzed and visualized.
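As a reminder of what the six steps are, here is a minimal Python sketch of the CRISP-DM phases. The phase names come from the CRISP-DM standard; the class name `CrispDmPhase` and the helper `remaining_phases` are hypothetical names introduced only for illustration, and the strictly linear ordering is a simplification, since CRISP-DM allows looping back between phases.

```python
from enum import IntEnum


class CrispDmPhase(IntEnum):
    """The six CRISP-DM phases, numbered in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def remaining_phases(current):
    """Return the phases that nominally follow `current`, in order."""
    return [phase for phase in CrispDmPhase if phase > current]


if __name__ == "__main__":
    # Print a simple numbered checklist of the six phases.
    for phase in CrispDmPhase:
        print(phase.value, phase.name.replace("_", " ").title())
```

On this page, finding the data sets in the Key Documents falls roughly under Data Understanding, with analysis and visualization coming in the later phases.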
Recommendation: NSF GEO needs a catalog of projects and their data sets that the NSF Big Data and Data Science Projects can work with to produce integrated, cross-discipline products that are curated and archived.
Goal: This is the beginning of that effort, prepared by Semantic Community and the Federal Big Data Working Group Meetup for the EarthCube 2016 All Hands Meeting. It will demonstrate a federated architecture in the cloud, like the recent Data Science for NIST Big Data Framework documents and use cases, in support of the Holdren Open Research Data Policy.
Key Results: A key data mining result was finding Table 1 in the 7 Key Documents listed below:
- EarthCube Charter
- Strategic Vision
- Science Strategic Plan (This contains Table 1)
- TAC Strategic Plan
- 2015 EarthCube Highlights
- EarthCube: Past, Present, and Future
- Dynamic Earth (Note that I had difficulty getting the many figures out of this PDF file)
These were all PDF files that were converted first to Word, or directly from PDF to MindTouch, so they would be integrated and searchable. A number of the URLs did not copy over and do not work properly. One possibility is that they point to content that has since been moved to the archive.
Table 1: Categorization of Building Block Projects (see text for complete description). Note: This does not display well here. See original at:
So, according to Table 1, EarthCollab is not doing anything and Geosemantics is doing two things! The majority are doing Sharing / Integrating, so let's see what that has produced in the 2015 EarthCube Highlights and EarthCube: Past, Present, and Future (2014).
To learn best practices of software and data sharing, thirteen members of the Early Career Advisory Committee of the EarthCube GeoSoft project are publishing a Geoscience Paper of the Future (GPF). This implies publishing all the software and data used to produce the results of the paper, as well as detailed workflows and provenance of how they were generated. They were trained by GeoSoft project members on best practices for software and data sharing, open source software, and provenance. The papers will appear in a special issue of the AGU Earth and Space Science journal.
The figure includes pictures of the participants, workflow and provenance diagrams, and some of the visualizations of their results. From top to bottom, left to right: Cedric David, NASA/JPL (hydrology modeling); Ibrahim Demir, U. Iowa (hydrology sensor networks); Robinson W. Fulweiler, Boston U. (biogeochemistry in marine ecology); Jonathan Goodall, U. South Carolina (hydrology visualizations); Leif Karlstrom, U. of Oregon (volcanic vent clustering); Kyo Lee, NASA/JPL (regional climate model evaluation); Heith Mills, U. Houston (geochemistry and marine microbiology); Ji-Hyun Oh, NASA/JPL (tropical meteorology); Suzanne Pierce, U. Texas Austin (hydrogeology for decision support); Allen Pope, U. Colorado Boulder (glaciology); Mimi Tzeng, Dauphine Sea Lab (ocean fisheries); Sandra Villamizar, U. California Merced (river ecohydrology); Xuan Yu, U. Delaware (hydrologic modeling).
Initial results of the data mining are shown in the slides below:
- Slide 8 Data.gov: Geoscience
- Slide 9 Global Change Master Directory: Geoscience
- Slide 10 Directorate for Geosciences: Data Policies
- Slide 11 Google Search for Geoscience Data Sets: University of Illinois Library
- Slide 12 Google Search for Geoscience Data Sets: Natural Resources Canada
- Slide 13 Google Search for Geoscience Data Sets: Australian Government
- Slide 14 Data Science for USGS Minerals Big Data
Two excellent EarthCube Use Cases from the 2015 All Hands Meeting that we will data mine are:
- The National Flood Interoperability Experiment: See Webinars and Resources at bottom of the page
- The 2006 Eruption of Augustine Volcano, Alaska: Wikipedia, USGS, Smithsonian Institution, and Alaskan Volcano Observatory
The Conclusions and Recommendations are:
- Dynamic Earth is a “living document” basis for a series of EarthCube Data Science Data Publications that are:
- Transformative approaches and innovative technologies for heterogeneous data to be integrated, made interoperable, explored, and re-purposed by researchers in disparate fields and for myriad uses across institutional, disciplinary, spatial, and temporal boundaries.
- Two excellent EarthCube Use Cases from the 2015 All Hands Meeting were data mined:
- One can drill down from Worldwide to Alaska to Alaska's Augustine Volcano.
- One can drill down from the National Flood Interoperability Experiment to the Mid-Atlantic Hydro Regions to the Catchments Shape File and Weighted Spreadsheet.
- Note: Unfortunately, the NFIE Hydro Regions do not include Alaska.
- This is a federated architecture for integrating heterogeneous data sources across multiple boundaries that can be replicated in a series of data science data publications that make the Dynamic Earth a “living data document”.
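The two drill-down paths described in the conclusions can be sketched as a small nested structure. This is only an illustrative sketch, not the actual MindTouch or spreadsheet implementation: the names `HIERARCHY` and `drill_down` are hypothetical, the level names are taken from the text above, and treating the Catchments Shape File and the Weighted Spreadsheet as two separate leaves is an assumption.

```python
# Hypothetical nested dictionary: each key is a level one can drill into,
# and each value holds the next level down (an empty dict marks a leaf).
HIERARCHY = {
    "Worldwide": {
        "Alaska": {
            "Augustine Volcano": {},
        },
    },
    "National Flood Interoperability Experiment": {
        "Mid-Atlantic Hydro Regions": {
            "Catchments Shape File": {},   # assumed to be a separate leaf
            "Weighted Spreadsheet": {},    # assumed to be a separate leaf
        },
    },
}


def drill_down(tree, path):
    """Follow `path` level by level and return the sorted names one level below."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError if the level does not exist
    return sorted(node)


if __name__ == "__main__":
    print(drill_down(HIERARCHY, ["Worldwide", "Alaska"]))
    print(drill_down(HIERARCHY, ["National Flood Interoperability Experiment",
                                 "Mid-Atlantic Hydro Regions"]))
```

Because the two use cases sit under different top-level keys, the sketch also makes the note above visible: nothing under the National Flood Interoperability Experiment branch leads to Alaska.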
Data Science Data Publication for the NSF Geosciences Directorate: Dynamic Earth 1
http://semanticommunity.info/Data_Science/EarthCube_Data_Science_Publications
http://semanticommunity.info/Data_Science/EarthCube_Data_Science_Publications/Key_Documents