EarthCube is an evolving socio-technical system that enables interactions between geo, computational, and information scientists to advance geoscience research and discovery by reusing existing geoscience cyberinfrastructure resources, developing new ones, and integrating them (TAC Architecture WG roadmap). The EarthCube system must capture and articulate the technical capabilities that are useful to or required by the geoscience research community, so that system managers and funding authorities can direct the evolution of the system. To improve the efficiency and effectiveness of research activities, the technology needs to match technical capabilities and expertise, on the one side, with the needs of geoscientists expressed through a variety of use cases and research scenarios, on the other side. This matchmaking system is the technical essence of EarthCube. In a broader sense, EarthCube will also include the set of technical capabilities themselves, and provide mechanisms for managing and maintaining them through their lifecycles.
Databases, software services, and community facilities must be built on the existing foundation of cyberinfrastructure and community knowledge created over the past two decades. This will require years of development, investment, and community engagement to foster convergence towards an integrated system that supports the entire geosciences community. The success of EarthCube will depend on identification of solutions and best practices within the current infrastructure, and strategic testing and adoption of new technology and approaches. This can only occur through an evolutionary process in which dialogue and collaboration between system developers and system users identify and select the 'fittest' solutions. The conceptual design we present here takes a complex federation of systems approach, focused on key requirements to support evolution of community practice, emergence, and adaptability in the face of technological innovation, addressing both technical and social considerations.
As a socio-technical federation of systems, EarthCube will be immersed in an existing network of systems. The primary stakeholders in EarthCube are Earth Science academic researchers funded by the National Science Foundation. Members of this community typically need to meet teaching and operational demands of a university environment as well as conduct research under the auspices of various sources of funding. This community must interoperate with various other communities across fuzzy boundaries. Much information for the Earth Science enterprise is collected and managed by government agencies that have various mandates and are subject to changes in political priorities. The consumers of scientific research results for business purposes include government agencies engaged in regulatory or resource management and private enterprise seeking return on investment. Each of these communities is itself a heterogeneous collection of entities and systems for which there is no centralized management. In order to be successful, EarthCube must be engineered to promote interoperability and synergy between all of these communities by recognizing and building upon their shared interests.
Analogies are useful for understanding complex concepts. Two analogies are salient in our thinking about EarthCube: a marketplace and a workshop. The analogy of EarthCube as a marketplace is based on the concept of product providers and product consumers. The products in this market include information (datasets), technology (software, experimental practice), and expertise. Many of the products in the marketplace will be available to consumers without cost (‘free’) because they are the output of publicly funded projects with requirements for open access to the products. The economic benefits to producers in this market are thus indirect, in the form of something like the ‘reputation’ that is garnered in systems like Quora, Stack Overflow, or Yahoo Answers. In the funded-research arena, the benefit of accruing reputation is increased success in obtaining funding, greater job security, and personal recognition or social status. The cyberinfrastructure is the technology necessary to support operation of the marketplace, but the products available in the market, and the choices about which products will be used, are determined independently by the producers and consumers participating in the market. Regulations overlain on the market by funding providers or other authorities might be significant factors in these decisions.
EarthCube provides a workshop for finding, building, and using software and data to do Earth Science research. The workshop contains tools (software, data) that are ready to use, and a warehouse loaded with materials for making things. The workshop also provides a collection of tools and practices used to construct new tools if there is not one available already for a particular purpose, and for adding new materials to the warehouse stock. There are a variety of general purpose tool ‘frameworks’ like drill-presses, dremel tools, and lathes that can perform many different functions based on interoperable ‘plug-in’ attachments. The culture of this workshop is to reuse an existing tool or practice, with material already in the warehouse if applicable. If something new is developed or acquired, that is done with the intention of following existing patterns to make the tool or material available for a future worker to use. A hammer, saw, nails and wood might be necessary, but are not sufficient, to build a house – a construction worker has to know how to use these things in order to make that happen.
To reiterate, EarthCube is a managed platform for matching technical capabilities and expertise with needs of geoscientists expressed through a variety of use cases and research scenarios. Such match-making platforms present a new business model in a growing number of fields, for example shopping (Amazon and eBay), transportation (Uber), and relationships (eHarmony), with markets emerging in other fields including education and health care. The foundation of such platforms is a virtual marketplace where different types of capabilities are advertised, matched with needs (e.g. transportation or housing requests), and the quality of experience is evaluated (with reviews, reputation points, scores, etc.). The science enterprise is similarly going through fundamental changes in how it operates, with re-assessment of what constitutes science products, success metrics, re-use of science products and knowledge. To adapt to evolving technology and social expectations, the science enterprise needs a new model for infrastructure that combines scientific, technical, organizational and other considerations.
Our view of EarthCube (Figure 1) centers on the notion of supporting a geoscience technical resource marketplace, where available resources, capabilities and expertise, and research needs of geoscientists expressed through use cases, scenarios and success stories, are presented. Matching existing resources with research needs implies that the former are validated and certified for use, and can be combined into various resource compositions (a.k.a. “architectures”) to address specific needs. The marketplace should be managed to ensure that emerging needs and the current feasibility of addressing them are analyzed, and development priorities are established to guide the evolution of the system. While many core technical capabilities have been implemented, we argue that the reason these capabilities are not reaching their full potential is the absence of an efficient system for expressing the needs of geoscientists and matching such needs with capabilities and expertise. Research needs and technical capabilities are constantly evolving, but this advertising and match-making platform would remain the central “invariant” core of the EarthCube system. Technical capabilities would be registered in this platform, and the list of accessible capabilities (data, software, expertise, processing resources, services…) will evolve with the evolution of the geoscience research agenda and enabling technologies. Thus, the actual technical capabilities to be implemented cannot be prescribed at this level of design and time horizon.
The roadmap for transition of existing infrastructure to this new model includes both adapting existing infrastructure components and establishing new infrastructure. The strategy for EarthCube development should focus on developing the marketplace and match-making platform. A foundation of the platform will be specifications for clearly articulated and machine-processable statements of user needs in research scenarios, and for description of the functional capabilities and interfaces of technological components. These descriptions are a prerequisite for matching requirements and capabilities, and for automating the assembly of workflows that chain existing components in new ways. Such designs cannot be prescribed, but an efficient marketplace will make their rapid development possible. Description of this conceptual approach represents the core innovation introduced by the GEAR team. It has been expressed in all recent presentations, starting in March 2015.
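To make the idea of machine-processable needs and capability descriptions concrete, the following minimal Python sketch shows one way such records could drive the automated assembly of a workflow. It is an illustration only, not an EarthCube specification: the `Capability` and `Need` record types, their fields, the data type labels, and the greedy `plan_workflow` strategy are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical description of a registered technical component."""
    name: str
    inputs: frozenset   # data types the component consumes
    outputs: frozenset  # data types the component produces


@dataclass(frozen=True)
class Need:
    """Hypothetical machine-processable statement of a research need."""
    description: str
    available: frozenset  # data types the researcher already has
    wanted: frozenset     # data types the researcher wants to obtain


def plan_workflow(need, capabilities):
    """Greedy forward chaining over registered capabilities.

    Repeatedly picks a component whose inputs are already available and
    that produces something new, until every wanted type exists.
    Returns the ordered component names, or None if no chain is found.
    """
    have = set(need.available)
    remaining = list(capabilities)
    plan = []
    while not need.wanted <= have:
        runnable = [c for c in remaining
                    if c.inputs <= have and not c.outputs <= have]
        if not runnable:
            return None  # the registered components cannot satisfy the need
        step = runnable[0]
        plan.append(step.name)
        have |= step.outputs
        remaining.remove(step)
    return plan


# Illustrative registry content (names and data types are made up).
capabilities = [
    Capability("plotter", frozenset({"raster_grid"}), frozenset({"map_image"})),
    Capability("gridder", frozenset({"point_observations"}), frozenset({"raster_grid"})),
]
need = Need("map my station data",
            available=frozenset({"point_observations"}),
            wanted=frozenset({"map_image"}))
plan = plan_workflow(need, capabilities)
```

A greedy planner like this may include steps that are not strictly required and does not rank alternatives; a real match-making platform would presumably also weigh validation status, reviews, and cost. The point of the sketch is only that matching and chaining become mechanical once needs and capabilities are described in a shared, structured form.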
The core elements of the EarthCube technology infrastructure are factored into two major groups of IT-focused abstract components that support the user applications that are used for day-to-day research and learning activities. These IT components include Content-focused Components, and the Management and Operations Components. The system component model must be a dynamic artefact that can be extended or transformed to account for new workflows by specifying new building blocks or interfaces. The barriers to participation should be low so that requirements can be suggested even by a novice. A key aspect of EarthCube as a federation of systems will be specification of the interfaces that link abstract components and provide guidance for the implementation of new components.
The content-focused components include the closely related Resource and Registry Systems and the Data Processing System. The Resource System manages information items that describe entities and activities in the world: geologic features, observations, laboratory measurements, processing methods, system specifications, etc. The Registry System manages information items that either document content in the Resource System or are themselves resources, such as vocabularies and directories of people, projects, and organizations, that are 'asserted', i.e. given as facts and intended for shared use in populating data or metadata records.
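As a rough illustration of how asserted registry content could be used when populating metadata records, the Python sketch below checks a resource description against a shared vocabulary. The `REGISTRY` structure, the vocabulary terms, and the required field names are hypothetical and chosen only for this example.

```python
# Hypothetical registry content: a vocabulary 'asserted' for shared use.
REGISTRY = {
    "vocabularies": {
        "resource_type": {"dataset", "software", "service", "expertise"},
    },
}

# Hypothetical minimum set of fields for a resource description.
REQUIRED_FIELDS = ("id", "title", "resource_type")


def validate_record(record, registry=REGISTRY):
    """Return a list of problems found in a resource description.

    An empty list means the record has all required fields and uses a
    resource_type term that is present in the registry vocabulary.
    """
    problems = []
    for key in REQUIRED_FIELDS:
        if not record.get(key):
            problems.append(f"missing field: {key}")
    allowed = registry["vocabularies"]["resource_type"]
    rtype = record.get("resource_type")
    if rtype and rtype not in allowed:
        problems.append(f"unknown resource_type: {rtype}")
    return problems


good = {"id": "r1", "title": "Borehole logs", "resource_type": "dataset"}
bad = {"id": "r2", "title": "Core photos", "resource_type": "sample"}
```

Here `validate_record(good)` returns an empty list, while `validate_record(bad)` reports that `sample` is not a registered term. The design choice being illustrated is that records in the Resource System reference terms held by the Registry System rather than defining their own.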
Figure 2. EarthCube conceptual design. All components and resources are products in a marketplace, including those that operate the marketplace. The invariants are the information exchanges that link the components and resources and allow them to interoperate.
The Data Processing System supports components used for the actual analysis and visualization of data. It is the foundation for designing new workflows and supporting repeatable science through reproducible data processing using composition of EarthCube components and workflows. User applications developed for specific research projects are included in this system. This system accesses the registry and repository system for component discovery and documentation (‘shopping’ in the marketplace analogy). More importantly, it provides the workbench/workspace support for the use of EarthCube resources in science research workflows.
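One way to picture reproducible processing by composition is a runner that executes components in sequence and records a provenance entry for each step. The Python sketch below is an assumption-laden illustration: the `run_workflow` function, the provenance fields, and the use of SHA-256 digests over JSON-serialized data are choices made for this example, not part of the EarthCube design.

```python
import hashlib
import json


def _digest(value):
    """SHA-256 of a JSON-serializable value, with keys sorted for stability."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_workflow(steps, data):
    """Run (name, function) steps in order and record provenance.

    Each provenance entry stores digests of the step's input and output,
    so a repeated run on the same data can be compared step by step.
    """
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({
            "step": name,
            "input_sha256": _digest(data),
            "output_sha256": _digest(result),
        })
        data = result
    return data, provenance


# Two toy components standing in for registered processing tools.
steps = [
    ("scale", lambda values: [v * 2 for v in values]),
    ("total", lambda values: sum(values)),
]
result, provenance = run_workflow(steps, [1, 2, 3])
```

Because each step's output digest equals the next step's input digest, the provenance list forms a verifiable chain, and rerunning the same steps on the same data yields an identical record. This only works for deterministic components and JSON-serializable data; real scientific data would need a different serialization.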
The System Management and Operations Components implement a heterogeneous collection of capabilities for community support, usage analytics, system monitoring, conformance testing, and resource marketplace operations.
The User-Facing Applications are the analytical, visualization, resource discovery, resource access, scholarly communication, and similar components that scientists interact with on a daily basis in the course of research. They are products registered in the marketplace, developed under the auspices of funded projects, with life cycles determined by project lifetime and possible subsequent adoption for wider use and maintenance. All the components and resources in this scheme are products in the marketplace, including those that enable operation of the marketplace. The EarthCube community of practice defines the framework for developing these resources following practices that promote their interoperability, reusability, and future value. What is invariant and defines EarthCube are the specifications that define these practices, particularly:
How resources are documented and registered with the system (added to the inventory).
The information exchanges adopted for accessing the inventory and matching resources with requirements.
The information exchanges used to collect and report on resource impact, user satisfaction, missing capabilities, and resource usage.
A shared high-level information model that enables cross-domain communication and discovery.
The actual components and interfaces that implement these practices will change over time as technology evolves.
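As an example of what an exchange for reporting resource usage and user satisfaction might carry, the Python sketch below aggregates simple report records per resource. The record fields (`resource_id`, `rating`) and the `summarize_usage` function are hypothetical and are not drawn from any EarthCube specification.

```python
from collections import defaultdict


def summarize_usage(reports):
    """Aggregate use counts and mean satisfaction rating per resource.

    Each report is a dict with a 'resource_id' and an optional numeric
    'rating'; reports without a rating still count as a use.
    """
    totals = defaultdict(lambda: {"uses": 0, "ratings": []})
    for report in reports:
        entry = totals[report["resource_id"]]
        entry["uses"] += 1
        if report.get("rating") is not None:
            entry["ratings"].append(report["rating"])
    return {
        rid: {
            "uses": e["uses"],
            "mean_rating": (sum(e["ratings"]) / len(e["ratings"])
                            if e["ratings"] else None),
        }
        for rid, e in totals.items()
    }


# Illustrative reports (resource names and ratings are made up).
reports = [
    {"resource_id": "gridder", "rating": 4},
    {"resource_id": "gridder", "rating": 2},
    {"resource_id": "plotter"},
]
summary = summarize_usage(reports)
```

Summaries of this kind are the sort of evidence that system managers and funding authorities could use when setting development priorities, provided the report format stays stable while the components that emit it change.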
Figure 3. Infrastructure components for EarthCube. The Registry, Catalog, and Monitoring subsystems support the operation of the resource marketplace. Research assemblies of software and data resources have life cycles dependent on usage and impact-based funding.