Achieving information integration in any field of science requires understanding the flow of data between production and use, and then engineering one or more solutions from the principles of information and computer science and the available technologies. It also entails social engineering to ensure acceptance and sustain implementation.
In this presentation I will review some of the major data production patterns in biodiversity science, contrast the ways that the larger communities work, and show how these differences have shaped the largest resources developed for each community. I will describe the development of data integration in organismal biodiversity, highlighting some of the milestones and cultural transformations that were necessary to create a global infrastructure. A critical element is the partnership between the Global Biodiversity Information Facility (GBIF) and Biodiversity Information Standards (TDWG).
As tools and methods have evolved, the biodiversity informatics community has become more sophisticated in how it expresses data design and in its processes. We have a sustainable platform for developing standards and a Standards Documentation Specification (SDS). Progress in mobilizing data has been steady, but there have been no revolutionary changes since the deployment of GBIF’s Integrated Publishing Toolkit. Nevertheless, progress is happening in key areas.
Watch for: (a) more detailed vocabulary standards; (b) implementation of Linked Open Data principles through emerging resources (Persistent Identifiers, ORCIDs, Bloodhound); (c) a well-founded approach to assessing and improving data quality (with tools); and (d) a reconciliation of competing standards, such as ABCD and DarwinCore.
About the Speaker
Dr. Stan Blum has been the principal analyst or contributor to the design of scientific databases for the National Museum of Natural History (Smithsonian), Museum of Vertebrate Zoology (UC Berkeley), University of Kansas (“Specify”), and the California Academy of Sciences. Since 1990 he has developed data standards that support data integration across natural history disciplines and organizations, working with communities including Vertebrate Paleontology, the Association of Systematics Collections (now Natural Science Collections Alliance), and the Taxonomic Databases Working Group (TDWG). Most significant among these standards is the DarwinCore, the most widely used standard for mobilizing organism occurrence data. The dominant theme in his career has been the integration of information across organizations and disciplines for the benefit of biodiversity science. He currently serves as Administrator of the TDWG Secretariat in San Francisco.