Darwin Core Archive

Darwin Core Archive (DwC-A) is the standard for publishing biodiversity data using Darwin Core terms. It is the preferred format for publishing data in OBIS and GBIF. The conceptual data model of the Darwin Core Archive is a “star schema” with a core record, such as an occurrence or an event, at the center of the star. Extension records, radiating out from the core, can optionally be associated with it, linked by database keys such as an ID column. An archive therefore contains exactly one core file, optionally linked to multiple extension files, and the entire schema is only two levels deep: a single core with zero, one, or many extensions. Each core-to-extension relationship can be one-to-one, where there is only one extension record for each core record (also called “Simple Darwin Core”), or one-to-many, where, for example, many environmental or biometric measurements and/or many biological occurrence records can be associated with a single sampling event.
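As a rough illustration of the star schema, the sketch below groups extension rows under their core record through a shared key. The record values and the helper name group_by_core are made up for this example and are not part of any OBIS or GBIF tool.

```python
from collections import defaultdict

# Hypothetical core (event) records, one row per sampling event.
events = [
    {"eventID": "cruise1:station1", "eventDate": "2020-05-01"},
    {"eventID": "cruise1:station2", "eventDate": "2020-05-02"},
]

# Hypothetical extension (occurrence) records; each row points back
# to exactly one core record through eventID.
occurrences = [
    {"eventID": "cruise1:station1", "occurrenceID": "occ1", "scientificName": "Abra alba"},
    {"eventID": "cruise1:station1", "occurrenceID": "occ2", "scientificName": "Nucula nitidosa"},
]


def group_by_core(extension_rows, key="eventID"):
    """Group extension rows by the key that links them to the core."""
    grouped = defaultdict(list)
    for row in extension_rows:
        grouped[row[key]].append(row)
    return grouped


by_event = group_by_core(occurrences)
for event in events:
    # A core record may have zero, one, or many extension records.
    linked = by_event.get(event["eventID"], [])
    print(event["eventID"], len(linked))
```

The first event has two linked occurrences (one-to-many) and the second has none, which is still valid because extensions are optional.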

OBIS-ENV-DATA: OBIS holds more than just species occurrences

Data collected as part of marine biological research often include measurements of habitat features, such as physical and chemical variables of the environment, and biometric measurements (such as body size, counts, abundance and biomass combined, etc.), as well as details regarding the nature of the sampling or observation methods, equipment, and sampling effort.

In the past, OBIS only dealt with Occurrence Core, and additional measurements were added in a structured format (e.g., JSON) in the DwC term dynamicProperties. This was far from ideal: the format was difficult to work with, the terms were not standardised, and the values were difficult to extract.
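To make the extraction problem concrete, here is a minimal sketch of unpacking a JSON object stored in dynamicProperties into one row per measurement. The key names (length_mm, sex) and the function name are invented for illustration; real datasets used whatever keys each provider chose, which is exactly why the terms could not be compared across datasets.

```python
import json


def dynamic_properties_to_rows(occurrence_id, dynamic_properties):
    """Turn a JSON object from dynamicProperties into one row per key/value pair."""
    # An empty field yields no rows instead of a JSON parsing error.
    properties = json.loads(dynamic_properties) if dynamic_properties else {}
    return [
        {
            "occurrenceID": occurrence_id,
            "measurementType": key,  # free text chosen by the data provider
            "measurementValue": str(value),
        }
        for key, value in properties.items()
    ]


rows = dynamic_properties_to_rows("occ1", '{"length_mm": 12.5, "sex": "female"}')
for row in rows:
    print(row)
```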

With the release and adoption of a new Core type, Event Core, OBIS can now go beyond species occurrence records and make the sampling event the central data entity, linking biological, environmental, and sampling information to the appropriate event level using the Occurrence Extension and the ExtendedMeasurementOrFact Extension.

ExtendedMeasurementOrFact Extension (eMoF)

As part of the IODE pilot project “Expanding OBIS with environmental data” (OBIS-ENV-DATA), OBIS introduced a customized ExtendedMeasurementOrFact Extension (eMoF), which extends GBIF’s DwC MeasurementOrFact Extension with four new terms: occurrenceID, measurementTypeID, measurementValueID and measurementUnitID.

Figure: Overview of an OBIS-ENV-DATA format. Sampling parameters, abiotic measurements, and occurrences are linked to events using the eventID (full lines). Biotic measurements are linked to occurrences using the new occurrenceID field of the ExtendedMeasurementOrFact Extension (dashed lines).

The eMoF Extension is used in combination with the Event Core and the Occurrence Extension to capture both abiotic and biotic measurements. The occurrenceID links biotic measurements in the eMoF Extension to the Occurrence Extension, and the eventID links the eMoF to the Event Core (which is necessary in a star schema, where all records in extensions must link to the Core file). Abiotic measurements as well as sampling facts in the eMoF are linked to the Event Core through the eventID (for those records no occurrenceIDs are needed). The eMoF Extension is therefore used to store:

• organism quantifications (e.g. counts, abundance, biomass, % live cover, etc.)
• species biometrics (e.g. body length, weight, etc.)
• facts documenting a specimen (e.g. living/dead, behaviour, invasiveness, etc.)
• abiotic measurements (e.g. temperature, salinity, oxygen, sediment grain size, habitat features)
• facts documenting the sampling activity (e.g. sampling device, sampled area, sampled volume, sieve mesh size).
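The linking rules above can be sketched as follows: every eMoF row carries an eventID, and only biotic rows also carry an occurrenceID. The rows and the helper link_level are hypothetical and only illustrate the rule.

```python
# Hypothetical eMoF rows: an abiotic measurement, a sampling fact,
# and a biotic measurement tied to one occurrence.
emof = [
    {"eventID": "cruise1:station1", "occurrenceID": "",
     "measurementType": "temperature", "measurementValue": "12.1"},
    {"eventID": "cruise1:station1", "occurrenceID": "",
     "measurementType": "sampling device", "measurementValue": "Van Veen grab"},
    {"eventID": "cruise1:station1", "occurrenceID": "occ1",
     "measurementType": "body length", "measurementValue": "12.5"},
]


def link_level(row):
    """Report whether an eMoF row describes an event or a single occurrence."""
    if not row.get("eventID"):
        # In a star schema every extension row must link to the core.
        raise ValueError("eMoF row has no eventID to link it to the Event Core")
    return "occurrence" if row.get("occurrenceID") else "event"


levels = [link_level(row) for row in emof]
print(levels)
```

Here the first two rows resolve to the event level and the third to the occurrence level.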

The MoF terms measurementType, measurementValue and measurementUnit are completely unconstrained and can be populated with free-text annotation. While free text offers the advantage of capturing complex and as yet unclassified information, the inevitable semantic heterogeneity (e.g. of spelling or wording) becomes a major challenge for effective data integration and analysis. Hence, OBIS added three new terms (measurementTypeID, measurementValueID and measurementUnitID) to standardise the measurement types, values and units. Note that measurementValueID is only used for standardising sampling facts (e.g. sampling instrument) and not measurements. The three new terms should be populated using controlled vocabularies referenced using Uniform Resource Identifiers (URIs). OBIS recommends using the internationally recognized NERC Vocabulary Server, developed by the British Oceanographic Data Centre (BODC), which can be searched through https://www.bodc.ac.uk/resources/vocabularies/vocabulary_search/.
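A simple sanity check along these lines is to confirm that any populated ID term at least looks like an http(s) URI. The sketch below is an assumption-laden illustration, not an OBIS quality-control routine: the function name is invented, and the two NERC URIs in the example row are given for illustration only and should be verified against the vocabulary server before use.

```python
from urllib.parse import urlparse

ID_TERMS = ("measurementTypeID", "measurementValueID", "measurementUnitID")


def non_uri_terms(row):
    """Return the ID terms whose non-empty value does not look like an http(s) URI."""
    bad = []
    for term in ID_TERMS:
        value = row.get(term, "")
        if not value:
            continue  # empty is allowed, e.g. measurementValueID for numeric measurements
        parsed = urlparse(value)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            bad.append(term)
    return bad


# Illustrative row; the URIs are assumptions to be checked against the NERC Vocabulary Server.
row = {
    "measurementType": "temperature",
    "measurementTypeID": "http://vocab.nerc.ac.uk/collection/P01/current/TEMPPR01/",
    "measurementValue": "12.1",
    "measurementValueID": "",
    "measurementUnit": "degrees Celsius",
    "measurementUnitID": "http://vocab.nerc.ac.uk/collection/P06/current/UPAA/",
}
print(non_uri_terms(row))
```

This only checks the shape of the value; it does not confirm that the URI resolves to an existing vocabulary term.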

Measurement or Fact vocabulary

For an overview of the most common parameters in OBIS, linked to the proposed BODC vocab term see Measurement or Fact vocabulary. In case of missing terms, below are the vocabularies to be used:

OBIS-ENV-DATA and Darwin Core terms

The DwC terms that are most relevant to OBIS, organized in the OBIS-ENV-DATA format, are the following (those in italics are mandatory):

Event Core

eventID, parentEventID, eventDate, habitat, minimumDepthInMeters, maximumDepthInMeters, decimalLatitude, decimalLongitude, coordinateUncertaintyInMeters, footprintWKT, modified

Occurrence Extension

eventID, occurrenceID, scientificName, scientificNameAuthorship, scientificNameID, kingdom, taxonRank, identificationQualifier, occurrenceStatus, basisOfRecord, modified

ExtendedMeasurementOrFact Extension

measurementID, eventID, occurrenceID, measurementType, measurementTypeID, measurementValue, measurementValueID, measurementUnit, measurementUnitID, measurementAccuracy, measurementRemarks

When to use Event Core

• When the dataset contains abiotic measurements, or other biological measurements which are related to an entire sample (not a single specimen)
• When specific details are known about how a biological sample was taken and processed. These details can be expressed using the eMoF and the newly developed Q01 vocabulary.

Event Core should be used in combination with the Occurrence Extension and the eMoF.

When to use Occurrence Core

• No information on how the data was sampled or samples were processed.
• No abiotic measurements are taken or provided
• Biological measurements are made on individual specimens (each specimen is a single occurrence record)
• This is often the case for museum collections, citations of occurrences from literature, individual sightings.

Datasets formatted in Occurrence Core can use the eMoF Extension for biotic measurements or facts.
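The two checklists above can be condensed into a small decision helper. This is only a sketch of the guidance in this section; the function and parameter names are invented, and real datasets may need a case-by-case judgment.

```python
def suggest_core(has_abiotic_measurements, has_sample_level_measurements, has_sampling_details):
    """Suggest a core type from the criteria listed in this section."""
    if has_abiotic_measurements or has_sample_level_measurements or has_sampling_details:
        # Event Core, used together with the Occurrence Extension and the eMoF.
        return "Event Core"
    # Typical for museum collections, literature citations and individual sightings.
    return "Occurrence Core"


# A benthic survey with grab samples and temperature readings.
print(suggest_core(True, True, True))
# A museum collection with measurements on individual specimens only.
print(suggest_core(False, False, False))
```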