Manual

Darwin Core

Introduction

Darwin Core is a body of standards for biodiversity informatics. It provides stable terms and vocabularies for sharing biodiversity data. Darwin Core is maintained by TDWG (Biodiversity Information Standards, formerly The International Working Group on Taxonomic Databases).

OBIS and Darwin Core

The OBIS schema was based on Simple Darwin Core, a subset of Darwin Core which does not allow any structure beyond rows and columns. It added some terms which were important for OBIS but not supported by Darwin Core at the time. OBIS is now transitioning to Darwin Core.

Darwin Core terms

This is an overview of the most important Darwin Core terms to consider when contributing to OBIS, with guidelines regarding their use. A spreadsheet template with all terms relevant for OBIS can be found here.

:exclamation: OBIS currently has seven required fields:

Taxonomy and identification

The following terms are related to scientific name:

The following terms are related to the identification:

scientificName should always contain the originally recorded scientific name, even if it is invalid. This is necessary to be able to track back records to the original dataset. The name should be at the lowest possible taxonomic rank.

We recommend not including authorship in scientificName; use scientificNameAuthorship for that purpose instead.

A WoRMS LSID should be added in scientificNameID. OBIS will use this identifier to link the record to the accepted taxonomic name. Go to the namematching tool to find out how to get the LSIDs from WoRMS.

kingdom and taxonRank can help us identify the taxon that scientificName refers to and avoid linking to homonyms, although they are not necessary when a scientificNameID is provided.

OBIS recommends providing information about how an identification was made, for example by key, by expert, or by on-board species guide, or by morphology versus genomics. Who made the taxonomic identification can go in identifiedBy, and when in dateIdentified. Use the ISO 8601:2004(E) standard for date and time; for instructions see Time. References used for the identification, such as field guides, can be listed in identificationReferences. Any other information can be added to identificationRemarks.

:exclamation: In case of uncertain identifications, qualifiers such as cf. or aff. should go in identificationQualifier.

   scientificName   scientificNameAuthorship                          scientificNameID   taxonRank identificationQualifier
----------------- -------------------------- ----------------------------------------- ----------- ----------------------
Lanice conchilega               Pallas, 1766 urn:lsid:marinespecies.org:taxname:131495     species 
            Gadus             Linnaeus, 1758 urn:lsid:marinespecies.org:taxname:125732       genus             cf. morhua
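
As a quick sanity check before submitting data, the scientificNameID values can be tested against the LSID pattern shown in the table above. The sketch below is only an illustration: the regular expression is inferred from the two example LSIDs, not taken from a WoRMS specification, and the function name worms_id_from_lsid is hypothetical.

```python
import re

# Pattern inferred from the example LSIDs in this manual (an assumption,
# not an official WoRMS specification).
WORMS_LSID = re.compile(r"^urn:lsid:marinespecies\.org:taxname:(\d+)$")


def worms_id_from_lsid(lsid):
    """Return the numeric record identifier at the end of a WoRMS LSID.

    Returns None when the value does not look like a WoRMS taxon LSID.
    """
    match = WORMS_LSID.match(lsid.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for value in ("urn:lsid:marinespecies.org:taxname:131495", "Lanice conchilega"):
        print(value, "->", worms_id_from_lsid(value))
```

A value that returns None is worth checking by hand; it may be a name pasted into the wrong column or an LSID from another provider.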

Occurrence

occurrenceStatus is an important term, because it allows us to distinguish between presence and absence records. We recommend always filling in this field, using either present or absent.

Two terms related to quantity, organismQuantity and organismQuantityType, have been recently added to Darwin Core. They are a lot more versatile than the older individualCount field. organismQuantity should contain the quantity value, and organismQuantityType the parameter and units. There is a recommended vocabulary for organismQuantityType which includes values such as individuals, biomassAFDG (biomass ash free dry weight in grams), percentageOfBiomass and percentageCoverage. The quantity terms should be used together with the new sample size related fields.

For stored specimens, the catalogNumber and preparations terms can be used to provide the identifier for the record in the collection and to document the preparation and preservation methods.

associatedMedia, associatedReferences and associatedSequences hold globally unique identifiers or URIs pointing to, respectively, associated media (e.g. online image or video), associated literature (e.g. DOIs) and genetic sequence information (e.g. GenBank ID).

The recommended vocabulary for sex can be found here.

eventID     scientificName   occurrenceStatus   organismQuantity   organismQuantityType 
------- ------------------ ------------------ ------------------ ---------------------- 
      1          Abra alba            present                 12              organisms 
      1  Pectinaria koreni            present                 48              organisms 
      2          Abra alba             absent                  0              organisms 
      2  Pectinaria koreni            present                 48              organisms 
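
The recommendation to always fill in occurrenceStatus with present or absent is easy to check automatically. The following is a minimal sketch, assuming the records have already been loaded as dictionaries keyed by Darwin Core term; the function name check_occurrence_status is hypothetical.

```python
# The two values recommended in this manual.
ALLOWED_STATUS = {"present", "absent"}


def check_occurrence_status(records):
    """Return (row_number, value) pairs for rows with a missing or unexpected status."""
    problems = []
    for row_number, record in enumerate(records, start=1):
        value = (record.get("occurrenceStatus") or "").strip()
        if value not in ALLOWED_STATUS:
            problems.append((row_number, value))
    return problems


if __name__ == "__main__":
    records = [
        {"scientificName": "Abra alba", "occurrenceStatus": "present"},
        {"scientificName": "Abra alba", "occurrenceStatus": "absent"},
        {"scientificName": "Pectinaria koreni", "occurrenceStatus": ""},
    ]
    print(check_occurrence_status(records))
```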

Record level terms

basisOfRecord is a required field and specifies the nature of the record. Possible values include PreservedSpecimen, FossilSpecimen, LivingSpecimen, HumanObservation, and MachineObservation.

institutionCode identifies the institution which owns the data; collectionCode identifies the collection or dataset within that institute. Collections cannot belong to multiple institutes, so all records within a collection should have the same institutionCode. The catalogNumber is an identifier for the records within the dataset or collection.

occurrenceID should be globally unique. A globally unique identifier could for example be constructed from the institutionCode, the collectionCode and the catalogNumber:

institutionCode   collectionCode   catalogNumber   occurrenceID
--------------- ---------------- --------------- --------------
           VLIZ             NSBS             123  VLIZ_NSBS_123 
           VLIZ             NSBS             456  VLIZ_NSBS_456 
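
One possible way to build such identifiers is sketched below. It simply joins the three fields with an underscore; the separator and the function name build_occurrence_id are assumptions made for illustration, and any scheme is fine as long as the result is globally unique.

```python
def build_occurrence_id(institution_code, collection_code, catalog_number, sep="_"):
    """Join institutionCode, collectionCode and catalogNumber into one identifier.

    Raises ValueError when a component is missing, because an incomplete
    identifier is unlikely to be unique.
    """
    parts = [
        "" if part is None else str(part).strip()
        for part in (institution_code, collection_code, catalog_number)
    ]
    if not all(parts):
        raise ValueError("institutionCode, collectionCode and catalogNumber are all required")
    return sep.join(parts)


if __name__ == "__main__":
    print(build_occurrence_id("VLIZ", "NSBS", 123))
```

After generating the identifiers, it is worth verifying that no value occurs twice in the dataset, for example by comparing the number of records with the size of the set of identifiers.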

bibliographicCitation allows for providing different citations on record level, while a single citation for the entire dataset needs to be added to the metadata.

modified is the most recent date-time on which the resource was changed. The ISO 8601:2004(E) standard is required; for instructions see Time.

dataGeneralizations refers to actions taken to make the shared data less specific or complete than in its original form. A value in this field suggests that alternative data of higher quality may be available on request.

Location

Occurrence coordinates should be provided in decimal degrees on the WGS 84 (EPSG:4326) geodetic datum, along with coordinateUncertaintyInMeters, which is the radius of the smallest circle around the given decimalLatitude and decimalLongitude that contains the whole location.

The spatial reference system of decimalLatitude and decimalLongitude should be documented in geodeticDatum. Recommended best practice is to use the EPSG code. Coordinates in degrees/minutes/seconds can be converted to decimal degrees using our coordinates tool. We also provide a tool to check coordinates or to determine coordinates for a location on a map. This tool also allows geocoding location names using marineregions.org.
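
For datasets that are processed with scripts rather than through the coordinates tool, the conversion from degrees/minutes/seconds is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. A minimal sketch, with a hypothetical function name dms_to_decimal:

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees.

    South and West yield negative values, as expected for decimalLatitude
    and decimalLongitude.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    return -value if hemisphere in {"S", "W"} else value


if __name__ == "__main__":
    print(dms_to_decimal(51, 30, 0, "N"))
    print(dms_to_decimal(2, 15, 0, "W"))
```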

If the locality of an occurrence is known but not the exact coordinates, we need to use a geocoding service to obtain coordinates. Marine Regions has a search interface for geographic names, and provides coordinates as well as a map of the location. Another option is to use Google Maps: after looking up a location, the decimal coordinates can be found in the page URL.

A Well-Known Text (WKT) representation of the shape of the location can be provided in footprintWKT. This is particularly useful for tracks, transects, tows, trawls, or when an exact location is not known. WKT strings can be created using our WKT tool. This tool also calculates a midpoint and a radius, which can be used for decimalLongitude, decimalLatitude, and coordinateUncertaintyInMeters.

Some examples of WKT strings:

LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTILINESTRING ((10 10, 20 20, 10 40),(40 40, 30 30, 40 20, 30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)),((15 5, 40 10, 10 20, 5 10, 15 5)))
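
To illustrate how a midpoint and radius can be derived from a WKT string, here is a rough sketch. It is not the algorithm used by the WKT tool: it assumes two-dimensional WKT with longitude as x and latitude as y, takes the centre of the bounding box as the midpoint, uses a spherical (haversine) distance, and does not handle shapes that cross the antimeridian. All function names are hypothetical.

```python
import math
import re

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def wkt_vertices(wkt):
    """Extract (longitude, latitude) pairs from a 2D WKT string."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    return [(float(x), float(y)) for x, y in pairs]


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def midpoint_and_radius(wkt):
    """Return (decimalLongitude, decimalLatitude, radius_in_metres) for a WKT shape.

    The midpoint is the centre of the bounding box; the radius is the largest
    distance from that midpoint to any vertex.
    """
    vertices = wkt_vertices(wkt)
    if not vertices:
        raise ValueError("no coordinates found in WKT string")
    lons = [lon for lon, _ in vertices]
    lats = [lat for _, lat in vertices]
    mid_lon = (min(lons) + max(lons)) / 2
    mid_lat = (min(lats) + max(lats)) / 2
    radius = max(haversine_m(mid_lon, mid_lat, lon, lat) for lon, lat in vertices)
    return mid_lon, mid_lat, radius


if __name__ == "__main__":
    print(midpoint_and_radius("LINESTRING (30 10, 10 30, 40 40)"))
```

The radius returned here could serve as a conservative coordinateUncertaintyInMeters, since every vertex of the shape lies within that distance of the midpoint.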

Keep in mind while filling in minimumDepthInMeters and maximumDepthInMeters that this should be the depth at which the sample was taken and not the water column depth at that location.

locationID is an identifier for the set of location information (e.g. station ID, MRGID from marineregions).

Event

eventID is an identifier for an event, i.e. something that happened at a certain place and time. parentEventID is an identifier for a parent event, which must refer to an existing eventID. eventRemarks can hold info on cruise, expedition, research vessel, station etc. habitat is a category or description of the habitat in which the Event occurred.
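
The rule that every parentEventID must refer to an existing eventID can be verified with a few lines of code. This is a minimal sketch, assuming events are available as dictionaries keyed by Darwin Core term; the function name missing_parents is hypothetical.

```python
def missing_parents(events):
    """Return the sorted parentEventID values that do not match any eventID."""
    event_ids = {event["eventID"] for event in events}
    missing = {
        event["parentEventID"]
        for event in events
        if event.get("parentEventID") and event["parentEventID"] not in event_ids
    }
    return sorted(missing)


if __name__ == "__main__":
    events = [
        {"eventID": "1", "parentEventID": ""},
        {"eventID": "2", "parentEventID": "1"},
        {"eventID": "3", "parentEventID": "9"},  # "9" is not defined anywhere
    ]
    print(missing_parents(events))
```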

Time

The date and time at which an occurrence was recorded goes in eventDate. This term uses the ISO 8601 standard. OBIS recommends using the extended ISO 8601 format with hyphens.

ISO 8601 dates can represent moments in time at different resolutions, as well as time intervals, which use / as a separator. Date and time are separated by T. Times can have a time zone indicator at the end; if they do not, the time is assumed to be local time. When a time is UTC, a Z is added. Some examples of ISO 8601 dates are:

1973-02-28T15:25:00
2005-08-31T12:11+12
1993-01-26T04:39+12/1993-01-26T05:48+12
2008-04-25T09:53
1948-09-13
1993-01/02
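
The examples above can be checked with a small script. The sketch below only tests the shape of the string in the extended format (it does not verify that the month or day numbers are valid calendar values) and splits intervals into a start and an end. Expanding an abbreviated interval end such as 1993-01/02 is handled only for date-only values. The regular expression and the function name split_event_date are assumptions made for illustration.

```python
import re

# Extended ISO 8601 format: YYYY[-MM[-DD[Thh:mm[:ss][Z or +hh or +hh:mm]]]]
_SINGLE = re.compile(
    r"^\d{4}(?:-\d{2}(?:-\d{2}"
    r"(?:T\d{2}:\d{2}(?::\d{2})?(?:Z|[+-]\d{2}(?::?\d{2})?)?)?"
    r")?)?$"
)


def split_event_date(value):
    """Split an eventDate into (start, end); end is None for a single date."""
    start, separator, end = value.strip().partition("/")
    if not _SINGLE.match(start):
        raise ValueError("not an extended ISO 8601 date: %r" % start)
    if not separator:
        return start, None
    if not end:
        raise ValueError("interval has no end: %r" % value)
    # The end of an interval may omit leading components it shares with the
    # start, e.g. 1993-01/02. Only date-only values are expanded here.
    if "T" not in start and len(end) < len(start):
        end = start[: len(start) - len(end)] + end
    if not _SINGLE.match(end):
        raise ValueError("not an extended ISO 8601 date: %r" % end)
    return start, end


if __name__ == "__main__":
    for example in ("1948-09-13", "2005-08-31T12:11+12", "1993-01/02"):
        print(example, "->", split_event_date(example))
```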

Besides year, month and day numbers, ISO 8601 also supports ordinal dates (year and day number within that year) and week dates (year, week, and day number within that week). These dates are less common and have the formats YYYY-DDD (for example 2015-023) and YYYY-Www-D (for example 2014-W26-3).
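
If a source dataset uses ordinal or week dates, they can be converted to the more common calendar form with the Python standard library. A minimal sketch (date.fromisocalendar requires Python 3.8 or later; the function names are hypothetical):

```python
from datetime import date, datetime


def ordinal_to_calendar(value):
    """Convert an ordinal date (YYYY-DDD) to a calendar date."""
    return datetime.strptime(value, "%Y-%j").date()


def week_to_calendar(value):
    """Convert a week date (YYYY-Www-D) to a calendar date."""
    year, week, day = value.split("-")
    return date.fromisocalendar(int(year), int(week.lstrip("W")), int(day))


if __name__ == "__main__":
    print(ordinal_to_calendar("2015-023").isoformat())   # 2015-01-23
    print(week_to_calendar("2014-W26-3").isoformat())    # 2014-06-25
```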

ISO 8601 durations should not be used.

Sampling

sampleSizeValue and sampleSizeUnit are very important when an organism quantity is specified. Recommended best practice is to use SI units or non-SI units accepted for use with SI for the sampleSizeUnit. Examples are litre, square metre and cubic centimetre.

For example, in the case of a macrofauna sediment core and meiofauna subsamples:

parentEventID   eventID           scientificName   eventDate   sampleSizeValue      sampleSizeUnit 
------------- --------- ------------------------ ----------- ----------------- ------------------- 
                      1                Abra alba  2015-10-02               0.5        square metre 
                      1        Lanice conchilega  2015-10-02               0.5        square metre 
            1         2       Sabatieria pulchra  2015-10-02                10   square centimetre 
            1         2   Leptolaimus sebastiani  2015-10-02                10   square centimetre 
            1         3     Pselionema longiseta  2015-10-02                10   square centimetre 
            1         3       Pselionema simplex  2015-10-02                10   square centimetre 

Darwin Core Archive

Darwin Core Archive (DwC-A) is a standard for publishing biodiversity data using Darwin Core. Darwin Core archives contain text files which are logically arranged in a star schema. This means that there is one core file and (optionally) multiple extension files. Core files contain information on taxa, occurrences, or sampling events.

There are a variety of extension types. Often used extensions are the Occurrence extension (which can be used with an Event core) and the MeasurementOrFact extension.

Archives with an Event core will be supported in the near future. With an Event core, some properties can be moved from the occurrence to the event level and no longer have to be repeated for every single occurrence. As each event can point to a parent event (with the parentEventID field), extensive hierarchies of events can be constructed where different fields are only filled in at the appropriate level (for example: cruise > leg > station > sample > subsample).
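
When fields are only filled in at the appropriate level of the hierarchy, a consumer of the data has to look upwards to find a value. The sketch below shows one way to do that by following parentEventID links. It is an illustration under assumptions, not a description of how OBIS processes archives: the function name resolve_field and the example identifiers are hypothetical.

```python
def resolve_field(event_id, field, events_by_id):
    """Return the value of a field for an event, inherited from ancestors if empty.

    Walks up the parentEventID chain until a non-empty value is found.
    Returns None when no ancestor has a value or an identifier is unknown.
    """
    seen = set()  # guards against accidental cycles in the hierarchy
    current = event_id
    while current and current not in seen:
        seen.add(current)
        event = events_by_id.get(current)
        if event is None:
            return None
        value = event.get(field)
        if value not in (None, ""):
            return value
        current = event.get("parentEventID")
    return None


if __name__ == "__main__":
    events_by_id = {
        "cruise1": {"eventID": "cruise1", "eventRemarks": "example cruise"},
        "station1": {"eventID": "station1", "parentEventID": "cruise1",
                     "decimalLatitude": "51.5", "decimalLongitude": "2.5"},
        "sample1": {"eventID": "sample1", "parentEventID": "station1",
                    "eventDate": "2015-10-02"},
    }
    print(resolve_field("sample1", "decimalLatitude", events_by_id))
    print(resolve_field("sample1", "eventRemarks", events_by_id))
```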

Archive descriptor

The meta.xml descriptor file maps the core and extension files to Darwin Core terms, and describes how the core and extension files are linked.

<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files>
      <location>event.txt</location>
    </files>
    <id index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/eventID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/parentEventID"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/decimalLatitude"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/decimalLongitude"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files>
      <location>occurrence.txt</location>
    </files>
    <coreid index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </extension>
  <extension encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/MeasurementOrFact">
    <files>
      <location>measurementorfact.txt</location>
    </files>
    <coreid index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/measurementType"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/measurementValue"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/measurementUnit"/>
    <field index="4" term="http://rs.tdwg.org/dwc/terms/measurementMethod"/>
  </extension>
</archive>
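
To see how such a descriptor is interpreted, the following sketch reads a meta.xml string with the Python standard library and lists, for the core and each extension, the data file and the term mapped to each column. The sample descriptor is a shortened version of the one above, and the function name describe_archive is hypothetical; dedicated Darwin Core Archive libraries exist and are likely a better choice for real work.

```python
import xml.etree.ElementTree as ET

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}

# Shortened version of the descriptor shown in this manual.
SAMPLE_META = r"""<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Event">
    <files><location>event.txt</location></files>
    <id index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/eventID"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/parentEventID"/>
  </core>
  <extension encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n" fieldsEnclosedBy="" ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <coreid index="0" />
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
  </extension>
</archive>"""


def short_name(uri):
    """Return the last path segment of a term URI, e.g. 'eventID'."""
    return uri.rsplit("/", 1)[-1]


def describe_archive(meta_xml):
    """Return one dictionary per data file: role, file name, row type, column terms."""
    root = ET.fromstring(meta_xml)
    core = root.find("dwc:core", NS)
    if core is None:
        raise ValueError("descriptor has no <core> element")
    tables = []
    for role, element in [("core", core)] + [("extension", e) for e in root.findall("dwc:extension", NS)]:
        # The id/coreid column links extension rows to core rows.
        link = element.find("dwc:id", NS) if role == "core" else element.find("dwc:coreid", NS)
        tables.append({
            "role": role,
            "file": element.find("dwc:files/dwc:location", NS).text,
            "rowType": short_name(element.get("rowType")),
            "linkColumn": int(link.get("index")) if link is not None else None,
            "fields": {int(f.get("index")): short_name(f.get("term"))
                       for f in element.findall("dwc:field", NS)},
        })
    return tables


if __name__ == "__main__":
    for table in describe_archive(SAMPLE_META):
        print(table)
```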

Metadata

The eml.xml file contains the dataset metadata in Ecological Metadata Language (EML) format. For instructions on how to enter the metadata go to EML.