OBIS Schema implementation notes

Last updated on Mon, 2012-01-09 15:28. Originally submitted by evberghe on 2010-12-13 17:19.

On this page, we give some more guidance on the interpretation of the different elements of the OBIS Schema. These remarks should help ensure that all data providers interpret the Schema in the same way, resulting in better, more homogeneous OBIS content.

Date Last Modified

If the time is unknown, it is acceptable to enter only the date.
If the DiGIR data table contains elements with different modification times, enter the most recent time.
If the modification date-time is unknown, enter the date-time of first "publication".
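
The rules above can be sketched in a few lines of Python. This is only an illustration: the function name date_last_modified and its arguments are hypothetical and are not part of the OBIS Schema or of DiGIR.

```python
from datetime import datetime


def date_last_modified(modification_times, first_published):
    """Pick the value to serve as Date Last Modified.

    modification_times: datetimes of the table's elements (None = unknown).
    first_published: datetime of first publication, used as the fallback.
    """
    known = [t for t in modification_times if t is not None]
    # Most recent known modification time, else the first publication time.
    return max(known) if known else first_published


print(date_last_modified(
    [datetime(2011, 5, 1), None, datetime(2012, 1, 9, 15, 28)],
    datetime(2010, 12, 13),
))
```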

Collection Code and Institution Code

The Collection Code and the Institution Code can be the same in the case of Institutions that serve only one Collection. These can be full names instead of codes/abbreviations, if preferred.
The Collection Code (and/or the Field Number) can hold concatenated Station and Expedition names/codes.

Catalog Number

The Catalog Number should be stable through time: if a record is deleted, do not reuse its Catalog Number for a new record.
Together, the Institution Code, Collection Code, and Catalog Number should uniquely identify a record.
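
One way to check the uniqueness rule before publishing is sketched below. The dictionary keys (institution_code, collection_code, catalog_number) and the sample values are hypothetical stand-ins for whatever your own table uses.

```python
from collections import Counter


def find_duplicate_keys(records):
    """Return the (institution, collection, catalog) triples used more than once."""
    counts = Counter(
        (r["institution_code"], r["collection_code"], r["catalog_number"])
        for r in records
    )
    return sorted(key for key, count in counts.items() if count > 1)


# Hypothetical sample data: the third record reuses the first record's triple.
records = [
    {"institution_code": "INST", "collection_code": "FISH", "catalog_number": "1"},
    {"institution_code": "INST", "collection_code": "FISH", "catalog_number": "2"},
    {"institution_code": "INST", "collection_code": "FISH", "catalog_number": "1"},
]
print(find_duplicate_keys(records))
```

An empty list means every record has a unique identifier triple.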

Scientific Name

If the record is identified to species level, this field should hold the genus name and species epithet with a space between (for a total of 2 words). If a subspecific epithet is known, it should be included in the string (for a total of 3 words). If the identification was only to a rank higher than genus, then the name of the lowest known rank should be entered (1 word).
Do not include the authority for the name here.

Scientific Name Author

The year of original publication should be included if known, separated from the author name by a comma and space. If the name has undergone a genus revision, the authority and year should be in parentheses. Valid Examples:

Smith
Jones, 1973
(Hastings, 1986)
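
The format described above can be checked mechanically. The following is a rough sketch only: the function parse_author is hypothetical, and the pattern is deliberately loose about what counts as an author name, so treat it as a starting point rather than a complete validator.

```python
import re

# Author text without digits or parentheses, optionally followed by ", YYYY".
_AUTHOR_RE = re.compile(r"(?P<author>[^()\d]*[^()\d,\s])(?:, (?P<year>\d{4}))?")


def parse_author(text):
    """Split a Scientific Name Author string into its parts, or return None."""
    # Parentheses around the whole string signal a genus revision.
    revised = text.startswith("(") and text.endswith(")")
    inner = text[1:-1] if revised else text
    match = _AUTHOR_RE.fullmatch(inner)
    if match is None:
        return None
    year = match["year"]
    return {
        "author": match["author"],
        "year": int(year) if year else None,
        "genus_revised": revised,
    }


for example in ["Smith", "Jones, 1973", "(Hastings, 1986)", "Jones 1973"]:
    print(example, "->", parse_author(example))
```

The last string lacks the comma between author and year, so it is rejected.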

Basis of Record

The OBIS databases hold information on the locations where different species have been found. The act of finding a species at a place is called a "collection" or an "observation" throughout the schema documentation. This term is meant to apply very broadly: it includes cases where species were literally seen during a visual search, were collected in a sample of any kind (research survey, fisheries catch data, etc.), or where a specimen in a museum indicates the location it came from.

Use of the Start/End fields

There are several fields, such as latitude, longitude, day collected, month collected, etc., that have a start and end version. For example, the OBIS schema has "Latitude", "Start Latitude" and "End Latitude." How to fill in these fields is perhaps the most confusing part of the OBIS schema.

Why are all these fields there? They seem redundant?

Yes, they are redundant, but there is a reason for that. The Darwin Core represents all the location and time fields as single fields. But OBIS members thought it was important to be able to express a range of location or time. For example, a trawl might have been taken over a line transect that is better expressed as a start and end latitude and longitude than as a single point. Or an old specimen might only be labeled with the dates of the cruise, and not the day it was sampled, so that all we know is that the sample was taken sometime within a span of several months. For these reasons, OBIS added the start and end fields to the location and time information.

However, the OBIS schema needs to be compliant with the Darwin Core, so we still need to keep the original single-field options. So we end up with a field for, e.g., Latitude, one for Start Latitude and one for End Latitude.

Implementing the Start/End Fields

How you implement the Start/End fields will depend on the kind of data that you have. But regardless of your data structure, you should never have to type the same value into more than one field - you can make the database do this automatically.

Throughout the following directions, we will use latitude as an example. But the same rationale applies to all of the Start/End fields: Year Collected, Month Collected, Day Collected, Time of Day, and Longitude.

Case 1: All of your latitudes are point latitudes; none of them have separate start and end latitudes.

In this case, you should have a "Latitude" field in your database into which you enter this information. When you install DiGIR and map your fields to the OBIS Schema, your "Latitude" field will get mapped to the OBIS Schema fields for "Latitude", "Start Latitude" and "End Latitude."

Case 2: You have samples that were taken over space and want to record a start and end latitude for all of them.

You should have "Start Latitude" and "End Latitude" fields in your database. These map to the same fields in the OBIS Schema. You then have a decision. "Latitude" is a required field in the OBIS schema, and a Darwin Core field, so you must map it to something in your database.

Solution A: The best option is to make an OBIS view of your database and create a "Latitude" field that is the average of your "Start Latitude" and "End Latitude" fields (i.e. sum the fields and divide by 2).

Solution B: If the space covered by the sample is relatively small, you may feel that just using the "Start Latitude" field is good enough.

In either case, though, you must take care that the location precision is accurate (see below).
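
Solutions A and B can be expressed as a small derivation step. The sketch below uses a hypothetical function name and arguments. In practice the same calculation would more likely live in a database view.

```python
def derive_latitude(start_latitude, end_latitude, use_midpoint=True):
    """Derive the single "Latitude" value from the start/end pair.

    use_midpoint=True  -> Solution A: average of start and end.
    use_midpoint=False -> Solution B: reuse the start latitude.
    """
    if use_midpoint:
        return (start_latitude + end_latitude) / 2
    return start_latitude


print(derive_latitude(10.0, 11.0))                      # Solution A
print(derive_latitude(10.0, 11.0, use_midpoint=False))  # Solution B
```

One caution if the same averaging is applied to longitude: a plain average gives a misleading midpoint for a transect that crosses the 180° meridian, so those records need separate handling.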

Case 3: Some of your samples were taken at a point and some were taken over a distance.

In this case, you can use the same method as Case 2 above. For those samples that were taken at a point, the simplest approach is to have the "Start Latitude" and "End Latitude" fields be equal. You can fill them in individually by copying and pasting, or by a small script/routine. Alternatively, you can leave the "End Latitude" blank, but remember you'll have to have a way to get out the appropriate precision fields later (see below).
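
The "small script/routine" mentioned above might look like the following sketch. The record layout (a dictionary with start_latitude and end_latitude keys, with None for a blank value) is an assumption made for illustration.

```python
def fill_end_latitude(record):
    """Return a copy of the record whose blank end latitude equals the start."""
    filled = dict(record)  # copy, so the original record is left untouched
    if filled.get("end_latitude") is None:
        filled["end_latitude"] = filled["start_latitude"]
    return filled


point_sample = {"start_latitude": 51.25, "end_latitude": None}
transect = {"start_latitude": 51.25, "end_latitude": 51.3}
print(fill_end_latitude(point_sample))
print(fill_end_latitude(transect))
```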

Filling in the Coordinate Precision fields - general comments

The precision field or fields (see below) indicate the precision with which the latitude/longitude location is given. This is generally a function of the method used (GPS, etc.). While this is not a required field, it is a very important one and we highly recommend that you include it if at all possible. Note that the unit is meters, while the latitude and longitude fields are reported in decimal degrees. When in doubt, err on the side of a larger value in this field: it is better to indicate a little too much uncertainty than to report false precision. The number of significant digits in the latitude and longitude may roughly indicate the precision. The precision value should never be smaller than the uncertainty implied by the number of significant figures in the latitude and longitude (i.e. it doesn't make sense to report that a location is precise to 1 m if the latitude and longitude are only given to the tenth of a degree).
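
To get a feel for the lower bound implied by the number of decimal places, the sketch below converts one unit in the last reported decimal place of a latitude into meters. It assumes roughly 111 km per degree of latitude, which is an approximation. The function name is hypothetical, and the coordinate is passed as text so that trailing zeros are preserved.

```python
from decimal import Decimal

# Approximate length of one degree of latitude, in meters (an assumption).
METERS_PER_DEGREE_LATITUDE = 111_000


def min_precision_m(latitude_text):
    """Size, in meters, of one unit in the last reported decimal place."""
    exponent = Decimal(latitude_text).as_tuple().exponent
    decimal_places = max(0, -exponent)
    return METERS_PER_DEGREE_LATITUDE / 10 ** decimal_places


for value in ["45", "45.3", "45.1234"]:
    print(value, "->", min_precision_m(value), "m")
```

A degree of longitude is shorter than this away from the equator, so applying the latitude figure to longitude errs on the side of a larger value, in line with the advice above.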

Coordinate Precision versus Start/End Coordinate Precision

The OBIS schema has two location precision fields: "Coordinate Precision" and "Start/End Coordinate Precision." Following the case examples from the "Use of the Start/End fields" notes, this is how they should be filled out.

Case 1: All of your latitudes are point latitudes; none of them have separate start and end latitudes. You should have one precision field in your database and use this to estimate the precision with which each sample is measured - this will be dependent on the method used (GPS, etc.). When you map to the OBIS Schema, this field will be mapped to both the "Coordinate Precision" and the "Start/End Coordinate Precision" fields.

Case 2: You have samples that were taken over space and want to record a start and end latitude for all of them. You should have two precision fields in your database. "Start/End Coordinate Precision" should refer to the precision with which the start and end location points are known. "Coordinate Precision" should be a value that is large enough to span the Start and End points from the "Latitude" and "Longitude" fields. An example: say you are recording a 1 km-long trawl and used a GPS to get your start and end points, so that you think your lat/lon measurement error is about 10 meters. In this case, your "Start/End Coordinate Precision" is 10. Your "Coordinate Precision" will depend on whether you use Solution A or Solution B above. If you use Solution A and report the midpoint of the line for "Latitude" and "Longitude," then the "Coordinate Precision" is 500 m. If you use Solution B and report the "Start Latitude" and "Start Longitude" in the "Latitude" and "Longitude" fields, then the "Coordinate Precision" is 1000 m.
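
The trawl example can be written as a short calculation. This is a sketch with hypothetical names that follows the rule of thumb in the example above: half the transect length when the midpoint is reported, the full length when the start point is reported.

```python
def coordinate_precision_m(transect_length_m, reported_point="midpoint"):
    """Coordinate Precision (meters) needed to span a straight transect."""
    if reported_point == "midpoint":  # Solution A
        return transect_length_m / 2
    if reported_point == "start":     # Solution B
        return transect_length_m
    raise ValueError("reported_point must be 'midpoint' or 'start'")


# The 1 km trawl from the example above.
print(coordinate_precision_m(1000, "midpoint"))
print(coordinate_precision_m(1000, "start"))
```

"Start/End Coordinate Precision" is independent of this calculation and stays at the measurement error of the instrument (10 in the example).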

Minimum and Maximum Elevation versus Depth

Minimum and maximum elevation are included because they are part of the Darwin Core, but for samples below sea level elevation is synonymous with depth (except with the opposite sign). OBIS does not query on the elevation fields - it only uses the depth fields.

If all of your data are marine, then you can use just depth in your database. If you want to serve elevation then it can be automatically calculated as -depth. Or vice-versa. Just don't enter the numbers twice!

If you do hold non-marine data, such as data from lakes, then you may need to fill in both fields. In this case, the depth indicates the distance below the water level, while the elevation indicates the height above sea level. So a sample taken 10 meters below the surface of a lake on the top of a mountain that is 3000m high would have a depth = 10 and an elevation = 2990.

Elevation should not be used to indicate height above the seafloor for marine samples.
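
Both the marine case and the lake case above reduce to the same subtraction, sketched here. The function and parameter names are hypothetical, and surface_elevation_m is the height of the water surface above sea level (0 for marine data).

```python
def elevation_m(depth_m, surface_elevation_m=0.0):
    """Elevation relative to sea level for a sample taken depth_m below the surface."""
    return surface_elevation_m - depth_m


print(elevation_m(10))                            # marine sample at 10 m depth
print(elevation_m(10, surface_elevation_m=3000))  # lake surface at 3000 m
```

Deriving elevation this way, rather than storing it, avoids entering the same number twice.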

Depth Range

The preferred method is to use the "Minimum Depth" and "Maximum Depth" fields, with both fields being equal when a collection was made at a single depth point, and not to use the Depth Range field. All new data entry projects should follow this format. However, we recognize that there are some legacy databases that have a single depth range field and where the data contributors can't take the time to individually split them up. For those of you with fields that look like "from one to 10 fathoms" and don't have the time to convert them one by one, you can use the "Depth Range" field for free text information on depth. Note that there should be no cases in which all three are filled out for an individual record: if you have the Minimum and Maximum, then the range can be calculated and it should not be entered.
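
The rule that all three depth fields should never be filled for the same record can be checked automatically. The sketch below uses hypothetical argument names and treats None or an empty string as "not filled".

```python
def depth_fields_ok(minimum_depth, maximum_depth, depth_range):
    """Return False when Minimum Depth, Maximum Depth and Depth Range are all filled."""
    has_min_and_max = minimum_depth is not None and maximum_depth is not None
    has_range_text = bool(depth_range and depth_range.strip())
    return not (has_min_and_max and has_range_text)


print(depth_fields_ok(10, 10, None))                         # single depth point
print(depth_fields_ok(None, None, "from one to 10 fathoms"))  # legacy free text
print(depth_fields_ok(2, 18, "2-18 m"))                      # all three filled
```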

Individual Count versus Observed Individual Count

The Darwin Core developed from the museum community, so "Individual Count" refers to the number of specimens that were saved, not the number of individuals that were caught. OBIS has added the "Observed Individual Count" to indicate the total number per species that were caught. So if a fisheries survey caught 100 squid of a certain species and preserved 10 for a museum collection, then Individual Count = 10 and Observed Individual Count = 100. Most databases will only have one or the other of these pieces of information saved.

Related Catalog Item

The Relationship Type and Related Catalog Item can be used to express tagging data following an individual through time (i.e. a later sighting is related to an earlier sighting). A special "relationship type" term should be defined for this.

Gear type

Though Gear type is a very important piece of information, there is no specific field in the OBIS Schema in which to publish it. Gear type information can be included in Notes.


OBIS strives to document the ocean's diversity, distribution and abundance of life. Created by the Census of Marine Life, OBIS is now part of the Intergovernmental Oceanographic Commission (IOC) of UNESCO, under its International Oceanographic Data and Information Exchange (IODE) programme.