Dataset classes

Datasets shared with GBIF can be published in four different formats:

Dataset type Content

Metadata-only

datasets describing unavailable resources like undigitized natural history collections or data that are not yet standardized

Checklist

a catalogue or list of named organisms, or taxa

Occurrence

the evidence of the occurrence of a species (or other taxon) at a particular place on a specified date

Sampling event

offering evidence that a species occurred at a given location and date, but also making it possible to assess community composition for broader taxonomic groups or even the abundance of species at multiple times and places

The four dataset classes supported by GBIF start simple and become progressively richer, more structured and more complex. This blog post can help you choose a dataset class if you are new to GBIF publishing.

Some fields are strongly recommended to be included in the different dataset classes and some may be required if the dataset is funded through a programme operated by GBIF (e.g. BID, BIFA, CESP).

Data quality requirements for each dataset class and the GBIF-operated programmes are documented here. GBIF provides access to several tools to assess data quality before publication.

Metadata-only

At its simplest level, institutions can create datasets describing unavailable resources like undigitized natural history collections. The other dataset classes include this basic information, but this ‘metadata-only’ class offers users a valuable tool for discovering and learning about evidence not yet available online or could be accessed in other places than GBIF. Metadata-only datasets can also help assess the relative importance and value of undigitized collections and set priorities for future digitization. As with all datasets, GBIF ensures that each metadata dataset is associated with a unique Digital Object Identifier (DOI) to streamline data users’ citation of these resources.

Browse metadata-only datasets

Checklist

Checklists are catalogues or lists of named organisms, or taxa. While they may include additional details like local species names or specimen citations, these ‘checklists’ typically categorize information along taxonomic, geographic, and thematic lines or some combination of the three. For example, a dataset that catalogues the Red Listed molluscs of Seychelles has distinct elements of taxonomy (the phylum Mollusca), geography (the island nation of Seychelles) and theme (species deemed imperilled by IUCN experts). Checklists function as a rapid summary or baseline inventory of taxa in a given context.

Darwin Core Taxon Browse checklists Darwin Core Archive template for checklists Data quality requirements and recommendations for checklists

Occurrence

Occurrence datasets have sufficiently consistent detail to contribute information about the location of individual organisms in time and space — that is, they offer evidence of the occurrence of a species (or other taxon) at a particular place on a specified date. Occurrence datasets are the core of data published through GBIF.org, and examples can range from specimens and fossils in natural history collections, observations by field researchers and citizen scientists, and data gathered from camera traps or remote-sensing satellites.

Occurrence records in these datasets sometimes provide only general locality information, sometimes simply identifying the country, but in many cases, more precise locations and geographic coordinates support fine-scale analysis and mapping of species distributions.

Darwin Core Occurrence Browse occurrence datasets Darwin Core Archive template for occurrence datasets Data quality recommendations and requirements for occurrences

Sampling event

Sampling event datasets provide greater detail, not only offering evidence that a species occurred at a given location and date but also making it possible to assess community composition for broader taxonomic groups, or even the abundance of species at multiple times and places. Sampling-event datasets typically derive from standardised protocols for measuring and monitoring biodiversity like vegetation transects, bird censuses, and freshwater or marine sampling.

By indicating the methods, events and relative abundance of species recorded in a sample, these datasets improve comparisons with data collected using the same protocols at different times and places — in some cases, even enabling data users to infer the absence of particular species from particular sites.

Darwin Core Event Browse sampling event datasets Darwin Core Archive template for sampling event datasets Data quality recommendations and requirements for sampling events