API SQL Downloads – Functions

This is an experimental feature, and the implementation may change throughout 2025.

These custom functions are available when using the SQL download API.

Note the function names are not case-sensitive, gbif_geodistance, GBIF_GEODISTANCE and GBIF_GeoDistance are all equivalent.

Standard functions

Many of the basic functions supported by Apache Hive are supported.

See Hive Operators and User-Defined Functions in the Apache Hive manual.

Filtering functions

These are the same functions used for predicate-based downloads.

Geospatial distance, GBIF_GeoDistance

Function definition:

BOOLEAN GBIF_GeoDistance(DOUBLE centroidLatitude,
                         DOUBLE centroidLongitude,
                         STRING distance,
                         DOUBLE latitude,
                         DOUBLE longitude)

Parameters:

centroidLatitude

The latitude of the centroid, in WGS84 decimal degrees

centroidLongitude

The longitude of the centroid, in WGS84 decimal degrees

distance

A distance in metres or kilometres including the unit, e.g. 500m or 2.5km

latitude

The latitude of the point to filter (usually the decimalLatitude field)

longitude

The longitude of the point to filter (usually the decimalLongitude field)

Return value:

This function returns TRUE if the point is within distance of the centroid. It is equivalent to the geoDistance predicate.

Example query:

SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_GeoDistance(56.0, 12.0, '10km', decimalLatitude, decimalLongitude) = TRUE
GROUP BY kingdom, scientificName;

Within polygon, GBIF_Within

Function definition:

BOOLEAN GBIF_Within(STRING geometryWellKnownText,
                    DOUBLE latitude,
                    DOUBLE longitude)

Parameters:

geometryWellKnownText

A polygon specified in Well-known text (WKT) format

latitude

The latitude of the point to filter (usually the decimalLatitude field)

longitude

The longitude of the point to filter (usually the decimalLongitude field)

Return value:

This function returns TRUE if the point is within the polygon. It is equivalent to the within predicate, and to the geometry parameter of the occurrence search API.

Notes:

If you have problems, first check your polygon with WKT Map or Wicket — ensure the points are ordered anti-clockwise, and are in longitude-latitude order, and form a closed loop with the first and last points equal. (A polygon with clockwise points represents the opposite: the whole world except the polygon.)

Example query:

SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_Within('POLYGON ((-64.8 32.3, -65.5 18.3, -80.3 25.2, -64.8 32.3))', decimalLatitude, decimalLongitude) = TRUE
GROUP BY kingdom, scientificName;

String Array Contains, GBIF_StringArrayContains

Function definition:

STRING GBIF_StringArrayContains(ARRAY array,
                                STRING query,
                                BOOLEAN matchCase)

Parameters:

array

A string array

query

A string to test against each element of the array

matchCase

Whether to match letter case

Return value:

This function returns TRUE if one or more elements in the array match the query string.

Notes:

This is similar to the standard ARRAY_CONTAINS function, except for the addition of the matchCase parameter.

Example query:

SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_StringArrayContains(occurrence.recordedby, 'Matthew', FALSE)
GROUP BY kingdom, scientificName;

String Array Like, GBIF_StringArrayLike

Function definition:

STRING GBIF_StringArrayLike(ARRAY array,
                            STRING query,
                            BOOLEAN matchCase)

Parameters:

array

A string array

query

A pattern to test against each element of the array, with ? matching a single character and * matching zero or more characters.

matchCase

Whether to match letter case

Return value:

This function returns TRUE if one or more elements in the array match the query pattern.

Example query:

SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_StringArrayLike(occurrence.recordedby, 'Mat*hew', FALSE)
GROUP BY kingdom, scientificName;

Grid functions

These functions are usually used for generating species occurrence cubes. They calculate a grid cell for a point.

EEA Reference Grid cell code, GBIF_EEARGCode

eearg example italy
Figure 1. EEA Reference Grid with 50km resolution

Function definition:

STRING GBIF_EEARGCode(INTEGER gridSize,
                      DOUBLE latitude,
                      DOUBLE longitude,
                      DOUBLE coordinateUncertaintyInMeters)

Parameters:

gridSize

The grid size in metres of the EEA Reference Grid. Must be 25, 100, 250, 1000, 10000, 50000 or 100000.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000)). Set to 0 to disable randomization.

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

Return value:

The code for the cell of the EEA reference grid in which the randomized point falls.

Supported grid resolutions and example values:

Result

gridSize

latitude

longitude

coordinateUncertaintyInMeters

100kmE43N32

100 000

52.0

10.0

1000.0

50kmE425N265

50 000

52.0

10.0

1000.0

10kmE432N321

10 000

52.0

10.0

1000.0

1kmE4321N3210

1 000

52.0

10.0

1000.0

250mE432100N321000

250

52.0

10.0

1000.0

100mE43210N32100

100

52.0

10.0

1000.0

25mE4321000N3210000

25

52.0

10.0

1000.0

Example query:

SELECT
  kingdom,
  scientificName,
  COUNT(*),
  GBIF_EEARGCode(
    10000, (1)
    decimalLatitude,
    decimalLongitude,
    COALESCE(coordinateUncertaintyInMeters, 1000) (2)
  )
FROM occurrence
WHERE country = 'PL'
GROUP BY kingdom, scientificName;
1 Selection of grid size
2 Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty.

Additional resources

GeoPackage and ShapeFile downloads of the grids at several resolutions are available for download at the European Environment Agency Datahub. (See the 2013 and 2011 datasets for alternative resolutions.)

Extended Quarter-Degree Grid cell code, GBIF_EQDGCCode

eqdgc example madagascar
Figure 2. Extended Quarter-Degree Grid cells at level 2 on northern Madagascar

Function definition:

STRING GBIF_EQDGCCode(INTEGER level,
                      DOUBLE latitude,
                      DOUBLE longitude,
                      DOUBLE coordinateUncertaintyInMeters)

Parameters:

level

The level of the grid; the number of additional divisions applied to a one-degree cell.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000)). Set to 0 to disable randomization.

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

A level of 0 will give the 1° cell, e.g. S01E010. For quarter-degrees, use level 2, e.g. S01E010AD.

Return value:

The code for the cell of the Extended Quarter-Degree Grid in which the randomized point falls.

Supported grid resolutions and example values:

Result

level

latitude

longitude

coordinateUncertaintyInMeters

E010N52

0

52.3

10.3

1000.0

E010N52C

1

52.3

10.3

1000.0

E010N52CB

2

52.3

10.3

1000.0

E010N52CBC

3

52.3

10.3

1000.0

E010N52CBCC

4

52.3

10.3

1000.0

E010N52CBCCB

5

52.3

10.3

1000.0

E010N52CBCCBB

6

52.3

10.3

1000.0

Example query:

SELECT
  kingdom,
  scientificName,
  COUNT(*),
  GBIF_EQDGCCode(
    2, (1)
    decimalLatitude,
    decimalLongitude,
    COALESCE(coordinateUncertaintyInMeters, 1000) (2)
  )
FROM occurrence
WHERE country = 'ZA'
GROUP BY kingdom, scientificName;
1 Selection of grid level
2 Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty.

Additional resources

See also: Wikipedia: QDGC.

Degree-Minute-Second Grid cell code, GBIF_DMSCCode

Function definition:

STRING GBIF_DMSGCCode(INTEGER resolution,
                      DOUBLE latitude,
                      DOUBLE longitude,
                      DOUBLE coordinateUncertaintyInMeters)

Parameters:

resolution

The resolution of the grid in seconds

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000)). Set to 0 to disable randomization.

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

A resolution of 3600 will give the 1° cell, e.g. S01E010 — this is the same as Extended Quarter-Degree Grid cell code, GBIF_EQDGCCode with level 0.

The resolution must be a divisor of 3600. Useful values are 3600 (1°), 900 (15′), 600 (10′), 300 (5′), 150 (2′30″), 60 (1′) and 30 (30″).

Return value:

A code for the cell in the format EXXXxxxxNYYyyyy where E is either E or W and N is either N or S. XXX is a value between 0 and 179, YY is between 0 and 89. The fractional parts xxxx and yyyy, if present, are the coordinate of the corner of the square nearest to the origin. The point 0°, 0° is defined to be E000N00.

Example grid resolutions and example values:

10° 48′ 45.6″ N = 10.8127

Result

level

latitude

longitude

coordinateUncertaintyInMeters

E010°N52° (E010°N52°)

3600 (1°)

52.3

10.8127

1000.0

E010°45′N52°15′(E010°45′N52°15′)

900 (15′)

52.3

10.8127

1000.0

E010°40′N52°10′(E010°40′N52°10′)

600 (10′)

52.3

10.8127

1000.0

E010°45′N52°15′(E010°45′N52°15′)

300 (5′)

52.3

10.8127

1000.0

E010°47′30″N52°17′30″(E010°47′30″N52°17′30″)

150 (2½′)

52.3

10.8127

1000.0

E010°48′N52°18′(E010°48′N52°18′)

60 (1′)

52.3

10.8127

1000.0

E010°48′30″N52°18′00″(E010°48′30″N52°18′00″)

30

52.3

10.8127

1000.0

E010°48′45″N52°18′00″(E010°48′45″N52°18′00″)

1

52.3

10.8127

1000.0

Example query:

SELECT
  kingdom,
  scientificName,
  COUNT(*),
  GBIF_DMSGCCode(
    900, (1)
    decimalLatitude,
    decimalLongitude,
    COALESCE(coordinateUncertaintyInMeters, 1000) (2)
  )
FROM occurrence
WHERE country = 'ZA'
GROUP BY kingdom, scientificName;
1 Selection of grid resolution
2 Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty.

ISEA3H Grid cell code GBIF_ISEA3HCode

This function implements the Inverse Snyder Equal-Area Projection (ISEA) Aperture 3 Hexagonal (3H) Discrete Global Grid System (DGGS), ISEA3H. Grid cell codes/identifiers are as specified in A novel identifier scheme for the ISEA Aperture 3 Hexagon Discrete Global Grid System.

Function definition:

STRING GBIF_ISEA3HCode(INTEGER resolution,
                       DOUBLE latitude,
                       DOUBLE longitude,
                       DOUBLE coordinateUncertaintyInMeters)

Parameters:

resolution

The resolution of the grid; the number of division steps applied to the initial dodecahedron. Valid values are 1 to 22.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000)). Set to 0 to disable randomization.

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

See ISEA3H details for the number and size of cells at each resolution.

Return value:

The code for the cell of the ISEA3H grid cell in which the randomized point falls.

Supported grid resolutions and example values:

Result

resolution

latitude

longitude

coordinateUncertaintyInMeters

-358282526011250000

3

52.3

10.3

1000.0

652180731009071912

6

52.3

10.3

1000.0

952458899010519815

9

52.3

10.3

1000.0

Example query:

SELECT
  kingdom,
  scientificName,
  COUNT(*),
  GBIF_ISEA3HCode(
    6, (1)
    decimalLatitude,
    decimalLongitude,
    COALESCE(coordinateUncertaintyInMeters, 1000) (2)
  )
FROM occurrence
WHERE continent = 'AFRICA'
GROUP BY kingdom, scientificName;
1 Selection of grid level
2 Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty.

Additional resources

References:

See also:

  • geogrid, the Java library used by GBIF to calculate the grid.

  • dggridR, an R package to calculate the grid (though not using the same identifiers).

Military Grid Reference System cell code, GBIF_MGRSCode

Function definition:

STRING GBIF_MGRSCode(INTEGER gridSize,
                     DOUBLE latitude,
                     DOUBLE longitude,
                     DOUBLE coordinateUncertaintyInMeters)

Parameters:

gridSize

The grid size in metres. Must be 1, 10, 100, 1000, 10000, 100000 or 0.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000)). Set to 0 to disable randomization.

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

A gridSize of 0 will give the Grid Zone Junction (GZJ) only, e.g. 32. Other values increase the accuracy of the grid, e.g. 100 (metres) 32UNC686615.

Return value:

The code for the cell of the Military Grid Reference System in which the randomized point falls.

Supported grid resolutions and example values:

Result

gridSize

latitude

longitude

coordinateUncertaintyInMeters

32U

0

52.0

10.0

1000.0

32UNC

100 000

52.0

10.0

1000.0

32UNC66

10 000

52.0

10.0

1000.0

32UNC6861

1 000

52.0

10.0

1000.0

32UNC686615

100

52.0

10.0

1000.0

32UNC68646151

10

52.0

10.0

1000.0

32UNC6864961510

1

52.0

10.0

1000.0

Example query:

SELECT
  kingdom,
  scientificName,
  COUNT(*),
  GBIF_MGRSCode(
    10000, (1)
    decimalLatitude,
    decimalLongitude,
    COALESCE(coordinateUncertaintyInMeters, 1000) (2)
  )
FROM occurrence
WHERE country = 'CG'
GROUP BY kingdom, scientificName;
1 Selection of grid size
2 Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty.

Additional resources

Text output functions

These functions are useful when producing text-type output, e.g. tab-delimited files.

Join Array, GBIF_JoinArray

Function definition:

STRING GBIF_JoinArray(ARRAY array,
                      STRING separator)

Parameters:

array

An array

separator

A separator to put between the array values, e.g. | or `, `

Return value:

This function returns the array elements separated by the separator.

Temporal Uncertainty, GBIF_TemporalUncertainty

Function definition:

STRING GBIF_TemporalUncertainty(String dateInterval,
                                String time)

Parameters:

dateInterval

A string containing an ISO-8106 date or date interval.

time

A string containing an ISO-8106 time, or null if not required.

Return value:

This function returns the uncertainty in seconds of the date.

Example arguments and returned value

dateInterval

Result

Remarks

2021-03-21T15:01:32.456Z

1

Milliseconds are rounded to seconds.

2021-03-21T15:01:32Z

1

2021-03-21T15:01Z

60

2021-03-21T15Z

60×60

2021-03-21

60×60×24

2021-03-01

60×60×24

2021-01-01

60×60×24

2021-03

60×60×24×31

2021

60×60×24×365

2021-03-21/2021-03-23

60×60×24×3

To ISO8601 Date, GBIF_ToISO8601

Function definition:

STRING GBIF_ToISO8601(TIMESTAMP date)

Parameters:

date

A timestamp of millseconds from the Unix epoch.

Return value:

This function formats a timestamp to a string like 2024-01-26T13:43:08Z. The UTC timezone (Z) is used.

To Local ISO8601 Date, GBIF_ToLocalISO8601

Function definition:

STRING GBIF_ToLocalISO8601(TIMESTAMP date)

Parameters:

date

A timestamp of millseconds from the Unix epoch.

Return value:

This function formats a timestamp to a string like 2024-01-26T13:43:08. No timezone is included.