API SQL Downloads – Functions
This is an experimental feature, and the implementation may change throughout 2025. |
These custom functions are available when using the SQL download API.
Note the function names are not case-sensitive, gbif_geodistance
, GBIF_GEODISTANCE
and GBIF_GeoDistance
are all equivalent.
Standard functions
Many of the basic functions supported by Apache Hive are supported.
See Hive Operators and User-Defined Functions in the Apache Hive manual.
Filtering functions
These are the same functions used for predicate-based downloads.
Geospatial distance, GBIF_GeoDistance
Function definition:
BOOLEAN GBIF_GeoDistance(DOUBLE centroidLatitude,
DOUBLE centroidLongitude,
STRING distance,
DOUBLE latitude,
DOUBLE longitude)
Parameters:
centroidLatitude
-
The latitude of the centroid, in WGS84 decimal degrees
centroidLongitude
-
The longitude of the centroid, in WGS84 decimal degrees
distance
-
A distance in metres or kilometres including the unit, e.g.
500m
or2.5km
latitude
-
The latitude of the point to filter (usually the
decimalLatitude
field) longitude
-
The longitude of the point to filter (usually the
decimalLongitude
field)
Return value:
This function returns TRUE
if the point is within distance of the centroid. It is equivalent to the geoDistance
predicate.
Example query:
SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_GeoDistance(56.0, 12.0, '10km', decimalLatitude, decimalLongitude) = TRUE
GROUP BY kingdom, scientificName;
Within polygon, GBIF_Within
Function definition:
BOOLEAN GBIF_Within(STRING geometryWellKnownText,
DOUBLE latitude,
DOUBLE longitude)
Parameters:
geometryWellKnownText
-
A polygon specified in Well-known text (WKT) format
latitude
-
The latitude of the point to filter (usually the
decimalLatitude
field) longitude
-
The longitude of the point to filter (usually the
decimalLongitude
field)
Return value:
This function returns TRUE
if the point is within the polygon. It is equivalent to the within
predicate, and to the geometry
parameter of the occurrence search API.
Notes:
If you have problems, first check your polygon with WKT Map or Wicket — ensure the points are ordered anti-clockwise, and are in longitude-latitude order, and form a closed loop with the first and last points equal. (A polygon with clockwise points represents the opposite: the whole world except the polygon.)
Example query:
SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_Within('POLYGON ((-64.8 32.3, -65.5 18.3, -80.3 25.2, -64.8 32.3))', decimalLatitude, decimalLongitude) = TRUE
GROUP BY kingdom, scientificName;
String Array Contains, GBIF_StringArrayContains
Function definition:
STRING GBIF_StringArrayContains(ARRAY array,
STRING query,
BOOLEAN matchCase)
Parameters:
array
-
A string array
query
-
A string to test against each element of the array
matchCase
-
Whether to match letter case
Return value:
This function returns TRUE
if one or more elements in the array match the query string.
Notes:
This is similar to the standard ARRAY_CONTAINS
function, except for the addition of the matchCase
parameter.
Example query:
SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_StringArrayContains(occurrence.recordedby, 'Matthew', FALSE)
GROUP BY kingdom, scientificName;
String Array Like, GBIF_StringArrayLike
Function definition:
STRING GBIF_StringArrayLike(ARRAY array,
STRING query,
BOOLEAN matchCase)
Parameters:
array
-
A string array
query
-
A pattern to test against each element of the array, with
?
matching a single character and*
matching zero or more characters. matchCase
-
Whether to match letter case
Return value:
This function returns TRUE
if one or more elements in the array match the query pattern.
Example query:
SELECT kingdom, scientificName, COUNT(*)
FROM occurrence
WHERE GBIF_StringArrayLike(occurrence.recordedby, 'Mat*hew', FALSE)
GROUP BY kingdom, scientificName;
Grid functions
These functions are usually used for generating species occurrence cubes. They calculate a grid cell for a point.
EEA Reference Grid cell code, GBIF_EEARGCode

Function definition:
STRING GBIF_EEARGCode(INTEGER gridSize,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
gridSize
-
The grid size in metres of the EEA Reference Grid. Must be 25, 100, 250, 1000, 10000, 50000 or 100000.
latitude
-
The latitude of the point to grid (usually the
decimalLatitude
field) longitude
-
The longitude of the point to grid (usually the
decimalLongitude
field) coordinateUncertaintyInMeters
-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000)
). Set to 0 to disable randomization.
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
Return value:
The code for the cell of the EEA reference grid in which the randomized point falls.
Supported grid resolutions and example values:
Result |
gridSize |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example query:
SELECT
kingdom,
scientificName,
COUNT(*),
GBIF_EEARGCode(
10000, (1)
decimalLatitude,
decimalLongitude,
COALESCE(coordinateUncertaintyInMeters, 1000) (2)
)
FROM occurrence
WHERE country = 'PL'
GROUP BY kingdom, scientificName;
1 | Selection of grid size |
2 | Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty. |
Additional resources
GeoPackage and ShapeFile downloads of the grids at several resolutions are available for download at the European Environment Agency Datahub. (See the 2013 and 2011 datasets for alternative resolutions.)
Reference: About the EEA reference grid (PDF).
Extended Quarter-Degree Grid cell code, GBIF_EQDGCCode

Function definition:
STRING GBIF_EQDGCCode(INTEGER level,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
level
-
The level of the grid; the number of additional divisions applied to a one-degree cell.
latitude
-
The latitude of the point to grid (usually the
decimalLatitude
field) longitude
-
The longitude of the point to grid (usually the
decimalLongitude
field) coordinateUncertaintyInMeters
-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000)
). Set to 0 to disable randomization.
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
A level of 0 will give the 1° cell, e.g. S01E010
. For quarter-degrees, use level 2, e.g. S01E010AD
.
Return value:
The code for the cell of the Extended Quarter-Degree Grid in which the randomized point falls.
Supported grid resolutions and example values:
Result |
level |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example query:
SELECT
kingdom,
scientificName,
COUNT(*),
GBIF_EQDGCCode(
2, (1)
decimalLatitude,
decimalLongitude,
COALESCE(coordinateUncertaintyInMeters, 1000) (2)
)
FROM occurrence
WHERE country = 'ZA'
GROUP BY kingdom, scientificName;
1 | Selection of grid level |
2 | Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty. |
Additional resources
See also: Wikipedia: QDGC.
Degree-Minute-Second Grid cell code, GBIF_DMSCCode
Function definition:
STRING GBIF_DMSGCCode(INTEGER resolution,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
resolution
-
The resolution of the grid in seconds
latitude
-
The latitude of the point to grid (usually the
decimalLatitude
field) longitude
-
The longitude of the point to grid (usually the
decimalLongitude
field) coordinateUncertaintyInMeters
-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000)
). Set to 0 to disable randomization.
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
A resolution of 3600 will give the 1° cell, e.g. S01E010
— this is the same as Extended Quarter-Degree Grid cell code, GBIF_EQDGCCode
with level 0.
The resolution must be a divisor of 3600. Useful values are 3600 (1°), 900 (15′), 600 (10′), 300 (5′), 150 (2′30″), 60 (1′) and 30 (30″).
Return value:
A code for the cell in the format EXXXxxxxNYYyyyy
where E
is either E
or W
and N
is either N
or S
. XXX
is a value between 0 and 179, YY
is between 0 and 89. The fractional parts xxxx
and yyyy
, if present, are the coordinate of the corner of the square nearest to the origin. The point 0°, 0° is defined to be E000N00
.
Example grid resolutions and example values:
10° 48′ 45.6″ N = 10.8127
Result |
level |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example query:
SELECT
kingdom,
scientificName,
COUNT(*),
GBIF_DMSGCCode(
900, (1)
decimalLatitude,
decimalLongitude,
COALESCE(coordinateUncertaintyInMeters, 1000) (2)
)
FROM occurrence
WHERE country = 'ZA'
GROUP BY kingdom, scientificName;
1 | Selection of grid resolution |
2 | Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty. |
ISEA3H Grid cell code GBIF_ISEA3HCode
This function implements the Inverse Snyder Equal-Area Projection (ISEA) Aperture 3 Hexagonal (3H) Discrete Global Grid System (DGGS), ISEA3H. Grid cell codes/identifiers are as specified in A novel identifier scheme for the ISEA Aperture 3 Hexagon Discrete Global Grid System.
Function definition:
STRING GBIF_ISEA3HCode(INTEGER resolution,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
resolution
-
The resolution of the grid; the number of division steps applied to the initial dodecahedron. Valid values are 1 to 22.
latitude
-
The latitude of the point to grid (usually the
decimalLatitude
field) longitude
-
The longitude of the point to grid (usually the
decimalLongitude
field) coordinateUncertaintyInMeters
-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000)
). Set to 0 to disable randomization.
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
See ISEA3H details for the number and size of cells at each resolution.
Return value:
The code for the cell of the ISEA3H grid cell in which the randomized point falls.
Supported grid resolutions and example values:
Result |
resolution |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example query:
SELECT
kingdom,
scientificName,
COUNT(*),
GBIF_ISEA3HCode(
6, (1)
decimalLatitude,
decimalLongitude,
COALESCE(coordinateUncertaintyInMeters, 1000) (2)
)
FROM occurrence
WHERE continent = 'AFRICA'
GROUP BY kingdom, scientificName;
1 | Selection of grid level |
2 | Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty. |
Additional resources
References:
See also:
Military Grid Reference System cell code, GBIF_MGRSCode
Function definition:
STRING GBIF_MGRSCode(INTEGER gridSize,
DOUBLE latitude,
DOUBLE longitude,
DOUBLE coordinateUncertaintyInMeters)
Parameters:
gridSize
-
The grid size in metres. Must be 1, 10, 100, 1000, 10000, 100000 or 0.
latitude
-
The latitude of the point to grid (usually the
decimalLatitude
field) longitude
-
The longitude of the point to grid (usually the
decimalLongitude
field) coordinateUncertaintyInMeters
-
The uncertainty radius to apply to the point (usually used with a default value, for example
COALESCE(coordinateUncertaintyInMeters, 1000)
). Set to 0 to disable randomization.
The coordinateUncertaintyInMeters value is used to randomize the point within the circle.
A gridSize of 0 will give the Grid Zone Junction (GZJ) only, e.g. 32
. Other values increase the accuracy of the grid, e.g. 100 (metres) 32UNC686615
.
Return value:
The code for the cell of the Military Grid Reference System in which the randomized point falls.
Supported grid resolutions and example values:
Result |
gridSize |
latitude |
longitude |
coordinateUncertaintyInMeters |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Example query:
SELECT
kingdom,
scientificName,
COUNT(*),
GBIF_MGRSCode(
10000, (1)
decimalLatitude,
decimalLongitude,
COALESCE(coordinateUncertaintyInMeters, 1000) (2)
)
FROM occurrence
WHERE country = 'CG'
GROUP BY kingdom, scientificName;
1 | Selection of grid size |
2 | Default to randomization within a 1000m radius if the occurrence does not have a published coordinate uncertainty. |
Additional resources
Reference: Grids and Reference Systems.
See also: Wikipedia: Military Grid Reference System.
Text output functions
These functions are useful when producing text-type output, e.g. tab-delimited files.
Join Array, GBIF_JoinArray
Function definition:
STRING GBIF_JoinArray(ARRAY array,
STRING separator)
Parameters:
array
-
An array
separator
-
A separator to put between the array values, e.g.
|
or `, `
Return value:
This function returns the array elements separated by the separator.
Temporal Uncertainty, GBIF_TemporalUncertainty
Function definition:
STRING GBIF_TemporalUncertainty(String dateInterval,
String time)
Parameters:
dateInterval
-
A string containing an ISO-8106 date or date interval.
time
-
A string containing an ISO-8106 time, or
null
if not required.
Return value:
This function returns the uncertainty in seconds of the date.
Example arguments and returned value
dateInterval |
Result |
Remarks |
2021-03-21T15:01:32.456Z |
1 |
Milliseconds are rounded to seconds. |
2021-03-21T15:01:32Z |
1 |
|
2021-03-21T15:01Z |
60 |
|
2021-03-21T15Z |
60×60 |
|
2021-03-21 |
60×60×24 |
|
2021-03-01 |
60×60×24 |
|
2021-01-01 |
60×60×24 |
|
2021-03 |
60×60×24×31 |
|
2021 |
60×60×24×365 |
|
2021-03-21/2021-03-23 |
60×60×24×3 |