API SQL Downloads – Functions

This is an experimental feature, and the implementation may change throughout 2024. The feature is currently only available for preview by invited users. Contact helpdesk@gbif.org to request access.

These custom functions are available when using the SQL download API.

Note the function names are not case-sensitive, gbif_geodistance, GBIF_GEODISTANCE and GBIF_GeoDistance are all equivalent.

Standard functions

Many of the basic functions supported by Apache Hive are supported.

See Hive Operators and User-Defined Functions in the Apache Hive manual.

Filtering functions

These are the same functions used for predicate-based downloads.

Geospatial distance, GBIF_GeoDistance

Function definition:

BOOLEAN GBIF_GeoDistance(DOUBLE centroidLatitude,
                         DOUBLE centroidLongitude,
                         STRING distance,
                         DOUBLE latitude,
                         DOUBLE longitude)

Parameters:

centroidLatitude

The latitude of the centroid, in WGS84 decimal degrees

centroidLongitude

The longitude of the centroid, in WGS84 decimal degrees

distance

A distance in metres or kilometres including the unit, e.g. 500m or 2.5km

latitude

The latitude of the point to filter (usually the decimalLatitude field)

longitude

The longitude of the point to filter (usually the decimalLongitude field)

Return value:

This function returns TRUE if the point is within distance of the centroid. It is equivalent to the geoDistance predicate.

Within polygon, GBIF_Within

Function definition:

BOOLEAN GBIF_Within(STRING geometryWellKnownText,
                    DOUBLE latitude,
                    DOUBLE longitude)

Parameters:

geometryWellKnownText

A polygon specified in Well-known text (WKT) format

latitude

The latitude of the point to filter (usually the decimalLatitude field)

longitude

The longitude of the point to filter (usually the decimalLongitude field)

Return value:

This function returns TRUE if the point is within the polygon. It is equivalent to the within predicate, and to the geometry parameter of the occurrence search API.

Notes:

If you have problems, first check your polygon with Wicket — ensure the points are ordered anti-clockwise. (A polygon with clockwise points represents the opposite: the whole world except the polygon.)

Grid functions

These functions are usually used for generating species occurrence cubes. They calculate a grid cell for a point.

EEA Reference Grid cell code, GBIF_EEARGCode

Function definition:

STRING GBIF_EEARGCode(INTEGER gridSize,
                      DOUBLE latitude,
                      DOUBLE longitude,
                      DOUBLE coordinateUncertaintyInMeters)

Parameters:

gridSize

The grid size in metres of the EEA Reference Grid. Must be 25, 100, 250, 1000, 10000 or 100000.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000))

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

Return value:

The code for the cell of the EEA reference grid in which the randomized point falls.

Example arguments and returned values:

Result

gridSize

latitude

longitude

coordinateUncertaintyInMeters

100kmE43N32

100 000

52.0

10.0

1000.0

10kmE432N321

10 000

52.0

10.0

1000.0

1kmE4321N3210

1 000

52.0

10.0

1000.0

250mE432100N321000

250

52.0

10.0

1000.0

100mE43210N32100

100

52.0

10.0

1000.0

25mE4321000N3210000

25

52.0

10.0

1000.0

Extended Quarter-Degree Grid cell code, GBIF_EQDGCCode

Function definition:

STRING GBIF_EQDGCCode(INTEGER level,
                      DOUBLE latitude,
                      DOUBLE longitude,
                      DOUBLE coordinateUncertaintyInMeters)

Parameters:

level

The level of the grid; the number of additional divisions applied to a one-degree cell.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000))

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

A level of 0 will give the 1° cell, e.g. S01E010. For quarter-degrees, use level 2, e.g. S01E010AD.

Return value:

The code for the cell of the Extended Quarter-Degree Grid in which the randomized point falls.

Example arguments and returned values:

Result

level

latitude

longitude

coordinateUncertaintyInMeters

E010N52

0

52.3

10.3

1000.0

E010N52C

1

52.3

10.3

1000.0

E010N52CB

2

52.3

10.3

1000.0

E010N52CBC

3

52.3

10.3

1000.0

E010N52CBCC

4

52.3

10.3

1000.0

E010N52CBCCB

5

52.3

10.3

1000.0

E010N52CBCCBB

6

52.3

10.3

1000.0

See also: Wikipedia: QDGC.

Military Grid Reference System cell code, GBIF_MGRSCode

Function definition:

STRING GBIF_MGRSCode(INTEGER gridSize,
                     DOUBLE latitude,
                     DOUBLE longitude,
                     DOUBLE coordinateUncertaintyInMeters)

Parameters:

gridSize

The grid size in metres. Must be 1, 10, 100, 1000, 10000, 100000 or 0.

latitude

The latitude of the point to grid (usually the decimalLatitude field)

longitude

The longitude of the point to grid (usually the decimalLongitude field)

coordinateUncertaintyInMeters

The uncertainty radius to apply to the point (usually used with a default value, for example COALESCE(coordinateUncertaintyInMeters, 1000))

The coordinateUncertaintyInMeters value is used to randomize the point within the circle.

A gridSize of 0 will give the Grid Zone Junction (GZJ) only, e.g. 32. Other values increase the accuracy of the grid, e.g. 100 (metres) 32UNC686615.

Return value:

The code for the cell of the Military Grid Reference System in which the randomized point falls.

Example arguments and returned values:

Result

gridSize

latitude

longitude

coordinateUncertaintyInMeters

32U

0

52.0

10.0

1000.0

32UNC

100 000

52.0

10.0

1000.0

32UNC66

10 000

52.0

10.0

1000.0

32UNC6861

1 000

52.0

10.0

1000.0

32UNC686615

100

52.0

10.0

1000.0

32UNC68646151

10

52.0

10.0

1000.0

32UNC6864961510

1

52.0

10.0

1000.0

Text output functions

These functions are useful when producing text-type output, e.g. tab-delimited files.

Join Array, GBIF_JoinArray

Function definition:

STRING GBIF_JoinArray(ARRAY array,
                      STRING separator)

Parameters:

array

An array

separator

A separator to put between the array values, e.g. | or `, `

Return value:

This function returns the array elements separated by the separator.

Temporal Uncertainty, GBIF_TemporalUncertainty

Function definition:

STRING GBIF_TemporalUncertainty(String dateInterval)

Parameters:

dateInterval

A string containing an ISO-8106 date or date interval.

Return value:

This function returns the uncertainty in seconds of the date.

Example arguments and returned value

dateInterval

Result

Remarks

2021-03-21T15:01:32.456Z

1

Milliseconds are rounded to seconds.

2021-03-21T15:01:32Z

1

2021-03-21T15:01Z

60

2021-03-21T15Z

60×60

2021-03-21

60×60×24

2021-03-01

60×60×24

2021-01-01

60×60×24

2021-03

60×60×24×31

2021

60×60×24×365

2021-03-21/2021-03-23

60×60×24×3

To ISO8601 Date, GBIF_ToISO8601

Function definition:

STRING GBIF_ToISO8601(TIMESTAMP date)

Parameters:

date

A timestamp of millseconds from the Unix epoch.

Return value:

This function formats a timestamp to a string like 2024-01-26T13:43:08Z. The UTC timezone (Z) is used.

To Local ISO8601 Date, GBIF_ToLocalISO8601

Function definition:

STRING GBIF_ToLocalISO8601(TIMESTAMP date)

Parameters:

date

A timestamp of millseconds from the Unix epoch.

Return value:

This function formats a timestamp to a string like 2024-01-26T13:43:08. No timezone is included.