Description
Search on a single species name, or many, across a single data source or many.
Usage
occ(
  query = NULL,
  from = "gbif",
  limit = 500,
  start = NULL,
  page = NULL,
  geometry = NULL,
  has_coords = NULL,
  ids = NULL,
  date = NULL,
  callopts = list(),
  gbifopts = list(),
  inatopts = list(),
  ebirdopts = list(),
  vertnetopts = list(),
  idigbioopts = list(),
  obisopts = list(),
  alaopts = list(),
  throw_warnings = TRUE
)
Value
An object of class occdat, with a print method to give a brief summary. The print method only shows results for those sources that have some results (those with no results are not shown). The occdat class is just a thin wrapper around a named list, where the top level names are the data sources:
gbif
inat
ebird
vertnet
idigbio
obis
ala
Note that you only get data back for sources that were specified in the from parameter. All others are present, but empty.
Then within each data source is an object of class occdatind holding another named list that contains:
meta: metadata
  source: the data source name (e.g., "gbif")
  time: time the request was sent
  found: number of records found (number found across all queries)
  returned: number of records returned (number of rows in all data.frames in the data slot)
  type: query type, only "sci" for scientific
  opts: a named list with the options you sent to the data source
  errors: a character vector of errors returned, if any occurred
data: named list of data.frames, named by the queries sent
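The nesting described above can be illustrated with a hand-built list. This is only a sketch: `mock` is a hypothetical stand-in that mimics the documented shape (source, then meta and data), not real output from occ(), and the values in it are made up.

```r
# Hypothetical stand-in for an occdat object: a plain named list that mirrors
# the documented layout (source -> meta/data). Real objects come from occ().
mock <- list(
  gbif = list(
    meta = list(source = "gbif", found = 8L, returned = 2L, type = "sci",
                errors = character(0)),
    data = list(
      Accipiter_striatus = data.frame(
        name = c("Accipiter striatus", "Accipiter striatus"),
        longitude = c(-97.1, -122.4),
        latitude = c(32.9, 37.8)
      )
    )
  ),
  # A source that was not requested is present but empty
  inat = list(meta = list(source = "inat"), data = list())
)

# Count the rows held in the data slot of each source
rows_per_source <- vapply(
  mock,
  function(src) sum(vapply(src$data, nrow, integer(1))),
  integer(1)
)
rows_per_source
```

The same indexing pattern (for example `x$gbif$data$Accipiter_striatus`) is what the Examples section uses on real results.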
Arguments
query
(character) One to many scientific names. See Details for what parameter in each data source we query. Note: ebird now expects species codes instead of scientific names - we pass your name through rebird::species_code() internally.
from
(character) Data source to get data from, any combination of gbif, inat, ebird, vertnet, idigbio, obis, or ala. See vignette(topic = 'spocc introduction') for more details about these sources.
limit
(numeric) Number of records to return. This is passed across all sources. To specify different limits for each source, use the options for each source (gbifopts, inatopts, and ebirdopts). See Details for more. Default: 500 for each source. BEWARE: if you have a lot of species to query for (e.g., n = 10), that's 10 * 500 = 5000, which can take a while to collect. So, when you first query, set the limit to something smallish so that you can get a result quickly, then do more as needed.
start, page
(integer) Record to start at or page to start at. See Paging in Details for how these parameters are used internally. Optional.
geometry
(character or numeric) One of a Well Known Text (WKT) object, a vector of length 4 specifying a bounding box, or an sf object (sfg, sfc, or sf). This parameter searches for occurrences inside a polygon - converted to a polygon from whatever user input is given. A WKT shape written as POLYGON((30.1 10.1, 20 40, 40 40, 30.1 10.1)) would be queried as is, i.e. http://bit.ly/HwUSif. See Details for more examples of WKT objects. The format of a bounding box is min-longitude, min-latitude, max-longitude, max-latitude. Geometry is not possible with vertnet right now, but should be soon. See Details for more info on geometry inputs.
has_coords
(logical) Only return occurrences that have lat/long data. This works for gbif, inat, idigbio, and vertnet, but is ignored for ebird. You can, though, easily remove records without lat/long data yourself.
ids
Taxonomic identifiers. This can be a list of length 1 to many. See examples for usage. Currently, identifiers are supported only when the from parameter is 'gbif'. If this parameter is used, the query parameter can not be used - if it is, a warning is thrown.
date
(character/Date) A length 2 vector containing two dates of the form YYYY-MM-DD. These can be of character or Date class. These are used to do a date range search. Of course there are other types of date searches one may want to do, but date range seems like the most common date search use case.
callopts
Options passed on to crul::HttpClient, e.g., for debugging curl calls, setting timeouts, etc.
gbifopts
(list) List of named options to pass on to rgbif::occ_search(). See also occ_options().
inatopts
(list) List of named options to pass on to internal function get_inat_obs.
ebirdopts
(list) List of named options to pass on to rebird::ebirdregion() or rebird::ebirdgeo(). See also occ_options().
vertnetopts
(list) List of named options to pass on to rvertnet::searchbyterm(). See also occ_options().
idigbioopts
(list) List of named options to pass on to ridigbio::idig_search_records(). See also occ_options().
obisopts
(list) List of named options to pass on to internal function. See https://api.obis.org/#/Occurrence/get_occurrence and obis_search for what parameters can be used.
alaopts
(list) List of named options to pass on to internal function.
throw_warnings
(logical) occ() collects errors returned from each data provider when they occur, and they are accessible in the $meta$errors slot for each data provider. If you set throw_warnings=TRUE, we give these request errors as warnings with warning(). If FALSE, we don't give warnings, but you can still access them in the output.
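For sources that ignore has_coords, such as ebird, records lacking coordinates can be dropped after the fact. The sketch below uses a small made-up data.frame; the column names longitude and latitude are an assumption based on the combined table that occ2df() is described as returning in the Examples, so check them against your own results.

```r
# Made-up records; in practice this would be a data.frame taken from an
# occ() result. Column names 'longitude' and 'latitude' are assumed.
records <- data.frame(
  name = c("Spinus tristis", "Spinus tristis", "Spinus tristis"),
  longitude = c(-76.5, NA, -80.1),
  latitude = c(42.4, 40.0, NA),
  stringsAsFactors = FALSE
)

# Keep only rows where both coordinates are present
has_both <- stats::complete.cases(records[, c("longitude", "latitude")])
with_coords <- records[has_both, ]
with_coords
```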
Inputs
All inputs to occ are one of:
scientific name
taxonomic id
geometry as bounds, WKT, or spatial classes
To search by common name, first use occ_names() to find scientific names or taxonomic IDs, then feed those to this function. Or use the taxize package to get names and/or IDs to use here.
Using the query parameter
When you use the query parameter, we pass your search terms on to parameters within functions that query data sources you specify. Those parameters are:
rgbif - scientificName in the rgbif::occ_search() function - API parameter: same as the occ parameter
rebird - species in the rebird::ebirdregion() or rebird::ebirdgeo() functions, depending on whether you set method="ebirdregion" or method="ebirdgeo" - API parameters: sci for both rebird::ebirdregion() and rebird::ebirdgeo()
rvertnet - taxon in the rvertnet::vertsearch() function - API parameter: q
ridigbio - scientificname in the ridigbio::idig_search_records() function - API parameter: scientificname
inat - internal function - API parameter: q
obis - internal function - API parameter: scientificName
ala - internal function - API parameter: q
If you have questions about how each of those parameters behaves with respect to the terms you pass to it, look up the documentation for those functions, or get in touch at the development repository https://github.com/ropensci/spocc/issues
iDigBio notes
When searching iDigBio note that by default we set fields = "all", so that we return a richer suite of fields than the ridigbio R client gives by default. But you can change this by passing a fields parameter to the idigbioopts parameter with the specific fields you want.
A maximum of 100,000 results is allowed to be returned. See https://github.com/iDigBio/ridigbio/issues/33
iNaturalist notes
We're using the iNaturalist API, docs at https://api.inaturalist.org/v1/docs/#!/Observations/get_observations
API rate limits: max of 100 requests per minute, though they ask that you try to keep it to 60 requests per minute or lower. If they notice usage that has serious impact on their performance they may institute blocks without notification.
There is a hard limit of 10,000 observations with the iNaturalist API. We do paging internally so you may not see this aspect, but for example, if you request 12,000 records, you won't be able to get that many. The API will error at anything more than 10,000. We now error if you request more than 10,000 from iNaturalist. There are some alternatives:
Consider exporting data while logged into your iNaturalist account, or use the iNaturalist research grade observations within GBIF - see https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7 - at time of this writing it has 8.5 million observations.
Search for iNaturalist data within GBIF. E.g., the following searches for iNaturalist data within GBIF and allows more than 10,000 records:
occ(query = 'Danaus plexippus', from = 'gbif', limit = 10100, gbifopts = list(datasetKey = "50c9509d-22c7-4a22-a47d-8c48425ef4a7"))
limit parameter
The limit parameter is set to a default of 500. This means that you will get up to 500 results back for each data source you ask for data from. If there are no results for a particular source, you'll get zero back; if there are 8 results for a particular source, you'll get 8 back. If there are 501 results for a particular source, you'll get 500 back. You can always ask for more or fewer by setting the limit parameter to any number. If you want to request a different number for each source, pass the appropriate parameter to each data source via the respective options parameter for each data source.
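Because limit applies per name and per source, the upper bound on records grows quickly. The arithmetic below is a rough sketch of that bound using the numbers from the limit argument description (10 names, limit of 500); the two-source choice is only an illustrative assumption, and actual counts depend on what each provider holds.

```r
# Upper bound on records returned: names x limit x sources.
# Values mirror the "10 * 500 = 5000" warning; the source list is illustrative.
n_names <- 10
limit <- 500
sources <- c("gbif", "inat")

max_per_source <- n_names * limit
max_records <- max_per_source * length(sources)

max_per_source
max_records
```

Starting with a small limit, then raising it once the query looks right, keeps the first round trip fast.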
WKT
WKT objects are strings of pairs of long/lat coordinates that define a shape. WKT defines many classes of shapes, including POLYGON, POINT, and MULTIPOLYGON. Within each defined shape define all vertices of the shape with a coordinate like 30.1 10.1, the first of which is the longitude, the second the latitude.
Examples of valid WKT objects:
'POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))'
'POINT(30.1 10.1)'
'LINESTRING(3 4,10 50,20 25)'
'MULTIPOINT((3.5 5.6),(4.8 10.5))'
'MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))'
'MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3)))'
'GEOMETRYCOLLECTION(POINT(4 6),LINESTRING(4 6,7 10))'
Only POLYGON objects are currently supported here.
Getting WKT polygons or bounding boxes. We will soon introduce a function to help you select a bounding box but for now, you can use a few sites on the web.
Bounding box - https://boundingbox.klokantech.com/
Well known text - http://arthur-e.github.io/Wicket/sandbox-gmaps3.html
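To make the relationship between a bounding box and a WKT polygon concrete, here is a minimal base R sketch. The helper name `bbox_to_wkt_sketch` is hypothetical and is not the package's own converter; it only assumes the bounding box order stated for the geometry argument (min-longitude, min-latitude, max-longitude, max-latitude).

```r
# Hypothetical helper: turn c(min_lon, min_lat, max_lon, max_lat) into a
# closed WKT POLYGON ring (first and last vertex are the same).
bbox_to_wkt_sketch <- function(bbox) {
  stopifnot(is.numeric(bbox), length(bbox) == 4)
  xmin <- bbox[1]; ymin <- bbox[2]; xmax <- bbox[3]; ymax <- bbox[4]
  sprintf(
    "POLYGON((%s %s,%s %s,%s %s,%s %s,%s %s))",
    xmin, ymin, xmax, ymin, xmax, ymax, xmin, ymax, xmin, ymin
  )
}

wkt <- bbox_to_wkt_sketch(c(-125.0, 38.4, -121.8, 40.9))
wkt
```

Each vertex is written longitude first, then latitude, matching the WKT examples above.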
geometry parameter
The behavior of the occ function with respect to the geometry parameter varies depending on the inputs to the query parameter. Here are the options:
geometry (single), no query - If a single bounding box/WKT string is passed in, and no query, a single query is made against each data source.
geometry (many), no query - If many bounding boxes/WKT strings are passed in, we do a separate query for each bounding box/WKT string against each data source.
geometry (single), query - If a single bounding box/WKT string is passed in, and a single query, we do a single query against each data source.
geometry (many), query - If many bounding boxes/WKT strings are passed in, and a single query, we do a separate query for each bounding box/WKT string with the same queried name against each data source.
geometry (single), many query - If a single bounding box/WKT string is passed in, and many names to query, we do a separate query for each name, using the same geometry, for each data source.
geometry (many), many query - If many bounding boxes/WKT strings are passed in, and many names to query, this poses a problem for all data sources, none of which accept many bounding boxes or WKT strings. So, in this scenario, we loop over each name and each geometry query, and then re-combine by queried name, so that you get back a single group of data for each name.
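The "many geometries, many names" case can be pictured as a cross product of names and geometries that is regrouped by name afterwards. The sketch below only enumerates the combinations in base R; it is an illustration of the looping idea, not the package's internal code, and it makes no requests.

```r
# Illustrative inputs: two names and two bounding boxes
queries <- c("Accipiter striatus", "Danaus plexippus")
geoms <- list(
  c(-125.0, 38.4, -121.8, 40.9),
  c(-115.0, 22.4, -111.8, 30.9)
)

# One row per (name, geometry) pair, i.e. one request per row per source
combos <- expand.grid(
  query = queries,
  geom_index = seq_along(geoms),
  stringsAsFactors = FALSE
)

# Regroup by queried name, so each name ends up with one group of results
by_name <- split(combos$geom_index, combos$query)

nrow(combos)
by_name
```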
Geometry options by data provider
wkt & bbox allowed, see WKT section above: gbif, obis, ala
bbox only: inat, idigbio
No spatial search allowed: ebird, vertnet
Notes on the date parameter
Date searches with the date parameter are allowed for all sources except ebird.
Notes on some special cases:
idigbio: We search on the datecollected field. Other date fields can be searched on, but we chose datecollected as it seemed most appropriate.
vertnet: If you want more flexible date searches, you can pass various types of date searches to vertnetopts. See rvertnet::searchbyterm() for more information.
ala: There are some issues with the dates returned from ALA. They are returned as time stamps, and some seem to be malformed. So do beware of using ALA dates for important things.
Get in touch if you have other date search use cases you think are widely useful.
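Since the date parameter expects a length 2 vector of YYYY-MM-DD dates, it can help to check a range before sending it. The helper below is a hypothetical base R sketch (the name `check_date_range_sketch` is made up for illustration); it is not part of spocc and does not reflect how occ() validates input internally.

```r
# Hypothetical pre-flight check for a date range: length 2, parseable as
# YYYY-MM-DD, and in chronological order. Accepts character or Date input.
check_date_range_sketch <- function(date) {
  if (length(date) != 2) stop("date must have length 2")
  parsed <- as.Date(as.character(date), format = "%Y-%m-%d")
  if (anyNA(parsed)) stop("dates must be of the form YYYY-MM-DD")
  if (parsed[1] > parsed[2]) stop("start date is after end date")
  parsed
}

rng <- check_date_range_sketch(c("2010-08-01", "2010-08-31"))
format(rng)
```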
Paging
All data sources respond to the limit parameter passed to occ.
Data sources, however, vary as to whether they respond to an offset. Here are the details on which data sources will respond to start and which to the page parameter:
gbif - Responds to start. Default: 0
inat - Responds to page. Default: 1
ebird - No paging, both start and page ignored.
vertnet - No paging implemented here, both start and page ignored. VertNet does have a form of paging, but it uses a cursor, and can't easily be included here via parameters. However, rvertnet does paging internally for you. For example, the max records per request for VertNet is 1000; if you request 2000 records, we'll do the first request, and do the second request for you automatically.
idigbio - Responds to start. Default: 0
obis - Does not respond to start. They only allow a starting occurrence UUID up to which to skip. So order of results matters a great deal of course. To paginate with OBIS, do e.g. obisopts = list(after = "017b7818-5b2c-4c88-9d76-f4471afe5584"); after can be combined with the limit value you pass in to the main occ() function call. See obis_search for what parameters can be used.
ala - Responds to start. Default: 0
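For sources that take an offset, the start values for successive requests follow directly from limit, while page-based sources count from 1. The sketch below just computes those values in base R; the number of pages is an arbitrary illustration, and no requests are made.

```r
limit <- 5     # records per request, as in the Paging example
n_pages <- 3   # illustrative number of requests

# Offset-based sources (start defaults to 0): 0, 5, 10
starts <- (seq_len(n_pages) - 1) * limit

# Page-based sources (page defaults to 1): 1, 2, 3
pages <- seq_len(n_pages)

starts
pages
```

Each value in `starts` would be passed as start, and each value in `pages` as page, with the same limit every time.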
Photographs
The iNaturalist data source provides photographs of the records returned, if available. For example, the following will give photos from inat:
occ(query = 'Danaus plexippus', from = 'inat')$inat$data$Danaus_plexippus$photos
BEWARE
In cases where you request data from multiple providers, especially when including GBIF, there could be duplicate records since many providers' data eventually ends up with GBIF. See spocc_duplicates() for more.
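One crude way to spot likely duplicates in a combined table is to compare name, coordinates, and date across providers. The sketch below does this on made-up data in base R; the column names (name, longitude, latitude, prov, date) are assumptions about the combined table, and exact matching like this is only a rough first pass, not the approach described in spocc_duplicates().

```r
# Made-up combined records from two providers; the first two rows describe
# the same observation as reported by gbif and inat.
combined <- data.frame(
  name = c("Pinus contorta", "Pinus contorta", "Pinus contorta"),
  longitude = c(-120.5, -120.5, -118.2),
  latitude = c(44.1, 44.1, 46.0),
  prov = c("gbif", "inat", "gbif"),
  date = as.Date(c("2015-06-01", "2015-06-01", "2016-07-15")),
  stringsAsFactors = FALSE
)

# Flag rows that repeat an earlier row on the key columns (provider ignored)
key_cols <- c("name", "longitude", "latitude", "date")
is_dup <- duplicated(combined[, key_cols])
deduped <- combined[!is_dup, ]
deduped
```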
Details
The occ function is an opinionated wrapper around the rgbif, rinat, rebird, rvertnet and ridigbio packages (as well as internal custom wrappers around some data sources) to allow data access from a single access point. We take care of making sure you get useful objects out at the cost of flexibility/options - although you can still set options for each of the packages via the gbifopts, inatopts, etc. parameters.
See Also
Other queries: occ_names(), occ_names_options(), occ_options(), spocc_objects
Examples
if (FALSE) {
# Single data sources
(res <- occ(query = 'Accipiter striatus', from = 'gbif', limit = 5))
res$gbif
(res <- occ(query = 'Accipiter striatus', from = 'ebird', limit = 50))
res$ebird
(res <- occ(query = 'Danaus plexippus', from = 'inat', limit = 50,
  has_coords = TRUE))
res$inat
res$inat$data
data.table::rbindlist(res$inat$data$Danaus_plexippus$photos)
(res <- occ(query = 'Bison bison', from = 'vertnet', limit = 5))
res$vertnet
res$vertnet$data$Bison_bison
occ2df(res)

# Paging
one <- occ(query = 'Accipiter striatus', from = 'gbif', limit = 5)
two <- occ(query = 'Accipiter striatus', from = 'gbif', limit = 5, start = 5)
one$gbif
two$gbif

# iNaturalist limits: they allow at most 10,000; query through GBIF to get
# more than 10,000
# See https://www.gbif.org/dataset/50c9509d-22c7-4a22-a47d-8c48425ef4a7
# x <- occ(query = 'Danaus plexippus', from = 'gbif', limit = 10100,
#   gbifopts = list(datasetKey = "50c9509d-22c7-4a22-a47d-8c48425ef4a7"))
# x$gbif

# Date range searches across data sources
## Not possible for ebird
## ala
occ(date = c('2018-01-01T00:00:00Z', '2018-03-28T00:00:00Z'), from = 'ala', limit = 5)
## gbif
occ(query = 'Accipiter striatus', date = c('2010-08-01', '2010-08-31'), from = 'gbif', limit=5)
## vertnet
occ(query = 'Mustela nigripes', date = c('1990-01-01', '2015-12-31'), from = 'vertnet', limit=5)
## idigbio
occ(query = 'Acer', date = c('2010-01-01', '2015-12-31'), from = 'idigbio', limit=5)
## obis
occ(query = 'Mola mola', date = c('2015-01-01', '2015-12-31'), from = 'obis', limit=5)
## inat
occ(query = 'Danaus plexippus', date = c('2015-01-01', '2015-12-31'), from = 'inat', limit=5)

# Restrict to records with coordinates
occ(query = "Acer", from = "idigbio", limit = 5, has_coords = TRUE)

occ(query = 'Setophaga caerulescens', from = 'ebird', ebirdopts = list(loc='US'))
occ(query = 'Spinus tristis', from = 'ebird', ebirdopts =
  list(method = 'ebirdgeo', lat = 42, lng = -76, dist = 50))

# idigbio data
## scientific name search
occ(query = "Acer", from = "idigbio", limit = 5)
occ(query = "Acer", from = "idigbio", idigbioopts = list(offset = 5, limit = 3))
## geo search
bounds <- c(-120, 40, -100, 45)
occ(from = "idigbio", geometry = bounds, limit = 10)
## just class arachnida, spiders
occ(idigbioopts = list(rq = list(class = 'arachnida')), from = "idigbio", limit = 10)
## search certain recordsets
sets <- c("1ffce054-8e3e-4209-9ff4-c26fa6c24c2f",
  "8dc14464-57b3-423e-8cb0-950ab8f36b6f",
  "26f7cbde-fbcb-4500-80a9-a99daa0ead9d")
occ(idigbioopts = list(rq = list(recordset = sets)), from = "idigbio", limit = 10)

# Many data sources
(out <- occ(query = 'Pinus contorta', from=c('gbif','vertnet'), limit=10))

## Select individual elements
out$gbif
out$gbif$data
out$vertnet

## Coerce to combined data.frame, selects minimal set of
## columns (name, lat, long, provider, date, occurrence key)
occ2df(out)

# Pass in limit parameter to all sources. This limits the number of occurrences
# returned to 10, in this example, for all sources, in this case gbif and inat.
occ(query='Pinus contorta', from=c('gbif','inat'), limit=10)

# Geometry
## Pass in geometry parameter to all sources. This constrains the search to the
## specified polygon for all sources, gbif in this example.
## Check out http://arthur-e.github.io/Wicket/sandbox-gmaps3.html to get a WKT string
occ(query='Accipiter', from='gbif',
  geometry='POLYGON((30.1 10.1, 10 20, 20 60, 60 60, 30.1 10.1))')

## Or pass in a bounding box, which is automatically converted to WKT (required by GBIF)
## via the bbox2wkt function. The format of a bounding box is
## [min-longitude, min-latitude, max-longitude, max-latitude].
occ(query='Accipiter striatus', from='gbif', geometry=c(-125.0,38.4,-121.8,40.9))

## lots of results, can see how many by indexing to meta
res <- occ(query='Accipiter striatus', from='gbif',
  geometry='POLYGON((-69.9 49.2,-69.9 29.0,-123.3 29.0,-123.3 49.2,-69.9 49.2))')
res$gbif

## You can pass in geometry to each source separately via their opts parameter, at
## least those that support it. Note that if you use rinat, you reverse the order, with
## latitude first, and longitude second, but here it's the reverse for consistency across
## the spocc package
bounds <- c(-125.0,38.4,-121.8,40.9)
occ(query = 'Danaus plexippus', from="inat", geometry=bounds)

## Passing geometry with multiple sources
occ(query = 'Danaus plexippus', from=c("inat","gbif"), geometry=bounds)

## Using geometry only for the query
### A single bounding box
occ(geometry = bounds, from = "gbif", limit=50)
### Many bounding boxes
occ(geometry = list(c(-125.0,38.4,-121.8,40.9), c(-115.0,22.4,-111.8,30.9)), from = "gbif")

## Geometry only with WKT
wkt <- 'POLYGON((-98.9 44.2,-89.1 36.6,-116.7 37.5,-102.5 39.6,-98.9 44.2))'
occ(from = "gbif", geometry = wkt, limit = 10)

# Specify many data sources, another example
ebirdopts = list(loc = 'US'); gbifopts = list(country = 'US')
out <- occ(query = 'Setophaga caerulescens', from = c('gbif','inat','ebird'),
  gbifopts = gbifopts, ebirdopts = ebirdopts, limit=20)
occ2df(out)

# Pass in many species names, combine just data to a single data.frame, and
# first six rows
spnames <- c('Accipiter striatus', 'Setophaga caerulescens', 'Spinus tristis')
(out <- occ(query = spnames, from = 'gbif', gbifopts = list(hasCoordinate = TRUE), limit=25))
df <- occ2df(out)
head(df)

# no query, geometry, or ids passed
## many dataset keys to gbif
dsets <- c("14f3151a-e95d-493c-a40d-d9938ef62954",
  "f934f8e2-32ca-46a7-b2f8-b032a4740454")
occ(limit = 20, from = "gbif", gbifopts = list(datasetKey = dsets))
## class name to idigbio
occ(limit = 20, from = "idigbio", idigbioopts = list(rq = list(class = 'arachnida')))

# taxize integration
## You can pass in taxonomic identifiers
library("taxize")
(ids <- get_ids(c("Chironomus riparius","Pinus contorta"), db = c('itis','gbif')))
occ(ids = ids, from='gbif', limit=20)

(ids <- get_ids("Chironomus riparius", db = 'gbif'))
occ(ids = ids, from='gbif', limit=20)

(ids <- get_gbifid("Chironomus riparius"))
occ(ids = ids, from='gbif', limit=20)

## sf classes
library("sp")
library("sf")
one <- Polygon(cbind(c(91,90,90,91), c(30,30,32,30)))
spone = Polygons(list(one), "s1")
sppoly = SpatialPolygons(list(spone), as.integer(1))

## single polygon in a sf class
x <- st_as_sf(sppoly)
out <- occ(geometry = x, limit=50)
out$gbif$data
mapr::map_leaflet(out)

## single polygon in a sfc class
x <- st_as_sf(sppoly)
out <- occ(geometry = x[[1]], limit=50)
out$gbif$data

## single polygon in a sf POLYGON class
x <- st_as_sf(sppoly)
x <- unclass(x[[1]])[[1]]
class(x)
out <- occ(geometry = x, limit=50)
out$gbif$data

## two polygons in an sf class
one <- Polygon(cbind(c(-121.0,-117.9,-121.0,-121.0), c(39.4, 37.1, 35.1, 39.4)))
two <- Polygon(cbind(c(-123.0,-121.2,-122.3,-124.5,-123.5,-124.1,-123.0),
  c(44.8,42.9,41.9,42.6,43.3,44.3,44.8)))
spone = Polygons(list(one), "s1")
sptwo = Polygons(list(two), "s2")
sppoly = SpatialPolygons(list(spone, sptwo), 1:2)
sppoly_df <- SpatialPolygonsDataFrame(sppoly,
  data.frame(a=c(1,2), b=c("a","b"), c=c(TRUE,FALSE),
  row.names=row.names(sppoly)))
x <- st_as_sf(sppoly_df)
out <- occ(geometry = x, limit=50)
out$gbif$data

# curl debugging
occ(query = 'Accipiter striatus', from = 'gbif', limit=10,
  callopts=list(verbose = TRUE))
occ(query = 'Accipiter striatus', from = 'inat',
  callopts=list(verbose = TRUE))
occ(query = 'Mola mola', from = 'obis', limit = 200,
  callopts = list(verbose = TRUE))

########## More thorough data source specific examples
# idigbio
## scientific name search
res <- occ(query = "Acer", from = "idigbio", limit = 5)
res$idigbio

## geo search
### bounding box
bounds <- c(-120, 40, -100, 45)
occ(from = "idigbio", geometry = bounds, limit = 10)
### wkt
# wkt <- 'POLYGON((-69.9 49.2,-69.9 29.0,-123.3 29.0,-123.3 49.2,-69.9 49.2))'
wkt <- 'POLYGON((-98.9 44.2,-89.1 36.6,-116.7 37.5,-102.5 39.6,-98.9 44.2))'
occ(from = "idigbio", geometry = wkt, limit = 10)

## limit fields returned
occ(query = "Acer", from = "idigbio", limit = 5,
  idigbioopts = list(fields = "scientificname"))

## offset and max_items
occ(query = "Acer", from = "idigbio", limit = 5,
  idigbioopts = list(offset = 10))

## sort
occ(query = "Acer", from = "idigbio", limit = 5,
  idigbioopts = list(sort = TRUE))$idigbio
occ(query = "Acer", from = "idigbio", limit = 5,
  idigbioopts = list(sort = FALSE))$idigbio

## more complex queries
### parameters passed to "rq", get combined with the name queried
occ(query = "Acer", from = "idigbio", limit = 5,
  idigbioopts = list(rq = list(basisofrecord="fossilspecimen")))$idigbio

#### NOTE: no support for multipolygons yet
## WKT's are more flexible than bounding boxes. You can pass in a WKT with multiple
## polygons like so (you can use POLYGON or MULTIPOLYGON) when specifying more than one
## polygon. Note how each polygon is in its own set of parentheses.
# occ(query='Accipiter striatus', from='gbif',
#   geometry='MULTIPOLYGON((30 10, 10 20, 20 60, 60 60, 30 10),
#                          (30 10, 10 20, 20 60, 60 60, 30 10))')

# OBIS examples
## basic query
(res <- occ(query = 'Mola mola', from = 'obis', limit = 200))
## get to obis data
res$obis
## get obis + gbif data
(res <- occ(query = 'Mola mola', from = c('obis', 'gbif'), limit = 200))
res$gbif
res$obis
## no match found
(res <- occ(query = 'Linguimaera thomsonia', from = 'obis'))
## geometry query
geometry <- "POLYGON((8.98 48.05,15.66 48.05,15.66 45.40,8.98 45.40,8.98 48.05))"
(res <- occ(from = 'obis', geometry = geometry, limit = 50))
res$obis

## Pass in spatial classes
## sp classes no longer supported

## Paging
(res1 <- occ(query = 'Mola mola', from = 'obis', limit = 10))
occ_ids <- res1$obis$data$Mola_mola$id
(res2 <- occ(query = 'Mola mola', from = 'obis',
  limit = 10, obisopts = list(after = occ_ids[length(occ_ids)])))
res1$obis
res2$obis
## Pass in any parameters to obisopts as a list
(res <- occ(query = 'Mola mola', from = 'obis',
  obisopts = list(startdepth = 40, enddepth = 50)))
min(res$obis$data$Mola_mola$minimumDepthInMeters, na.rm=TRUE)
max(res$obis$data$Mola_mola$maximumDepthInMeters, na.rm=TRUE)

# ALA examples
## basic query
(res <- occ(query = 'Alaba vibex', from = 'ala', limit = 200))
## get to ala data
res$ala
occ2df(res)

# geometry search
(x <- occ(query = "Macropus", from = 'ala',
  geometry = "POLYGON((145 -37,150 -37,150 -30,145 -30,145 -37))"))
x$ala
occ2df(x)
}