Data requests using an ERDDAP URL
Overview
Teaching: 30 min
Exercises: 10 min
Questions
What is the ERDDAP REST API?
What does an ERDDAP URL endpoint look like?
How can I make a data request?
Objectives
Creating the ERDDAP download URL
Downloading an ERDDAP table dataset with Python
We have just seen that we can search ERDDAP for datasets. The main takeaways were:
- ERDDAP is a tool, and many repositories can run an ERDDAP server, which means that there are many ERDDAP servers around.
- You can search two types of data, served through two protocols: tabledap (tabular data) and griddap (gridded data).
- A dataset in ERDDAP can be downloaded in many different file types, depending on what you need.
- You can subset a dataset by constraining its variables.
In this chapter, we’ll see that ERDDAP can be used not only through the web interface, as we did, but also through URLs that computer programs can use (in this case, to get data, graphs, and information about datasets).
The ERDDAP REST API Service
What is an API?
Shout-out to the API workshop by Amber York:
API stands for Application Programming Interface. APIs are the glue that holds the technology universe together. They are a communication tool that can be used to pass information to and from different kinds of devices and hardware through requests and responses. Let’s look at an example: a restaurant.
Concept: Restaurant as an API
- You request an item listed on the menu.
- Your order is received by the kitchen.
- The kitchen performs all kinds of operations needed to make your food.
- Then you get a response. Your food is either delivered to your table, or you get no food if the kitchen couldn’t make your order. They can’t make your food if you ordered something that isn’t on the menu, you didn’t ask for it correctly, or an equipment failure prevented them from making it.
API requests have to be made in specific ways, so referring to the documentation is very important.
In this workshop we are focussing on the ERDDAP REST API. It lets two computers communicate over HTTP (Hypertext Transfer Protocol), in the same way a web browser (the client) and a web server communicate.
ERDDAP request URLs
Requesting data from ERDDAP can be done using a URL. All the information about an ERDDAP request is contained in its URL, which makes it easy to automate searching for and using data in other applications.
Tabledap request URLs must be in the form:
- server/protocol/datasetID.fileType{?query}
- Example: https://coastwatch.pfeg.noaa.gov/erddap/tabledap/pmelTaoDySst.htmlTable?longitude,latitude,time,station,wmo_platform_code,T_25&time%3E=2015-05-23T12:00:00Z&time%3C=2015-05-31T12:00:00Z
The query is often a comma-separated list of desired variable names, followed by a collection of constraints (e.g., variable<value), each preceded by ‘&’ (which is interpreted as “AND”).
Details:
- Requests must not have any internal spaces.
- Requests are case sensitive.
- datasetID identifies the name that ERDDAP assigned to the source web site and dataset (for example, bcodmo_dataset_786013). You can see a list of datasetID options available via tabledap.
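To see how the parts of the form server/protocol/datasetID.fileType{?query} map onto a real request, the sketch below splits the example URL with Python's standard urllib.parse module. The function name describe_tabledap_url is hypothetical, and it assumes that the URL path ends in /tabledap/datasetID.fileType and that the first part of the query is the variable list.

```python
from urllib.parse import unquote, urlsplit


def describe_tabledap_url(url):
    """Split a tabledap request URL into its parts.

    Hypothetical helper; assumes the path ends in
    /<protocol>/<datasetID>.<fileType> and that the query starts
    with the comma-separated variable list.
    """
    parts = urlsplit(url)
    server_path, protocol, resource = parts.path.rsplit("/", 2)
    dataset_id, dot, file_type = resource.partition(".")
    # Split on "&" first, then decode %3E (>), %3C (<), %2C (,) etc.
    pieces = [unquote(piece) for piece in parts.query.split("&")]
    variables = pieces[0].split(",") if pieces[0] else []
    return {
        "server": f"{parts.scheme}://{parts.netloc}{server_path}",
        "protocol": protocol,
        "datasetID": dataset_id,
        "fileType": dot + file_type,
        "variables": variables,
        "constraints": pieces[1:],
    }


example = (
    "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/pmelTaoDySst.htmlTable"
    "?longitude,latitude,time,station,wmo_platform_code,T_25"
    "&time%3E=2015-05-23T12:00:00Z&time%3C=2015-05-31T12:00:00Z"
)
info = describe_tabledap_url(example)
print(info["datasetID"], info["fileType"])  # pmelTaoDySst .htmlTable
print(info["constraints"])  # ['time>=2015-05-23T12:00:00Z', 'time<=2015-05-31T12:00:00Z']
```

Splitting on ‘&’ before decoding keeps an encoded ‘&’ inside a constraint value from being mistaken for a separator.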
Building the URL of a dataset
Let’s manually create an ERDDAP URL to request a dataset with specific variables. Open a text editor such as Notepad (or Notepad++) or TextEdit.
Links to the example dataset we will be using:
- BCO-DMO landing page: https://www.bco-dmo.org/dataset/815732
- ERDDAP page of dataset: https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_783911/index.html
1. The ERDDAP server you want to get data from
BCO-DMO ERDDAP server: https://erddap.bco-dmo.org/erddap
2. Protocol
Protocols are the standards which specify how to request data. Different protocols are appropriate for different types of data and for different client applications.
tabledap lets you request a data subset, a graph, or a map from a tabular dataset (for example, buoy data), via a specially formed URL.
griddap lets you request a data subset, graph, or map from a gridded dataset (for example, sea surface temperature data from a satellite), via a specially formed URL.
3. Dataset ID
The datasetID of the dataset we used previously is bcodmo_dataset_783911.
4. Choose your file type
The file type specifies the format of the table data file you want to download. You can choose a different file type depending on your needs and your community: for example, a MATLAB binary file, NetCDF, .csv (for R and Python), GIS formats, etc. The table below lists all the available formats:
Data fileTypes | Description |
---|---|
.asc | View OPeNDAP-style ISO-8859-1 comma-separated text. |
.csv | Download an ISO-8859-1 comma-separated text table (line 1: names; line 2: units; ISO 8601 times). |
.csvp | Download an ISO-8859-1 .csv file with line 1: name (units). Times are ISO 8601 strings. |
.csv0 | Download an ISO-8859-1 .csv file without column names or units. Times are ISO 8601 strings. |
.dataTable | A JSON file formatted for use with the Google Visualization client library (Google Charts). |
.das | View the dataset’s metadata via an ISO-8859-1 OPeNDAP Dataset Attribute Structure (DAS). |
.dds | View the dataset’s structure via an ISO-8859-1 OPeNDAP Dataset Descriptor Structure (DDS). |
.dods | OPeNDAP clients use this to download the data in the DODS binary format. |
.esriCsv | Download an ISO-8859-1 .csv file for ESRI’s ArcGIS 9.x and below (separate date and time columns). |
.fgdc | View the dataset’s UTF-8 FGDC .xml metadata. |
.geoJson | Download longitude,latitude,otherColumns data as a UTF-8 GeoJSON .json file. |
.graph | View a Make A Graph web page. |
.help | View a web page with a description of tabledap. |
.html | View an OPeNDAP-style HTML Data Access Form. |
.htmlTable | View a UTF-8 .html web page with the data in a table. Times are ISO 8601 strings. |
.iso19115 | View the dataset’s ISO 19115-2/19139 UTF-8 .xml metadata. |
.itx | Download an ISO-8859-1 Igor Text File. Each response column becomes a wave. |
.json | View a table-like UTF-8 JSON file (missing value = ‘null’; times are ISO 8601 strings). |
.jsonlCSV1 | View a UTF-8 JSON Lines CSV file with column names on line 1 (mv = ‘null’; times are ISO 8601 strings). |
.jsonlCSV | View a UTF-8 JSON Lines CSV file without column names (mv = ‘null’; times are ISO 8601 strings). |
.jsonlKVP | View a UTF-8 JSON Lines file with Key:Value pairs (missing value = ‘null’; times are ISO 8601 strings). |
.mat | Download a MATLAB binary file. |
.nc | Download a flat, table-like, NetCDF-3 binary file with COARDS/CF/ACDD metadata. |
.ncHeader | View the UTF-8 header (the metadata) for the NetCDF-3 .nc file. |
.ncCF | Download a NetCDF-3 CF Discrete Sampling Geometries file (Contiguous Ragged Array). |
.ncCFHeader | View the UTF-8 header (the metadata) for the .ncCF file. |
.ncCFMA | Download a NetCDF-3 CF Discrete Sampling Geometries file (Multidimensional Array). |
.ncCFMAHeader | View the UTF-8 header (the metadata) for the .ncCFMA file. |
.nccsv | Download a NetCDF-3-like 7-bit ASCII NCCSV .csv file with COARDS/CF/ACDD metadata. |
.nccsvMetadata | View the dataset’s metadata as the top half of a 7-bit ASCII NCCSV .csv file. |
.ncoJson | Download a UTF-8 NCO lvl=2 JSON file with COARDS/CF/ACDD metadata. |
.odvTxt | Download longitude,latitude,time,otherColumns as an ISO-8859-1 ODV Generic Spreadsheet File (.txt). |
.subset | View an HTML form which uses faceted search to simplify picking subsets of the data. |
.tsv | Download an ISO-8859-1 tab-separated text table (line 1: names; line 2: units; ISO 8601 times). |
.tsvp | Download an ISO-8859-1 .tsv file with line 1: name (units). Times are ISO 8601 strings. |
.tsv0 | Download an ISO-8859-1 .tsv file without column names or units. Times are ISO 8601 strings. |
.wav | Download a .wav audio file. All columns must be numeric and of the same type. |
.xhtml | View a UTF-8 XHTML (XML) file with the data in a table. Times are ISO 8601 strings. |
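Because the file type is just the extension after the datasetID, the same request can be repeated in another format by changing only that part of the URL. A minimal sketch; with_file_type is a hypothetical helper name, and it assumes the part before ‘?’ ends in /datasetID.fileType and that the datasetID contains no dots.

```python
def with_file_type(url, new_file_type):
    """Return the same ERDDAP request with a different fileType.

    Hypothetical helper; assumes the part before "?" ends in
    /<datasetID>.<fileType> and that the datasetID contains no dots.
    """
    base, sep, query = url.partition("?")
    head, _, resource = base.rpartition("/")
    dataset_id = resource.split(".")[0]  # drop the old extension
    return f"{head}/{dataset_id}{new_file_type}{sep}{query}"


html_url = (
    "https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable"
    "?Station,time,Temperature,latitude,longitude&Temperature>=0&Temperature<=2"
)
print(with_file_type(html_url, ".csv"))
print(with_file_type(html_url, ".nc"))
```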
In this lesson we will use the .csv file type, so that we can import the data into a Jupyter Notebook and work with it using the pandas library.
IMPORTANT NOTE: The actual extension of the resulting file may be slightly different from the fileType (for example, .htmlTable returns an .html file). A .csv file has the column names on line 1 and the units on line 2; to get a .csv file with a single header row, in which each column name is followed by its units in parentheses, use the .csvp file type.
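The difference between .csv and .csvp matters when the file is read with pandas. The sketch below uses two tiny made-up tables in the two layouts (the station names and values are placeholders, not real data from the dataset). Skipping line 2 of the .csv keeps the temperature column numeric; a .csvp file can be read as-is.

```python
import io

import pandas as pd

# Made-up sample in the .csv layout: line 1 names, line 2 units
csv_text = (
    "Station,time,Temperature\n"
    ",UTC,degrees_C\n"
    "S1,2019-01-01T00:00:00Z,1.5\n"
    "S2,2019-01-02T00:00:00Z,0.7\n"
)
# The same made-up sample in the .csvp layout: "name (units)" on line 1
csvp_text = (
    "Station,time (UTC),Temperature (degrees_C)\n"
    "S1,2019-01-01T00:00:00Z,1.5\n"
    "S2,2019-01-02T00:00:00Z,0.7\n"
)

naive = pd.read_csv(io.StringIO(csv_text))  # the units row is read as data
from_csv = pd.read_csv(io.StringIO(csv_text), skiprows=[1])  # skip line 2
from_csvp = pd.read_csv(io.StringIO(csvp_text))

# "degrees_C" in the units row turns the naive column into text
print(pd.api.types.is_numeric_dtype(naive["Temperature"]))     # False
print(pd.api.types.is_numeric_dtype(from_csv["Temperature"]))  # True
print(list(from_csvp.columns))
```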
5. Data request
- The data request in the URL starts with ?
- Then add the variables of interest. If you don’t list any variables, ERDDAP returns all of them.
- The query is a comma-separated list of desired variable names, followed by a collection of constraints (e.g., variable<value), each preceded by ‘&’ (which is interpreted as “AND”).
You can find the variable names to add to your URL at https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_783911/index.html. Variable names are case sensitive!
We want the following variables: Station, time, Temperature (constrained to values between 0 and 2), latitude, and longitude.
6. URL outcome
https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station,time,Temperature,latitude,longitude&Temperature>=0&Temperature<=2
Note: you can also get the URL using the ERDDAP web interface by clicking the “Just generate the URL” button: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station%2Ctime%2CTemperature%2Clatitude%2Clongitude&Temperature%3E=0&Temperature%3C=2
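The walk-through above can be reproduced in Python: join the pieces from steps 1 to 5 into the URL, then percent-encode the query with urllib.parse.quote from the standard library. Keeping ‘&’ and ‘=’ unencoded (the safe argument) is a choice made here to match the “Just generate the URL” output; the variable names in the code are illustrative.

```python
from urllib.parse import quote

server = "https://erddap.bco-dmo.org/erddap"   # 1. ERDDAP server
protocol = "tabledap"                           # 2. protocol
dataset_id = "bcodmo_dataset_783911"            # 3. dataset ID
file_type = ".htmlTable"                        # 4. file type
variables = ["Station", "time", "Temperature", "latitude", "longitude"]
constraints = ["Temperature>=0", "Temperature<=2"]  # 5. constraints

# Variables are comma-separated; each constraint is preceded by "&"
query = ",".join(variables) + "".join("&" + c for c in constraints)
url = f"{server}/{protocol}/{dataset_id}{file_type}?{query}"

# Percent-encode ",", ">" and "<", but keep "&" and "=" as separators
encoded_url = f"{server}/{protocol}/{dataset_id}{file_type}?{quote(query, safe='&=')}"

print(url)
print(encoded_url)
```

This simple approach assumes constraint values contain no ‘&’ or ‘=’ of their own (for example, in a string constraint); those would need to be encoded separately.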
Access
Web browser
What happens when you copy and paste that URL into a web browser?
Command line
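You can use curl on the command line (Terminal) to make requests and get responses. Put the URL in quotes: otherwise the shell interprets & (run in the background) and > or < (redirection) itself, and the full request never reaches ERDDAP. Using the percent-encoded form of the URL also avoids problems with servers that reject raw < and > characters.
$ curl "https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station,time,Temperature,latitude,longitude&Temperature%3E=0&Temperature%3C=2"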
Exercise:
- Add the Fluorescence variable to your URL request
- Constrain it to values between 0 and 40
Download the dataset with Python
In the previous chapter we saw that we could download a dataset from ERDDAP by pushing a button.
Live coding demo
Link to the static Jupyter Notebook. Copy and paste the code blocks into your own Jupyter Notebook.
Opening a Jupyter Notebook on your own computer:
- Open the Anaconda Prompt or a terminal.
- Activate the erddap environment:
conda activate erddap
- Go to the location where you want your notebook to be. The dir (Windows) or ls (macOS/Linux) and cd commands are useful for this.
- Start Jupyter Notebook from the command line:
jupyter notebook
- In the Jupyter interface, go to New -> Python 3.
We just built the following URL:
https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station,time,Temperature,latitude,longitude&Temperature>=0&Temperature<=2
Let’s now download it with python to our computer locally.
The urllib.request module helps with opening URLs. It is part of Python’s standard library, so there is nothing extra to install. Import it into your Python environment to work with URLs:
# Import the urllib library
import urllib.request
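Some characters, such as ‘>’ and ‘<’, should not appear unencoded in a URL. When you paste such a URL into a web browser, the browser automatically percent-encodes those characters before sending the request (‘>’ becomes %3E and ‘<’ becomes %3C). Python’s urllib does not do this for you, so the URL below already uses the encoded characters.
# Define the URL you want to download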
download_url = "https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.csv?Station,time,Temperature,latitude,longitude&Temperature%3E=0&Temperature%3C=2"
# Define where you want to save the file on your computer
name_to_save = "bcodmo_dataset_815732.csv"
# download the dataset
urllib.request.urlretrieve(download_url, name_to_save)
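If ERDDAP cannot answer a request (for example, because a variable name is misspelled or no rows match the constraints), urlretrieve raises urllib.error.HTTPError. A minimal sketch of a hypothetical download_erddap helper that surfaces the server’s explanation, assuming, as is usual for ERDDAP, that the body of the error response contains a readable message:

```python
import urllib.error
import urllib.request


def download_erddap(url, filename):
    """Download an ERDDAP request URL to a local file (hypothetical helper).

    Returns the local filename. On an HTTP error, raises RuntimeError
    with the status code and the text the server sent back.
    """
    try:
        local_path, _headers = urllib.request.urlretrieve(url, filename)
    except urllib.error.HTTPError as err:
        # The error response body usually explains what went wrong
        message = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"HTTP {err.code}: {message}") from err
    return local_path


# Usage with the variables defined above (needs an internet connection):
# download_erddap(download_url, name_to_save)
```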
Import the pandas library to work with tables in your Python environment.
# Import the downloaded .csv data into the Jupyter Notebook with pandas
import pandas as pd
# Line 2 of an ERDDAP .csv file holds the units; skip it so that
# numeric columns are read as numbers instead of text
dataframe = pd.read_csv("bcodmo_dataset_815732.csv", skiprows=[1])
print(dataframe)
Key Points
Tabledap request URLs are in the form: server/protocol/datasetID.fileType{?query}
The urllib library, part of the Python standard library, can download data from an ERDDAP URL over HTTPS.