Open Data & ERDDAP

Overview

Teaching: 10 min
Exercises: 0 min

Questions

What is open data?

What is ERDDAP?

Why is ERDDAP important for data reuse?

Objectives

Understand all the different factors for reusing online data with ERDDAP

Open Data

Open data = Documenting and sharing research data openly for re-use. Data sharing benefits scientific advancement by promoting transparency, encouraging collaboration, accelerating research and driving better decision-making.

Accordingly, there is an ongoing global data revolution that seeks to advance collaboration and the creation and expansion of effective, efficient research programs. When applying for grants nowadays, it is often required to share your data with the public:

NSF-OCE: Sample and Data Policy NSF Division of Ocean Sciences: “PIs are expected to share with other researchers and the public, …, the data, samples, physical collections, and other supporting materials created or gathered in the course of work under NSF grants.”
Ocean Observatory Initiative: OOI Data Policy: In principle, all OOI data will be made publicly available, free of charge, to anyone.
Biogeochemical Argo: data management rules: data are made publicly available

When making your data freely available, it is important that end-users have all the knowledge they need to trust and understand the data they want to re-use. End-users can be both humans and computers. The FAIR principles (Findable, Accessible, Interoperable, Reusable) are a common yardstick for judging whether a data package is truly “Open Data”.

Repositories are here to make the journey to open data easier: juggling data principles and policies, funding requirements, publication specifications, research specifics, archiving and discovery through online search engines. Repository types range from general repositories, which curate heterogeneous types of data, to institutional repositories, which are more familiar with the research at the institution, to domain-specific repositories (such as BCO-DMO). Domain-specific repositories make sure the data they receive have the correct domain-specific, standardized metadata and make them publicly available.

So in short, the data life cycle follows this pattern: Data acquisition & analysis -> Data publication & preservation -> Data Reuse (multiple researchers)

Aligning data sources

Even once your data are available online for re-use, barriers often remain that make re-use difficult. Reusing data from another source is hard because of:

different ways of requesting data
different formats: you work with R, while one colleague works with MATLAB and another with Python
the need for standardised metadata

This is where ERDDAP comes in. It gives data users a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and to make graphs and maps.

Specific ERDDAP servers

There is no single ERDDAP server; instead, organisations and repositories run their own ERDDAP servers to distribute data to end users. These users can request data and get it out in various file formats. Many institutes, repositories and organizations (including NOAA, NASA, and USGS) run ERDDAP servers to serve their data.

Each repository and/or program serves its own type of data. To export data from a repository, it is always useful to have some background on what data the server contains and how the data are structured. For this workshop, we will use data from the following repositories and programs:

BCO-DMO

Serves data and information from biological, chemical and biogeochemical research conducted in coastal, marine, Great Lakes and laboratory environments. (Supporting mainly NSF OCE bio & Chem sections)
BCO-DMO ERDDAP Server: https://erddap.bco-dmo.org/erddap/index.html

OOI

The OOI consists of five marine scientific arrays located in the North and South Atlantic and Pacific Oceans delivering real-time data from more than 800 instruments
OOI ERDDAP Server: https://erddap.dataexplorer.oceanobservatories.org/erddap/index.html

Argo

The Global Argo Float program has almost 4000 real-time floats around the world that drift with the ocean currents and move up and down between the surface and a mid-water level.
Unfortunately, because of the complexity and international nature of the program, there isn’t one “perfect” source to retrieve Argo data, or even to search for drifters you may be interested in.
ERDDAP Server: http://www.ifremer.fr/erddap/index.html

Poll

Have you ever made data from a research project available online (either through a repository or the organisation)?

Have you ever reused data from a data provider?

Key Points

Open data means documenting and sharing research data openly for re-use

Reusing data from another source can be challenging

ERDDAP provides the ability to download data in common file formats

Finding data in the ERDDAP data catalog

Overview

Teaching: 25 min
Exercises: 10 min

Questions

How do I search for data in ERDDAP?

What information does a dataset hold?

How can I subset a dataset?

How do I make a graph in ERDDAP?

Objectives

Understand all the different factors for reusing online data with ERDDAP

Exploring an ERDDAP data catalog

In the previous chapter we saw that there are many ERDDAP servers to choose from. In this chapter we will dive a bit deeper into how to search for data in an ERDDAP catalog.

Finding data

For this example, we will use the ERDDAP server operated by BCO-DMO. Go to:

https://erddap.bco-dmo.org/erddap/index.html

To view all the available datasets on this erddap server click “View a List of all 1095 datasets”

Let’s now search the database for specific data: CTD data in the Ross Sea from 2017. Type in the search box: ctd “Ross Sea” 2017. This is a full-text search, just like Google: use white space between words and double quotes “” around phrases.
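The same full-text search can also be written as a URL. The sketch below assumes ERDDAP's usual search endpoint layout (/search/index.html with the parameters page, itemsPerPage and searchFor); the variable names are only illustrative, and the code builds the URL without sending a request.

```python
from urllib.parse import quote, urlencode

# BCO-DMO ERDDAP server used in this example
server = "https://erddap.bco-dmo.org/erddap"

# Same terms as typed into the search box; the phrase keeps its double quotes
search_terms = 'ctd "Ross Sea" 2017'

# quote_via=quote encodes spaces as %20 and double quotes as %22
params = urlencode(
    {"page": 1, "itemsPerPage": 1000, "searchFor": search_terms},
    quote_via=quote,
)

# Assumed endpoint layout: <server>/search/index.html?<parameters>
search_url = f"{server}/search/index.html?{params}"
print(search_url)
```

Pasting the printed URL into a browser should show the same result list as the search box.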

There are a couple of datasets that are popping up in this search, but let’s choose the one with DatasetID: bcodmo_dataset_783911.

Dataset information

Within the search results you have access to information about each dataset to help you decide which dataset is useful for your application.

The listing (pictured above) gives access to a lot of information about the dataset. In a browser, try the following:

Mouse over the question mark ? under Summary to get an overview of the dataset.
Click “Background info” to get more complete information from the data provider about the dataset. Now go back to the search results page.
Click the "M" under “ISO,Metadata” to see all of the dataset metadata. A lot of information is displayed. Some important fields are:
- Global attributes (general metadata) vs variable attributes (variable names & units)
- "geospatial_lat_min", "geospatial_lat_max", "geospatial_lon_min", and "geospatial_lon_max" for the spatial coverage
- "references" for citing the dataset in publications
- "license" for restrictions on using the data
- "acknowledgement" often used to describe how to acknowledge use of the dataset
- time: ERDDAP standardizes the dates+times in the results. Data from other data servers is hard to compare because the dates+times often are expressed in different formats (for example, “Jan 2, 2018”, 02-JAN-2018, 1/2/18, 2/1/18, 2018-01-02, “days since Jan 1, 1900”). For string times, ERDDAP always uses the ISO 8601:2004(E) standard format, for example, 2018-01-02T00:00:00Z. For numeric times, ERDDAP always uses “seconds since 1970-01-01T00:00:00Z”. ERDDAP always uses the Zulu (UTC, GMT) time zone to remove the difficulties of working with different time zones and standard time versus daylight saving time.
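The two time conventions described above can be converted into each other with Python's standard library. This is a minimal sketch that uses the example timestamp from the text; the variable names are illustrative.

```python
from datetime import datetime, timezone

# String time in the ISO 8601 form that ERDDAP uses (Z = UTC)
iso_time = "2018-01-02T00:00:00Z"

# Parse the string and mark it explicitly as UTC
as_datetime = datetime.strptime(iso_time, "%Y-%m-%dT%H:%M:%SZ").replace(
    tzinfo=timezone.utc
)

# Numeric time: seconds since 1970-01-01T00:00:00Z
seconds_since_epoch = as_datetime.timestamp()
print(seconds_since_epoch)

# And back again from the numeric form to the string form
round_trip = datetime.fromtimestamp(seconds_since_epoch, tz=timezone.utc)
print(round_trip.strftime("%Y-%m-%dT%H:%M:%SZ"))
```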

These standardised variables are important for the dataset to be able to be “read” by other end-users and machines.

For example Google dataset search:

open google dataset search: https://datasetsearch.research.google.com/
search for the dataset id of the dataset above: bcodmo_dataset_783911

Subsetting data

Click on the data button. Here is the link to the dataset in erddap: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.html

Set the file type

Download the data “Submit”

Create a graph

ERDDAP also provides a graph function for your datasets, but we will not go into it in more detail here.

Exercise: Inspect BGC-Argo data

Go to the ERDDAP server: https://polarwatch.noaa.gov/erddap/.

search “Biogeochemical-Argo” .

Q: how many datasets are being returned?

Q: What is the difference between the datasets? (Does the title give something away?)

Q: What is the time range the datasets have? (Hint click data tab)

Download a dataset in .csv format that ranges in time from May 1st, 2015 until May 3rd 2015

Key Points

Searching an ERDDAP data catalog can be done using a web page

Data can be downloaded in different file formats

Constraints can be added to a dataset search

Data requests using an ERDDAP URL

Overview

Teaching: 30 min
Exercises: 10 min

Questions

What is the ERDDAP REST API?

What does an ERDDAP URL endpoint look like?

How can I access the requested data?

Objectives

Creating the ERDDAP download URL

Downloading an ERDDAP table dataset with Python

We have just seen that we can search erddap for datasets. The main takeaways were:

ERDDAP is a tool and many repositories can have an ERDDAP server, which means that there are many erddap servers around.
You can search 2 types of data, also called protocols: tabledap & griddap
A dataset in ERDDAP can be downloaded in many different file types, based on what you need.
You can subset a dataset based on constraining the variables.

In the next chapter, we’ll see that ERDDAP can be used not only through the web interface, as we did, but also through URLs that computer programs can use (in this case, to get data, graphs, and information about datasets).

The ERDDAP REST API Service

What is an API?

Shoutout to API workshop from Amber York:

API stands for Application Programming Interface. APIs are the glue that hold the technology universe together. They are a communication tool that can be used to pass information to and from different kinds of devices and hardware through requests and responses. Let’s look at a restaurant concept example.


Concept: Restaurant as an API

You request an item listed on the menu.

Your order is received by the kitchen.

The kitchen performs all kinds of operations needed to make your food.

Then you get a response. Your food is either delivered to your table or you get no food if they couldn’t make your request. They can’t make your food if you ordered something that isn’t on the menu, you didn’t ask for it correctly, or if equipment failure prevented them from making your requested item.

API requests have to be made in specific ways so referring to Documentation is very important.

In this workshop we are focussing on the ERDDAP REST API. It is a medium for two computers to communicate over HTTP (Hypertext Transfer Protocol), in the same way clients and servers communicate.

ERDDAP request URLs

Requesting data from ERDDAP can be done using a URL. All information about every ERDDAP request is contained in the URL of each request, which makes it easy to automate searching for and using data in other applications.

Tabledap request URLs must be in the form

server/protocol/datasetID.fileType{?query }
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/pmelTaoDySst.htmlTable?longitude,latitude,time,station,wmo_platform_code,T_25&time%3E=2015-05-23T12:00:00Z&time%3C=2015-05-31T12:00:00Z

Thus, the query is often a comma-separated list of desired variable names, followed by a collection of constraints (e.g., variable<value), each preceded by ‘&’ (which is interpreted as “AND”).
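To make the structure concrete, the example URL above can be taken apart again with Python's standard library. This is only an illustrative sketch; the variable names are not part of ERDDAP.

```python
from urllib.parse import unquote, urlsplit

url = (
    "https://coastwatch.pfeg.noaa.gov/erddap/tabledap/pmelTaoDySst.htmlTable"
    "?longitude,latitude,time,station,wmo_platform_code,T_25"
    "&time%3E=2015-05-23T12:00:00Z&time%3C=2015-05-31T12:00:00Z"
)

parts = urlsplit(url)

# The path ends in .../protocol/datasetID.fileType
server_path, protocol, resource = parts.path.rsplit("/", 2)
server = f"{parts.scheme}://{parts.netloc}{server_path}"
dataset_id, file_type = resource.rsplit(".", 1)

# The first "&"-separated piece lists the variables; the rest are constraints
variable_part, *constraint_parts = parts.query.split("&")
variables = variable_part.split(",")
constraints = [unquote(c) for c in constraint_parts]  # %3E -> >, %3C -> <

print(server, protocol, dataset_id, file_type)
print(variables)
print(constraints)
```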

Details:

Requests must not have any internal spaces.
Requests are case sensitive.
datasetID identifies the name that ERDDAP assigned to the source web site and dataset (for example, bcodmo_dataset_786013). You can see a list of datasetID options available via tabledap.

Building the URL of a dataset

Let’s manually create an ERDDAP URL to request a dataset with specific variables: open up notepad (++) or TextEdit

Links to the example dataset we will be using:

BCO-DMO landing page: https://www.bco-dmo.org/dataset/815732
ERDDAP page of dataset: https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_783911/index.html

1. ERDDAP server you want to get data from

BCO-DMO ERDDAP server: https://erddap.bco-dmo.org/erddap

2. Protocol

Protocols are the standards which specify how to request data. Different protocols are appropriate for different types of data and for different client applications.

tabledap lets you request a data subset, a graph, or a map from a tabular dataset (for example, buoy data), via a specially formed URL

griddap lets you request a data subset, graph, or map from a gridded dataset (for example, sea surface temperature data from a satellite), via a specially formed URL

3. Dataset ID

The dataset id of our dataset previously used was: bcodmo_dataset_783911

4. Choose your file type

Specifies the type of table data file that you want to download. You can choose a file type based on your specific needs and your community: MATLAB binary files, NetCDF, .csv (for R and Python), GIS formats, etc. The table below lists all the available formats:

Data fileTypes	Description
.asc	View OPeNDAP-style ISO-8859-1 comma-separated text.
.csv	Download a ISO-8859-1 comma-separated text table (line 1: names; line 2: units; ISO 8601 times).
.csvp	Download a ISO-8859-1 .csv file with line 1: name (units). Times are ISO 8601 strings.
.csv0	Download a ISO-8859-1 .csv file without column names or units. Times are ISO 8601 strings.
.dataTable	A JSON file formatted for use with the Google Visualization client library (Google Charts).
.das	View the dataset’s metadata via an ISO-8859-1 OPeNDAP Dataset Attribute Structure (DAS).
.dds	View the dataset’s structure via an ISO-8859-1 OPeNDAP Dataset Descriptor Structure (DDS).
.dods	OPeNDAP clients use this to download the data in the DODS binary format.
.esriCsv	Download a ISO_8859_1 .csv file for ESRI’s ArcGIS 9.x and below (separate date and time columns).
.fgdc	View the dataset’s UTF-8 FGDC .xml metadata.
.geoJson	Download longitude,latitude,otherColumns data as a UTF-8 GeoJSON .json file.
.graph	View a Make A Graph web page.
.help	View a web page with a description of tabledap.
.html	View an OPeNDAP-style HTML Data Access Form.
.htmlTable	View a UTF-8 .html web page with the data in a table. Times are ISO 8601 strings.
.iso19115	View the dataset’s ISO 19115-2/19139 UTF-8 .xml metadata.
.itx	Download an ISO-8859-1 Igor Text File. Each response column becomes a wave.
.json	View a table-like UTF-8 JSON file (missing value = ‘null’; times are ISO 8601 strings).
.jsonlCSV1	View a UTF-8 JSON Lines CSV file with column names on line 1 (mv = ‘null’; times are ISO 8601 strings).
.jsonlCSV	View a UTF-8 JSON Lines CSV file without column names (mv = ‘null’; times are ISO 8601 strings).
.jsonlKVP	View a UTF-8 JSON Lines file with Key:Value pairs (missing value = ‘null’; times are ISO 8601 strings).
.mat	Download a MATLAB binary file.
.nc	Download a flat, table-like, NetCDF-3 binary file with COARDS/CF/ACDD metadata.
.ncHeader	View the UTF-8 header (the metadata) for the NetCDF-3 .nc file.
.ncCF	Download a NetCDF-3 CF Discrete Sampling Geometries file (Contiguous Ragged Array).
.ncCFHeader	View the UTF-8 header (the metadata) for the .ncCF file.
.ncCFMA	Download a NetCDF-3 CF Discrete Sampling Geometries file (Multidimensional Array).
.ncCFMAHeader	View the UTF-8 header (the metadata) for the .ncCFMA file.
.nccsv	Download a NetCDF-3-like 7-bit ASCII NCCSV .csv file with COARDS/CF/ACDD metadata.
.nccsvMetadata	View the dataset’s metadata as the top half of a 7-bit ASCII NCCSV .csv file.
.ncoJson	Download a UTF-8 NCO lvl=2 JSON file with COARDS/CF/ACDD metadata.
.odvTxt	Download longitude,latitude,time,otherColumns as an ISO-8859-1 ODV Generic Spreadsheet File (.txt).
.subset	View an HTML form which uses faceted search to simplify picking subsets of the data.
.tsv	Download a ISO-8859-1 tab-separated text table (line 1: names; line 2: units; ISO 8601 times).
.tsvp	Download a ISO-8859-1 .tsv file with line 1: name (units). Times are ISO 8601 strings.
.tsv0	Download a ISO-8859-1 .tsv file without column names or units. Times are ISO 8601 strings.
.wav	Download a .wav audio file. All columns must be numeric and of the same type.
.xhtml	View a UTF-8 XHTML (XML) file with the data in a table. Times are ISO 8601 strings.

In this lesson we will use the .csv file type to import it in Jupyter Notebook and work with the Pandas library.

IMPORTANT NOTE: The actual extension of the resulting file may be slightly different from the fileType (for example, .htmlTable returns an .html file). The .csv file type puts the column names on line 1 and the units on line 2; to get a single header row with the name and units combined, use .csvp.
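The header layouts of .csv and .csvp, as described in the table, affect how a file has to be read. The sketch below uses two tiny made-up responses (the station names and values are invented for illustration, not taken from a real dataset) and hypothetical helper functions.

```python
import csv
import io

# Made-up excerpts: values are for illustration only
csv_response = "Station,Temperature\n,degrees Celsius\nA,1.5\nB,0.7\n"
csvp_response = "Station,Temperature (degrees Celsius)\nA,1.5\nB,0.7\n"


def parse_csv(text):
    """.csv layout: line 1 = names, line 2 = units, remaining lines = data."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1], rows[2:]


def parse_csvp(text):
    """.csvp layout: line 1 = name (units), remaining lines = data."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


names, units, data = parse_csv(csv_response)
header, data_p = parse_csvp(csvp_response)
print(names, units, data)
print(header, data_p)
```

Both layouts carry the same data rows; only the header handling differs.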

5. Data request

The data request in the URL starts with ?
Then add the variables of interest as a comma-separated list (leaving the list out returns all variables).
After the variables, add any constraints (e.g., variable<value), each preceded by ‘&’ (which is interpreted as “AND”).

Variables that can be added to your URL: https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_783911/index.html -> it is case sensitive!

We want the following variables: Station, time, Temperature (between 0 and 2), latitude, longitude

6. URL outcome

https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station,time,Temperature,latitude,longitude&Temperature>=0&Temperature<=2

Note: you can also get the URL using the ERDDAP web interface by clicking the “Just generate the URL” button: https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station%2Ctime%2CTemperature%2Clatitude%2Clongitude&Temperature%3E=0&Temperature%3C=2
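The six steps can also be assembled in a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative. It percent-encodes > and < in the constraints but leaves the commas as they are.

```python
from urllib.parse import quote, unquote

server = "https://erddap.bco-dmo.org/erddap"   # 1. server
protocol = "tabledap"                          # 2. protocol
dataset_id = "bcodmo_dataset_783911"           # 3. dataset ID
file_type = "htmlTable"                        # 4. file type

# 5. data request: variables first, then the constraints
variables = ["Station", "time", "Temperature", "latitude", "longitude"]
constraints = ["Temperature>=0", "Temperature<=2"]

# Encode > and < in each constraint (%3E, %3C); keep "=" readable
query = ",".join(variables) + "".join(
    "&" + quote(constraint, safe="=") for constraint in constraints
)

# 6. URL outcome
url = f"{server}/{protocol}/{dataset_id}.{file_type}?{query}"
print(url)

# Decoding the result gives back the hand-written form of the URL
print(unquote(url))
```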

Access

Web browser

What happens when you copy and paste that URL into a web browser?

Command line

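You can use curl on the command line (Terminal) to make requests and get responses. Put the URL in quotes, otherwise the shell interprets & and > itself, and percent-encode > and < as %3E and %3C.

$ curl "https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station,time,Temperature,latitude,longitude&Temperature%3E=0&Temperature%3C=2"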

Exercise:

Add the Fluorescence variable to your URL request

Constrain it to value between 0 and 40

Download the dataset with Python

In the previous chapter we saw that we could download a dataset from ERDDAP by pushing a button in the web interface.

Live coding demo

Link to static Jupyter Notebook. Copy/Paste the code blocks into your own Jupyter Notebook

Opening a Jupyter Notebook on your own computer:

Open Anaconda Prompt or a terminal
Activate the erddap environment: conda activate erddap
Go to the location where you want your notebook to be (the dir or ls and cd commands are useful)
Type jupyter notebook on the command line
Go to New -> Python 3

We just built the following URL:

https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.htmlTable?Station,time,Temperature,latitude,longitude&Temperature>=0&Temperature<=2

Let’s now download it with python to our computer locally.

The urllib.request module helps with opening URLs. It is part of Python's standard library, so no extra installation is needed. Import it to work with URLs in your Python environment.

# Import the urllib library
import urllib.request

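Some characters, such as ‘>’ and ‘<’, cannot appear literally in a URL. When you paste a URL containing them into your web browser, the browser automatically translates those characters into percent-encoding before transmission (‘>’ becomes %3E and ‘<’ becomes %3C). Commas may be left as they are or encoded as %2C. In Python we write the encoded form ourselves: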

#define the url you want to download
download_url = "https://erddap.bco-dmo.org/erddap/tabledap/bcodmo_dataset_783911.csv?Station,time,Temperature,latitude,longitude&Temperature%3E=0&Temperature%3C=2"
    

# Define where you want to save the file on your computer
name_to_save = "bcodmo_dataset_815732.csv"

# download the dataset   
urllib.request.urlretrieve(download_url, name_to_save)
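As in the restaurant example, a request does not always succeed. The sketch below wraps the download in a hypothetical helper, download(), that reports a failure instead of stopping the notebook with a traceback; the messages and the function name are illustrative, and no request is sent until the function is called.

```python
import urllib.error
import urllib.request


def download(url, filename):
    """Save the response for `url` to `filename`; return the path, or None on failure."""
    try:
        path, _headers = urllib.request.urlretrieve(url, filename)
    except urllib.error.HTTPError as err:
        # The server answered but refused the request (for example, 404 Not Found)
        print(f"Request failed: HTTP {err.code} {err.reason}")
        return None
    except urllib.error.URLError as err:
        # The server could not be reached at all
        print(f"Could not reach the server: {err.reason}")
        return None
    return path


# Usage with the names defined above (commented out so nothing is downloaded here):
# download(download_url, name_to_save)
```

HTTPError is caught before URLError because it is the more specific of the two exceptions.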

Import the Pandas library to work with tables in your Python environment.

# Import the downloaded .csv data into the Jupyter notebook with the pandas package
import pandas as pd

# In ERDDAP's .csv file type, line 2 holds the units, so every column is read as text here
dataframe = pd.read_csv("bcodmo_dataset_815732.csv", dtype=str)
print(dataframe)

Key Points

Tabledap request URLs are in the form: server/protocol/datasetID.fileType{?query}

The urllib library can download data over HTTPS

Online data to your Python environment

Overview

Teaching: 25 min
Exercises: 20 min

Questions

How do I import an ERDDAP dataset into Python?

How do I interact with the dataset in Python?

Objectives

Importing data from an ERDDAP server into your Python environment

Interact with data

ERDDAPY library

In the previous lesson, we downloaded our dataset file to our local machine. Now we will not download it to the local machine, but use it directly in the Python environment.

Erddapy is a package that takes advantage of ERDDAP’s RESTful web services and creates the ERDDAP URL for virtually any request, like searching for datasets, acquiring metadata, downloading data, etc.

Link to static Jupyter Notebook. Copy/Paste the code blocks into your own Jupyter Notebook

Import BCO-DMO temperature dataset - Oregon Coast

Part 1: Create the URL

From the dataset above, we are going to import the variables longitude, latitude, time and Temperature. The time constraints will be between January 13th and January 16th, 2017.

Step 1: Initiate the ERDDAP URL constructor for a server ( erddapy server object).

# Import the ERDDAP class from the erddapy package
from erddapy import ERDDAP

e = ERDDAP(
    server= "https://erddap.bco-dmo.org/erddap/",
    protocol="tabledap",
    response="csv",
)

Step 2: Populate the object with a dataset id, variables of interest, and its constraints. We can download the csvp response with the .to_pandas method.

e.dataset_id = "bcodmo_dataset_817952"
e.variables = [
    "longitude",
    "latitude",
    "time",
    "Temperature"
]
e.constraints = {
    "time>=": "2017-01-13T00:00:00Z",
    "time<=": "2017-01-16T23:59:59Z",}

Check the URL

# Print the URL - check
url = e.get_download_url()
print(url)

Part 2: Import your dataset into pandas

We can import the csv response using erddapy's .to_pandas method.

# Convert URL to pandas dataframe
df_bcodmo = e.to_pandas(  
    parse_dates=True,
).dropna()

Check out your dataset in pandas

# print the dataframe to check what data is in there specifically. 
df_bcodmo.head()

# print the column names
print (df_bcodmo.columns)

The temperature column has an awkward name; rename the column to correct this

df_bcodmo.rename(columns={df_bcodmo.columns.values[3]: 'Temperature (degrees Celsius)'}, inplace=True)
print (df_bcodmo.columns)

Subset the tabular data further in pandas based on the time.

Step 1: convert the time column to datetime objects so that the day and hour can be extracted

import pandas as pd
# Check the column types before the conversion
print(df_bcodmo.dtypes)

# ERDDAP times are ISO 8601 strings ending in "Z" (UTC); utc=True parses them as UTC timestamps
df_bcodmo["time (UTC)"] = pd.to_datetime(df_bcodmo["time (UTC)"], utc=True)
print(df_bcodmo.dtypes)

Only select the rows for January 13th

df_bcodmo_13 =  df_bcodmo[df_bcodmo["time (UTC)"].dt.day == 13]
df_bcodmo_13

When you inspect the dataset, you can see that some hours have multiple data points, while others have only 1 data point. Let’s average the dataset over every hour using the groupby function

# Group the January 13th rows by hour of the day and average each group
df_bcodmo_13_average = df_bcodmo_13.groupby(df_bcodmo_13["time (UTC)"].dt.hour)[['Temperature (degrees Celsius)','longitude (degrees_east)','latitude (degrees_north)']].mean().reset_index()
df_bcodmo_13_average
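To see what the hourly groupby does, it helps to run it on a table small enough to check by hand. The sketch below uses three made-up readings (not values from the BCO-DMO dataset): two in hour 0 and one in hour 1.

```python
import pandas as pd

# Made-up sample: two readings in hour 0 and one reading in hour 1
sample = pd.DataFrame(
    {
        "time (UTC)": pd.to_datetime(
            ["2017-01-13T00:10:00Z", "2017-01-13T00:40:00Z", "2017-01-13T01:05:00Z"],
            utc=True,
        ),
        "Temperature (degrees Celsius)": [10.0, 12.0, 9.0],
    }
)

# Group by hour of the day and average the readings within each hour
hourly = (
    sample.groupby(sample["time (UTC)"].dt.hour)["Temperature (degrees Celsius)"]
    .mean()
    .reset_index()
)
print(hourly)
```

After reset_index(), the "time (UTC)" column holds the hour of the day (0, 1, ...) rather than a full timestamp.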

Plot your averaged dataset in pandas

df_bcodmo_13_average.plot (
    x='longitude (degrees_east)',
    y='latitude (degrees_north)', 
    kind = 'scatter',
    c='Temperature (degrees Celsius)',
    colormap="YlOrRd")

Exercise:

Create the URL for this dataset with the variable POC instead of temperature

Answer

# Import the ERDDAP class from the erddapy package
from erddapy import ERDDAP

e = ERDDAP(
    server="https://erddap.bco-dmo.org/erddap/",
    protocol="tabledap",
    response="csv",
)

e.dataset_id = "bcodmo_dataset_817952"
e.variables = [
    "longitude",
    "latitude",
    "time",
    "POC",
]
e.constraints = {
    "time>=": "2017-01-13T00:00:00Z",
    "time<=": "2017-01-16T23:59:59Z",
}

# Print the URL - check
url = e.get_download_url()
print(url)

Searching datasets using erddapy

Step 1: Initiate the ERDDAP URL constructor for a server ( erddapy server object).

#searching datasets based on words
from erddapy import ERDDAP
e = ERDDAP(
    server="https://erddap.bco-dmo.org/erddap", 
    protocol="tabledap", 
    response="csv")

Search with keywords:

import pandas as pd
url = e.get_search_url(search_for="Temperature OC1603B", response="csv")

print (url)
pd.read_csv(url)["Dataset ID"]

Inspect the metadata of dataset with id bcodmo_dataset_817952:

#find the variables
info_url = e.get_info_url(dataset_id="bcodmo_dataset_817952")
pd.read_csv(info_url)

pd.set_option('display.max_rows', None) #make sure that jupyter notebook shows all rows
dataframe = pd.read_csv(info_url)
print (dataframe)

#get the unique variable names with pandas
dataframe["Variable Name"].unique()
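Individual attributes, such as the units of a variable, can be looked up in the same info table. The sketch below assumes the usual column layout of an ERDDAP info response (Row Type, Variable Name, Attribute Name, Data Type, Value) and runs on a small made-up excerpt, so the title and units shown are illustrative rather than taken from the dataset; get_attribute() is a hypothetical helper.

```python
import csv
import io

# Made-up excerpt in the assumed column layout of an ERDDAP info table
info_csv = """Row Type,Variable Name,Attribute Name,Data Type,Value
attribute,NC_GLOBAL,title,String,Example dataset title
variable,time,,double,
attribute,time,units,String,seconds since 1970-01-01T00:00:00Z
variable,Temperature,,float,
attribute,Temperature,units,String,degrees Celsius
"""


def get_attribute(text, variable, attribute):
    """Return the Value of one attribute of one variable, or None if absent."""
    for row in csv.DictReader(io.StringIO(text)):
        if (
            row["Row Type"] == "attribute"
            and row["Variable Name"] == variable
            and row["Attribute Name"] == attribute
        ):
            return row["Value"]
    return None


print(get_attribute(info_csv, "Temperature", "units"))
print(get_attribute(info_csv, "NC_GLOBAL", "title"))
```

Global metadata is stored under the variable name NC_GLOBAL, so the same helper covers both global and variable attributes.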

Exercise: Inspect this BCO-DMO dataset

What are the units of POC?

Who is the Principal Investigator on this dataset?

What is the start and end time of this dataset?

Exercise

What are the unique variables for “bcodmo_dataset_807119?”

Answer

# find the variables
info_url = e.get_info_url(dataset_id="bcodmo_dataset_807119") 
pd.read_csv(info_url)

pd.set_option('display.max_rows', None) #make sure that jupyter notebook shows all rows 
dataframe = pd.read_csv(info_url)
dataframe

# get the unique variable names with pandas
dataframe["Variable Name"].unique()

rerddap: package for R users to work directly with ERDDAP servers

Information on using rerddap: https://docs.ropensci.org/rerddap/articles/Using_rerddap.html

Example from the following page: IOOS Glider Data (accessed October 11, 2021):

The mission of the IOOS Glider DAC is to provide glider operators with a simple process for submitting glider data sets to a centralized location, enabling the data to be visualized, analyzed, widely distributed via existing web services and the Global Telecommunications System (GTS) and archived at the National Centers for Environmental Information (NCEI). The IOOS Glider DAC is accessible through rerddap (http://data.ioos.us/gliders/erddap/). Extracting and plotting salinity from part of the path of one glider deployed by the Scripps Institution of Oceanography:

library("rerddap")  # provides info() and tabledap()
urlBase <- "https://data.ioos.us/gliders/erddap/"
gliderInfo <- info("sp064-20161214T1913",  url = urlBase)
glider <- tabledap(gliderInfo, fields = c("longitude", "latitude", "depth", "salinity"), 'time>=2016-12-14', 'time<=2016-12-23', url = urlBase)
glider$longitude <- as.numeric(glider$longitude)
glider$latitude <- as.numeric(glider$latitude)
glider$depth <- as.numeric(glider$depth)

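require("plot3D")
# NOTE: colors$salinity is not defined in this snippet; it must hold a colour palette created beforehand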
scatter3D(x = glider$longitude , y = glider$latitude , z = -glider$depth, colvar = glider$salinity, col = colors$salinity, phi = 40, theta = 25, bty = "g", type = "p",
           ticktype = "detailed", pch = 10, clim = c(33.2,34.31), clab = 'Salinity',
           xlab = "longitude", ylab = "latitude", zlab = "depth",
           cex = c(0.5, 1, 1.5))


Key Points

Key packages for importing data from ERDDAP into Python: erddapy and pandas

Data can be downloaded locally or be interacted with directly using erddapy

You can assess your data package in Python

Aggregating multiple datasets

Overview

Teaching: 30 min
Exercises: 0 min

Questions

How can I work with different datasets?

Objectives

Importing data from different ERDDAP servers into your Python environment

Compare different datasets

Map spatial datasets together

Aggregating datasets

Exercise: We want to plot the surface temperature from different datasets together.

Note: We will be importing datasets from different servers. Bear in mind that their granularity (time interval, sensor type) differs, and this needs to be taken into account when actually comparing the datasets.

Location: Oregon Coast.
Date range: 2017-01-13 to 2017-01-16, UTC time.
Variable: Surface temperature

Link to static Jupyter Notebook. Copy/Paste the code blocks into your own Jupyter Notebook

OOI temperature dataset - Oregon Coast 2017

Import an OOI dataset. Background on the dataset we are importing:

Array: Coastal Endurance Array
Platform: Oregon Inshore Surface Mooring
Instrument: CTD

#Import erddap package
from erddapy import ERDDAP

# ooi constructor:

e = ERDDAP(
    server="https://erddap.dataexplorer.oceanobservatories.org/erddap/",
    protocol="tabledap",
    response="csv",
)

e.dataset_id = "ooi-ce01issm-rid16-03-ctdbpc000"
e.variables = [
    "longitude",
    "latitude",
    "time",
    "sea_water_temperature"
]
e.constraints = {
    "time>=": "2017-01-13T00:00:00Z",
    "time<=": "2017-01-16T23:59:59Z",}

Check the URL

# Print the URL - check
url = e.get_download_url()
print(url)

Import the dataset into the pandas dataframe and check the lay-out

# Convert URL to pandas dataframe
df_ooi_2017 = e.to_pandas( 
    parse_dates=True,
).dropna()

df_ooi_2017.head()
df_ooi_2017

Plot the data

df_ooi_2017.plot (x='longitude (degrees_east)', 
                    y='latitude (degrees_north)', 
                    kind = 'scatter',
                    c='sea_water_temperature (degree_Celsius)', 
                    colormap="YlOrRd")

OOI temperature dataset 2018 - Oregon Coast

# Import the ERDDAP class from the erddapy package
from erddapy import ERDDAP
# ooi constructor:

e = ERDDAP(
    server="https://erddap.dataexplorer.oceanobservatories.org/erddap/",
    protocol="tabledap",
    response="csv",
)

e.dataset_id = "ooi-ce01issm-rid16-03-ctdbpc000"
e.variables = [
    "longitude",
    "latitude",
    "time",
    "sea_water_temperature"
]
e.constraints = {
    "time>=": "2018-01-13T00:00:00Z",
    "time<=": "2018-01-13T23:59:59Z",}
print ('done')

# Print the URL - check
url = e.get_download_url()
print(url)

# Convert URL to pandas dataframe
df_ooi_2018 = e.to_pandas( 
    parse_dates=True,
).dropna()

df_ooi_2018.head()
print (df_ooi_2018)

Combining the datasets

averaging the datasets

Convert the objects to datetime objects

import pandas as pd

# Convert the 2017 time column (ISO 8601 strings ending in "Z") to UTC datetime objects
print(df_ooi_2017.dtypes)
df_ooi_2017["time (UTC)"] = pd.to_datetime(df_ooi_2017["time (UTC)"], utc=True)
print(df_ooi_2017.dtypes)

# Same conversion for the 2018 data
print(df_ooi_2018.dtypes)
df_ooi_2018["time (UTC)"] = pd.to_datetime(df_ooi_2018["time (UTC)"], utc=True)
print(df_ooi_2018.dtypes)

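# Average per hour of the day. df_ooi_2017 covers January 13-16, so each hourly mean pools all four days
df_ooi_2017_average = df_ooi_2017.groupby(df_ooi_2017["time (UTC)"].dt.hour)['sea_water_temperature (degree_Celsius)'].mean().reset_index()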
print (df_ooi_2017_average)

df_ooi_2018_average = df_ooi_2018.groupby(df_ooi_2018["time (UTC)"].dt.hour)['sea_water_temperature (degree_Celsius)'].mean().reset_index()
print (df_ooi_2018_average)
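Besides plotting, the two hourly tables can be joined on the hour to compare them side by side. This sketch uses made-up hourly means in the same column layout as the averaged OOI tables; the values and the names avg_2017, avg_2018 and combined are illustrative.

```python
import pandas as pd

temp = "sea_water_temperature (degree_Celsius)"

# Made-up hourly means; after reset_index() the "time (UTC)" column holds the hour
avg_2017 = pd.DataFrame({"time (UTC)": [0, 1, 2], temp: [10.0, 10.5, 11.0]})
avg_2018 = pd.DataFrame({"time (UTC)": [0, 1, 2], temp: [9.0, 10.0, 12.0]})

# Join on the hour; the suffixes keep the two temperature columns apart
combined = avg_2017.merge(avg_2018, on="time (UTC)", suffixes=(" 2017", " 2018"))

# Difference between the two years for each hour
combined["difference"] = combined[temp + " 2018"] - combined[temp + " 2017"]
print(combined)
```

merge() keeps only hours present in both tables, so hours missing from one year drop out of the comparison.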

%matplotlib inline

import matplotlib.pyplot as plt

plt.figure(figsize=(12,5)) 
plt.plot(df_ooi_2017_average["time (UTC)"],df_ooi_2017_average["sea_water_temperature (degree_Celsius)"],label='2017',c='red',marker='.',linestyle='-') 
plt.plot(df_ooi_2018_average["time (UTC)"],df_ooi_2018_average["sea_water_temperature (degree_Celsius)"],label='2018',c='blue',marker='.',linestyle='-') 
plt.ylabel('degrees celsius')
plt.title("January 13th")
plt.legend()
plt.yticks(rotation=90)


#fig, (ax1, ax2) = plt.subplots(2)
#fig.suptitle('Vertically stacked subplots')
#ax1.plot(df_bcodmo["time (UTC)"],df_bcodmo['temperature'],label='bcodmo',c='red',marker='.',linestyle='-')
#ax2.plot(df_ooi["time (UTC)"],df_ooi["sea_water_temperature (degree_Celsius)"],label='OOI',c='blue',marker='.',linestyle='-')
#ax1.set(ylabel='degrees celsius')
#ax2.set(ylabel='degrees celsius')

Key Points

There are key packages necessary to import data from ERDDAP into Python: pandas, urllib

Data can be downloaded locally or be interacted with directly using erddapy

You can assess your data package in Python

Gridded dataset

Overview

Teaching: 20 min
Exercises: 0 min

Questions

How can I work with gridded datasets?

Objectives

Downloading satellite data from ERDDAP in netCDF format

Extracting data with Python

Map spatial datasets

Working with gridded data

Gridded data are served through a different protocol called griddap. Instead of using the erddapy library, it is easier to import the data in netCDF format using the packages xarray and netCDF4.

Setting the constraints with these packages is a bit more straightforward.

griddap request URLs must be in the form https://coastwatch.pfeg.noaa.gov/erddap/griddap/datasetID.fileType{?query}

For example, https://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.htmlTable?analysed_sst[(2002-06-01T09:00:00Z)][(-89.99):1000:(89.99)][(-179.99):1000:(180.0)]

The query is often a data variable name (e.g., analysed_sst), followed by [(start):stride:(stop)] (or a shorter variation of that) for each of the variable’s dimensions (for example, [time][latitude][longitude]).
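The URL pattern above can be assembled in Python from its parts. This is a minimal sketch: the function name `griddap_url` and its arguments are hypothetical, and it only handles the `[(start):stride:(stop)]` and single-value `[(start)]` forms described above.

```python
def griddap_url(server, dataset_id, file_type, variable, dims):
    """Assemble a griddap request URL (hypothetical helper).

    dims is a list of (start, stride, stop) tuples, one per dimension,
    in the order the dataset defines them (for example time, latitude,
    longitude). A tuple whose stop is None requests a single value.
    """
    parts = []
    for start, stride, stop in dims:
        if stop is None:
            parts.append(f"[({start})]")
        else:
            parts.append(f"[({start}):{stride}:({stop})]")
    query = variable + "".join(parts)
    return f"{server}/griddap/{dataset_id}.{file_type}?{query}"


url = griddap_url(
    "https://coastwatch.pfeg.noaa.gov/erddap",
    "jplMURSST41",
    "htmlTable",
    "analysed_sst",
    [
        ("2002-06-01T09:00:00Z", None, None),  # time: a single value
        ("-89.99", 1000, "89.99"),             # latitude: every 1000th point
        ("-179.99", 1000, "180.0"),            # longitude: every 1000th point
    ],
)
print(url)
```

The printed URL reproduces the jplMURSST41 example given above.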

Link to static Jupyter Notebook. Copy/Paste the code blocks into your own Jupyter Notebook

xarray works similarly to the pandas package.

import xarray as xr
import netCDF4 as nc

Satellite data: NASA sea surface temperature, GHRSST Global 1-km Sea Surface Temperature (G1SST), Global, 0.01 Degree, 2010-2017, Daily. The data contain daily composites of SST at 1 km resolution.

Importing the data in Python. The code below opens the dataset directly from the ERDDAP server through its griddap URL, so that we can extract our variables of interest:

import xarray as xr

server = 'https://coastwatch.pfeg.noaa.gov/erddap'
protocol = 'griddap'
dataset_id = "jplG1SST"
full_URL = '/'.join([server,protocol,dataset_id])
print(full_URL)
da = xr.open_dataset(full_URL)

Inspect the dataset:

print(da)

Requesting the dataset without subsetting it produces an error: the dataset is too large to be downloaded completely.

sst = da['SST']
sst

Create subsets of your netCDF file:

For this exercise, the area we are interested in lies off the Oregon and Washington coast:

Latitude range: 44.0N to 48.0N
Longitude range: -128E to -121E
Time range: 2017-01-13T00:00:00Z to 2017-01-16T23:59:59Z

Xarray supports:

label-based indexing using .sel
position-based indexing using .isel

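The slice() function can take three parameters:

start (optional) - Starting integer where the slicing of the object starts. Defaults to None if not provided.
stop - Integer until which the slicing takes place. The slicing stops at index stop - 1 (last element).
step (optional) - Integer that sets the increment between indices. Defaults to None if not provided.

Note that .sel uses slice() with coordinate labels rather than integer positions, and a label-based slice includes both the start and the stop value.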
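As a small illustration of these parameters, the sketch below applies slice() to a plain Python list of made-up latitude values. These are the positional rules that .isel follows. With .sel the same slice objects are interpreted as coordinate labels, so slice(44., 48.) keeps both 44.0 and 48.0.

```python
# Made-up latitude values, used only to show how slice() behaves
latitudes = [44.0, 44.5, 45.0, 45.5, 46.0, 46.5, 47.0, 47.5, 48.0]

first_three = slice(3)        # stop only: positions 0, 1, 2
middle = slice(2, 6)          # start and stop: positions 2 to 5
every_other = slice(0, 9, 2)  # start, stop and step: every second position

print(latitudes[first_three])  # [44.0, 44.5, 45.0]
print(latitudes[middle])       # [45.0, 45.5, 46.0, 46.5]
print(latitudes[every_other])  # [44.0, 45.0, 46.0, 47.0, 48.0]
```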

import xarray as xr

server = 'https://coastwatch.pfeg.noaa.gov/erddap'
protocol = 'griddap'
dataset_id = "jplG1SST"
full_URL = '/'.join([server,protocol,dataset_id])
print(full_URL)
da = xr.open_dataset(full_URL)


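# Select the region and time range of interest by coordinate label.
# A time slice (rather than a single timestamp) keeps the time dimension,
# so individual days can still be picked afterwards with .isel(time=...).
sst = da['SST'].sel(
    latitude=slice(44., 48.),
    longitude=slice(-128, -121),
    time=slice('2017-01-13T00:00:00', '2017-01-16T23:59:59'),
)
sst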

%matplotlib inline
sst.isel(time=0).plot.imshow()

Key Points

There are key packages necessary to import data from ERDDAP into Python: xarray

xarray works similarly to pandas

xarray has a built-in plotter for gridded datasets
