Access Data from the Ocean Color Instrument (OCI)
Authors: Anna Windle (NASA, SSAI), Ian Carroll (NASA, UMBC), Carina Poulin (NASA, SSAI)
PREREQUISITES
This notebook has the following prerequisites:
An Earthdata Login account is required to access data from the NASA Earthdata system, including NASA ocean color data.
There are no prerequisite notebooks for this module.
Summary
In this example we will use the earthaccess package to search for OCI products on NASA Earthdata. The earthaccess package, published on the Python Package Index and conda-forge, facilitates discovery and use of all NASA Earth Science data products by providing an abstraction layer for NASA’s Common Metadata Repository (CMR) API and by simplifying requests to NASA’s Earthdata Cloud. Searching for data is more approachable using earthaccess than using low-level HTTP requests, and the same goes for S3 requests.
In short, earthaccess helps authenticate with Earthdata Login, makes search easier, and provides a streamlined way to load data into xarray containers. For more on earthaccess, visit the documentation site. Be aware that earthaccess is under active development.
To follow the discussion below on downloading and opening data, we need to be clear about where our notebook is running. There are three cases to distinguish:
The notebook is running on the local host. For instance, you started a Jupyter server on your laptop.
The notebook is running on a remote host, but it does not have direct access to the NASA Earthdata Cloud. For instance, you are running in GitHub Codespaces.
The notebook is running on a remote host that does have direct access to the NASA Earthdata Cloud. At this time, we cannot give an example of such a host that is available to everyone.
Learning Objectives
At the end of this notebook you will know:
How to store your NASA Earthdata Login credentials with earthaccess
How to use earthaccess to search for OCI data using search filters
How to download OCI data, but only when you need to
1. Setup
We begin by importing earthaccess, the only package this notebook uses beyond the Python standard library. If you have created an environment following the guidance provided with this tutorial, then the import will be successful.
import earthaccess
We also need pathlib for directory creation, at least until earthaccess version 0.9.1 is available.
import pathlib
2. NASA Earthdata Authentication
Next, we authenticate using our Earthdata Login credentials. Authentication is not needed to search publicly available collections in Earthdata, but is always needed to access data. We can use the login method from the earthaccess package. This will create an authenticated session when we provide a valid Earthdata Login username and password. The earthaccess package will search for credentials defined by environment variables or within a .netrc file saved in the home directory. If credentials are not found, an interactive prompt will allow you to input credentials.
The persist=True argument ensures that any discovered credentials are stored in a .netrc file, so the argument is not necessary (but it's also harmless) for subsequent calls to earthaccess.login.
auth = earthaccess.login(persist=True)
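To make the credential search described above more concrete, here is a minimal sketch of that kind of lookup. It is not earthaccess code: find_earthdata_credentials is a hypothetical helper, and the environment variable names EARTHDATA_USERNAME and EARTHDATA_PASSWORD and the .netrc host urs.earthdata.nasa.gov are assumptions about the conventions earthaccess follows. The sketch checks the environment first and then the .netrc file; the order earthaccess actually uses may differ.

```python
import netrc
import os
from pathlib import Path

# Assumed Earthdata Login host name used as the "machine" entry in .netrc.
EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def find_earthdata_credentials(netrc_path=None):
    """Hypothetical helper, not part of earthaccess.

    Returns a (username, password) tuple, or None if nothing is found.
    Environment variables are checked first, then a .netrc file
    (by default ~/.netrc).
    """
    # 1. Environment variables (names are an assumption, see the lead-in).
    username = os.environ.get("EARTHDATA_USERNAME")
    password = os.environ.get("EARTHDATA_PASSWORD")
    if username and password:
        return username, password

    # 2. A .netrc file containing a line such as:
    #    machine urs.earthdata.nasa.gov login USERNAME password PASSWORD
    path = Path(netrc_path) if netrc_path else Path.home() / ".netrc"
    if not path.is_file():
        return None
    entry = netrc.netrc(str(path)).authenticators(EARTHDATA_HOST)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password
```

Calling find_earthdata_credentials() with no argument inspects your own environment and ~/.netrc without contacting any server.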
3. Search for Data
Collections on NASA Earthdata are discovered with the search_datasets function, which accepts an instrument filter as an easy way to get started. Each of the items in the list of collections returned has a “short-name”.
results = earthaccess.search_datasets(instrument="oci")
Datasets found: 19
for item in results:
    summary = item.summary()
    print(summary["short-name"])
PACE_OCI_L1A_SCI
PACE_OCI_L1B_SCI
PACE_OCI_L1C_SCI
PACE_OCI_L2_AOP_NRT
PACE_OCI_L2_BGC_NRT
PACE_OCI_L2_IOP_NRT
PACE_OCI_L2_PAR_NRT
PACE_OCI_L3B_CHL_NRT
PACE_OCI_L3B_IOP_NRT
PACE_OCI_L3B_KD_NRT
PACE_OCI_L3B_PAR_NRT
PACE_OCI_L3B_POC_NRT
PACE_OCI_L3B_RRS_NRT
PACE_OCI_L3M_CHL_NRT
PACE_OCI_L3M_IOP_NRT
PACE_OCI_L3M_KD_NRT
PACE_OCI_L3M_PAR_NRT
PACE_OCI_L3M_POC_NRT
PACE_OCI_L3M_RRS_NRT
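The short names in this list appear to follow a pattern: PACE_OCI_LEVEL_PRODUCT, with an _NRT suffix on the near real time products. Assuming that pattern holds (it is inferred from the list above, not from a documented naming convention), a small hypothetical helper, parse_short_name, can split names apart, for instance to keep only the Level-3 mapped (L3M) collections.

```python
def parse_short_name(short_name):
    """Split a short name such as "PACE_OCI_L3M_CHL_NRT" into its parts.

    Hypothetical helper; the pattern MISSION_INSTRUMENT_LEVEL_PRODUCT[_NRT]
    is inferred from the collection list above.
    """
    mission, instrument, level, *rest = short_name.split("_")
    # A trailing "NRT" marks a near real time product.
    nrt = bool(rest) and rest[-1] == "NRT"
    if nrt:
        rest = rest[:-1]
    return {
        "mission": mission,
        "instrument": instrument,
        "level": level,
        "product": "_".join(rest) or None,
        "nrt": nrt,
    }


short_names = [
    "PACE_OCI_L1B_SCI",
    "PACE_OCI_L2_BGC_NRT",
    "PACE_OCI_L3M_CHL_NRT",
    "PACE_OCI_L3M_RRS_NRT",
]
# Keep only the Level-3 mapped (L3M) collections.
l3m = [name for name in short_names if parse_short_name(name)["level"] == "L3M"]
```

In the notebook, short_names could instead be built from the search results with [item.summary()["short-name"] for item in results].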
Next, we use the search_data function to find granules within a collection. Let’s use the short_name for the PACE/OCI Level-2 quick-look, or near real time (NRT), product for biogeochemical properties (although you can search for granules across collections too).
The count argument limits the number of granules returned and stored in the results list, not the number of granules found.
results = earthaccess.search_data(
short_name="PACE_OCI_L2_BGC_NRT",
count=1,
)
Granules found: 8967
We can refine our search by passing more parameters that describe the spatiotemporal domain of our use case. Here, we use the temporal parameter to request a date range and the bounding_box parameter to request granules that intersect with a bounding box. We can even provide a cloud_cover range to keep only granules whose percentage of cloud cover falls within it (here, 0 to 50 percent). We do not provide a count, so we’ll get all granules that satisfy the constraints.
tspan = ("2024-05-01", "2024-05-16")
bbox = (-76.75, 36.97, -75.74, 39.01)
clouds = (0, 50)
results = earthaccess.search_data(
short_name="PACE_OCI_L2_BGC_NRT",
temporal=tspan,
bounding_box=bbox,
cloud_cover=clouds,
)
Granules found: 3
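The bounding_box tuple above is read as (west, south, east, north) in decimal degrees, that is, lower-left longitude and latitude followed by upper-right longitude and latitude. To illustrate what "intersect" means for granule selection, here is a client-side sketch with a hypothetical boxes_intersect helper and made-up footprints. CMR performs the real spatial search on the server, and actual granule footprints are generally not simple boxes.

```python
# Search box copied from the cell above: (west, south, east, north) in degrees.
bbox = (-76.75, 36.97, -75.74, 39.01)


def boxes_intersect(a, b):
    """Return True if two (west, south, east, north) boxes overlap.

    Hypothetical helper; assumes neither box crosses the antimeridian.
    """
    a_west, a_south, a_east, a_north = a
    b_west, b_south, b_east, b_north = b
    # Boxes overlap when they overlap in longitude AND in latitude.
    return (
        a_west <= b_east
        and b_west <= a_east
        and a_south <= b_north
        and b_south <= a_north
    )


# Made-up footprints, for illustration only.
overlapping = (-80.0, 35.0, -76.0, 37.5)  # clips the southwest corner of bbox
far_away = (-70.0, 40.0, -65.0, 45.0)     # entirely northeast of bbox
```

A granule only needs to touch the search box to match; it does not have to lie entirely inside it, which is why the overlapping footprint counts.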
Displaying results shows the direct download link: try it! The link will download one granule to your local machine, which may or may not be what you want to do. Even if you are running the notebook on a remote host, this download link will open a new browser tab or window and offer to save a file to your local machine. If you are running the notebook locally, this may be of use. However, in the next section we’ll see how to download all the results with one command.
results[0]
results[1]
results[2]
4. Download Data
An upcoming tutorial will need access to Level-1 files, whether or not we have direct access to the Earthdata Cloud, so let’s go ahead and download a couple of granules. As always, we start with earthaccess.search_data.
results = earthaccess.search_data(
short_name="PACE_OCI_L1B_SCI",
temporal=tspan,
bounding_box=bbox,
count=2,
)
Granules found: 23
Before downloading, we should understand the alternative to downloading granules, since you may be surprised that there is an alternative at all. The earthaccess.open function accepts the list of results from earthaccess.search_data and returns a list of file-like objects. No actual files are transferred.
paths = earthaccess.open(results)
Opening 2 granules, approx size: 3.47 GB
using endpoint: https://obdaac-tea.earthdatacloud.nasa.gov/s3credentials
The file-like objects held in paths can each be read like a normal file. Here we read the first line of bytes without any specialized reader.
with paths[0] as file:
    line = file.readline().strip()
line
b'\x89HDF'
Of course that doesn’t mean anything (or does it? 😉), because this is a binary file that needs a reader which understands the file format.
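Those bytes are in fact a clue: netCDF-4 files are stored in the HDF5 format, and an HDF5 file normally begins with the 8-byte signature \x89HDF\r\n\x1a\n. Calling readline stops at the first \n, and strip removes the trailing \r\n, which leaves b'\x89HDF'. Below is a minimal sketch of a signature check with a hypothetical looks_like_hdf5 helper, run on an in-memory stand-in rather than a real granule; it only checks offset 0, although the HDF5 specification also allows the signature at later offsets when a user block is present.

```python
import io

# The 8-byte signature at the start of an HDF5 (and so netCDF-4) file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(fileobj):
    """Return True if a binary file-like object starts with the HDF5 signature.

    Hypothetical helper; checks offset 0 only and ignores HDF5 user blocks.
    """
    return fileobj.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


# An in-memory stand-in for the first bytes of a granule.
sample = io.BytesIO(HDF5_SIGNATURE + b"\x00" * 8)
first_line = sample.readline().strip()  # same call as in the cell above
sample.seek(0)                          # rewind before checking the signature
is_hdf5 = looks_like_hdf5(sample)
```

The same check works on any file-like object opened in binary mode, for example a local file opened with open(path, "rb").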
The earthaccess.open function is used when you want to read bytes directly from a remote filesystem without downloading a whole file. When running code on a host with direct access to the NASA Earthdata Cloud, you don’t need to download the data, and earthaccess.open is the way to go.
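Reading part of a file without downloading all of it relies on seeking within a file-like object. Here is a minimal sketch with a hypothetical read_byte_range helper, demonstrated on an in-memory io.BytesIO stand-in. It assumes the objects returned by earthaccess.open are seekable binary file-like objects, as is typical for the fsspec-based files earthaccess works with; for a remote file, only the requested range (plus any read-ahead buffering) would then need to be fetched.

```python
import io


def read_byte_range(fileobj, offset, length):
    """Return up to `length` bytes starting at `offset`.

    Hypothetical helper. Seeking first means the bytes before `offset`
    are never read through this call.
    """
    fileobj.seek(offset)
    return fileobj.read(length)


# In-memory stand-in for a remote granule: an HDF5 signature, then 16 bytes.
fake_remote = io.BytesIO(b"\x89HDF\r\n\x1a\n" + bytes(range(16)))
chunk = read_byte_range(fake_remote, 8, 4)  # the 4 bytes after the signature
```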
Now, let’s look at the earthaccess.download function, which is used to copy files onto a filesystem local to the machine executing the code. For this function, provide the output of earthaccess.search_data along with a directory where earthaccess will store downloaded granules.
Even if you only want to read a slice of the data and downloading seems unnecessary, performance will be very poor if you use earthaccess.open while not running on a remote host with direct access to the NASA Earthdata Cloud. This is not a problem with “the cloud” or with earthaccess; it has to do with the data format and may soon be resolved.
Let’s continue with downloading the list of granules!
directory = pathlib.Path("L1B")
directory.mkdir(exist_ok=True)
paths = earthaccess.download(results, directory)
Getting 2 granules, approx download size: 3.47 GB
Accessing cloud dataset using dataset endpoint credentials: https://obdaac-tea.earthdatacloud.nasa.gov/s3credentials
Downloaded: L1B/PACE_OCI.20240501T165311.L1B.nc
Downloaded: L1B/PACE_OCI.20240501T165811.L1B.nc
The paths list now contains paths to actual files on the local filesystem.
paths
[PosixPath('L1B/PACE_OCI.20240501T165311.L1B.nc'),
PosixPath('L1B/PACE_OCI.20240501T165811.L1B.nc')]
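Because paths now holds pathlib.Path objects, ordinary filesystem tools apply to them. For example, a hypothetical total_size_gb helper can report how much disk space the downloads occupy, for a rough comparison with the approximate download size reported above. This sketch uses 1 GB = 10^9 bytes, which may not match the convention earthaccess uses for its estimate.

```python
from pathlib import Path


def total_size_gb(paths):
    """Sum the sizes of existing files, in gigabytes (1 GB = 10**9 bytes).

    Hypothetical helper; paths that are not existing files are skipped
    rather than raising an error.
    """
    total_bytes = sum(Path(p).stat().st_size for p in paths if Path(p).is_file())
    return total_bytes / 1e9
```

In the notebook, this would be called as total_size_gb(paths).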
Wherever paths = earthaccess.open(...) is used to read data directly from the NASA Earthdata Cloud, you need to substitute paths = earthaccess.download(..., local_path) before running the notebook on a local host or on a remote host that does not have direct access to the NASA Earthdata Cloud.
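One way to turn that substitution into a single switch is a small wrapper. The sketch below is hypothetical: fetch_granules and its direct_cloud_access flag are not part of earthaccess, and you would set the flag yourself according to which of the three cases described at the start of this notebook applies. The ea parameter exists only so the function can be exercised without network access; by default it imports earthaccess.

```python
import pathlib


def fetch_granules(results, direct_cloud_access, local_path="L1B", ea=None):
    """Open granules in place or download them, depending on where we run.

    Hypothetical wrapper around earthaccess.open and earthaccess.download.
    `results` is the list returned by earthaccess.search_data.
    """
    if ea is None:
        import earthaccess as ea  # imported lazily so tests can pass a stand-in

    if direct_cloud_access:
        # Remote host with direct access to the NASA Earthdata Cloud:
        # return file-like objects; no files are transferred.
        return ea.open(results)

    # Local host, or remote host without direct access: copy files locally.
    directory = pathlib.Path(local_path)
    directory.mkdir(parents=True, exist_ok=True)
    return ea.download(results, directory)
```

For example, paths = fetch_granules(results, direct_cloud_access=False) would behave like the download cell above.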
You have completed the notebook on downloading and opening datasets. We now suggest starting the notebook on File Structure at Three Processing Levels.