01. Accessing the data with argopy

Suyana

argopy is the official library to access Argo program data from Python. It abstracts the GDAC plumbing (FTP, raw NetCDF) and returns a tidy xarray.Dataset.

Three selection methods, called on a DataFetcher instance:

Method                Use case
.float(WMO)           A specific float, full history.
.profile(WMO, CYC)    One or more specific profiles (cycles) of a float.
.region(box)          Any profile inside a space-time box. Large boxes can be downloaded in chunks with the fetcher's parallel option.

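Argo data come in three data modes: R (real-time), A (real-time adjusted), and D (delayed-mode, calibrated by a PI). These are a property of the data and are distinct from the DataFetcher user modes (standard, expert) used later in this chapter. More on this in chapter 02.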
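To make the three codes concrete, here is a minimal sketch that counts profiles per data mode. The list of codes is made up for illustration and the helper name summarize_modes is hypothetical; with real data the codes would come from a variable such as DATA_MODE.

```python
from collections import Counter

# Argo data-mode codes as described in the text above.
DATA_MODES = {
    "R": "real-time",
    "A": "real-time adjusted",
    "D": "delayed-mode",
}


def summarize_modes(codes):
    """Count profiles per data mode, using readable labels (hypothetical helper)."""
    counts = Counter(codes)
    unknown = set(counts) - set(DATA_MODES)
    if unknown:
        raise ValueError(f"unexpected data mode(s): {sorted(unknown)}")
    # Keep the R, A, D order and skip modes that do not occur.
    return {DATA_MODES[code]: counts[code] for code in DATA_MODES if code in counts}


# Made-up per-profile codes, standing in for a real DATA_MODE variable.
example_codes = ["D", "D", "A", "R", "D"]
summary = summarize_modes(example_codes)
print(summary)
```

A summary like this is a quick way to see how much of a float's record has already gone through delayed-mode processing.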

%run _style.py
from argopy import DataFetcher
import argopy
print('argopy', argopy.__version__)
argopy 1.4.0

Fetching a float by WMO

Every Argo float has a unique identifier: the WMO number. You can find it in the official catalogue or on the OceanOPS map.

We’ll work with float 5905141, deployed in the South Atlantic (between 32°S and 36°S, roughly 25°W to 47°W), with over 300 profiles. That is good coverage for showing the Brazil-Malvinas confluence.

ds = DataFetcher(src='erddap', mode='standard').float(5905141).to_xarray()
ds

What you get is an xarray.Dataset with all profiles concatenated along a single N_POINTS dimension (the argopy default). In the next chapter we reshape it into the classic N_PROF × N_LEVELS layout.
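The flat layout can be pictured without any download. The sketch below uses plain Python lists with made-up values for a single float: each measurement is one point, and a per-point cycle number says which profile it belongs to. The helper group_by_cycle is hypothetical and is not how argopy reshapes data; it only illustrates the idea.

```python
from collections import defaultdict

# Synthetic stand-in for a flat N_POINTS layout: one entry per measurement.
# The values are made up; only the layout matters here.
cycle_number = [1, 1, 1, 2, 2, 3, 3, 3, 3]
pres = [5.0, 10.0, 20.0, 5.0, 10.0, 5.0, 10.0, 20.0, 30.0]


def group_by_cycle(cycles, values):
    """Collect the flat point values that belong to each cycle number."""
    if len(cycles) != len(values):
        raise ValueError("cycles and values must have the same length")
    grouped = defaultdict(list)
    for cycle, value in zip(cycles, values):
        grouped[cycle].append(value)
    return dict(grouped)


profiles = group_by_cycle(cycle_number, pres)
for cycle, levels in profiles.items():
    print(f"cycle {cycle}: {len(levels)} levels, deepest {max(levels)} dbar")
```

Profiles can have different numbers of levels, which is one reason a single point dimension is a convenient default for concatenated data.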

Fetching a space-time box

For regional analyses it’s convenient to ask for every profile that fell inside a bounding box. The signature is:

[lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, date_min, date_max]

Example: open South Atlantic, first quarter of 2020, down to 2000 dbar:

box = [-45, -25, -40, -30, 0, 2000, '2020-01-01', '2020-03-31']
ds_box = DataFetcher(src='erddap', mode='standard').region(box).to_xarray()
# Dataset.sizes maps dimension names to lengths (using Dataset.dims for this is deprecated)
print(f'{ds_box.sizes["N_POINTS"]:,} points downloaded')
ds_box
127,274 points downloaded
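Because the box is a plain positional list, swapped bounds are easy to miss. The sketch below is a hypothetical helper, make_box, that is not part of argopy; it checks the ordering of each pair before returning the eight-element list in the order given above. The specific checks are an assumption about what is worth validating, not a description of what argopy itself enforces.

```python
from datetime import date


def make_box(lon_min, lon_max, lat_min, lat_max,
             pres_min, pres_max, date_min, date_max):
    """Return the 8-element box list after basic sanity checks (hypothetical helper)."""
    if not lon_min < lon_max:
        raise ValueError("lon_min must be smaller than lon_max")
    if not -90 <= lat_min < lat_max <= 90:
        raise ValueError("latitudes must satisfy -90 <= lat_min < lat_max <= 90")
    if not 0 <= pres_min < pres_max:
        raise ValueError("pressures must satisfy 0 <= pres_min < pres_max")
    # date.fromisoformat raises ValueError on malformed 'YYYY-MM-DD' strings.
    if not date.fromisoformat(date_min) <= date.fromisoformat(date_max):
        raise ValueError("date_min must not be after date_max")
    return [lon_min, lon_max, lat_min, lat_max,
            pres_min, pres_max, date_min, date_max]


# Same South Atlantic box as in the example above.
box = make_box(-45, -25, -40, -30, 0, 2000, "2020-01-01", "2020-03-31")
print(box)
```

The returned list can be passed to .region(box) unchanged.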

expert vs standard mode

This chapter uses two DataFetcher user modes (argopy also provides a research mode, not covered here):

  • standard (default): returns TEMP, PSAL, PRES already adjusted when adjustments are available. Good for direct scientific analysis.

  • expert: returns ALL variables including _ADJUSTED, _QC, _ADJUSTED_QC, _ERROR. Useful for QC inspection and advanced workflows.

For research we recommend expert mode (we discuss this in chapter 02).

ds_expert = DataFetcher(src='erddap', mode='expert').float(5905141).to_xarray()
list(ds_expert.data_vars)
[np.str_('CONFIG_MISSION_NUMBER'), np.str_('CYCLE_NUMBER'), np.str_('DATA_MODE'), np.str_('DIRECTION'), np.str_('PLATFORM_NUMBER'), np.str_('POSITION_QC'), np.str_('PRES'), np.str_('PRES_ADJUSTED'), np.str_('PRES_ADJUSTED_ERROR'), np.str_('PRES_ADJUSTED_QC'), np.str_('PRES_QC'), np.str_('PSAL'), np.str_('PSAL_ADJUSTED'), np.str_('PSAL_ADJUSTED_ERROR'), np.str_('PSAL_ADJUSTED_QC'), np.str_('PSAL_QC'), np.str_('TEMP'), np.str_('TEMP_ADJUSTED'), np.str_('TEMP_ADJUSTED_ERROR'), np.str_('TEMP_ADJUSTED_QC'), np.str_('TEMP_QC'), np.str_('TIME_QC'), np.str_('VERTICAL_SAMPLING_SCHEME')]
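The expert-mode list is easier to read once grouped by parameter. The sketch below copies the variable names from the output above and sorts them into PRES, PSAL, TEMP and everything else. The classify helper and the role labels are illustrative choices, not argopy terminology.

```python
# Variable names copied from the expert-mode output above.
names = [
    "CONFIG_MISSION_NUMBER", "CYCLE_NUMBER", "DATA_MODE", "DIRECTION",
    "PLATFORM_NUMBER", "POSITION_QC",
    "PRES", "PRES_ADJUSTED", "PRES_ADJUSTED_ERROR", "PRES_ADJUSTED_QC", "PRES_QC",
    "PSAL", "PSAL_ADJUSTED", "PSAL_ADJUSTED_ERROR", "PSAL_ADJUSTED_QC", "PSAL_QC",
    "TEMP", "TEMP_ADJUSTED", "TEMP_ADJUSTED_ERROR", "TEMP_ADJUSTED_QC", "TEMP_QC",
    "TIME_QC", "VERTICAL_SAMPLING_SCHEME",
]

PARAMETERS = ("PRES", "PSAL", "TEMP")
SUFFIXES = ("_ADJUSTED", "_ADJUSTED_ERROR", "_ADJUSTED_QC", "_QC")


def classify(name):
    """Return (parameter, role) for a measured parameter, else ("other", "metadata")."""
    for param in PARAMETERS:
        if name == param:
            return param, "raw"
        for suffix in SUFFIXES:
            if name == param + suffix:
                return param, suffix.lstrip("_").lower()
    return "other", "metadata"


by_param = {}
for name in names:
    param, _role = classify(name)
    by_param.setdefault(param, []).append(name)

for param, members in by_param.items():
    print(f"{param}: {len(members)} variables")
```

Each measured parameter comes with the same five variables (raw, adjusted, adjusted error, and the two QC flags), while the remaining names carry per-profile metadata and position/time QC.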

Local cache

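Caching is off by default. If you’re iterating on the same dataset, enable it with cache=True; set_options(cachedir=...) chooses where the cached files are stored (here, a .argopy_cache folder in the working directory):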

from argopy import set_options
set_options(cachedir='./.argopy_cache')
ds_cached = DataFetcher(src='erddap', cache=True).float(5905141).to_xarray()
# Dataset.sizes maps dimension names to lengths (using Dataset.dims for this is deprecated)
dict(ds_cached.sizes)
{'N_POINTS': 174628}

Summary

  • DataFetcher(src='erddap') is the entry point.

  • .float(WMO), .region(box), .profile(WMO, CYC) are the three main selection methods.

  • .to_xarray() materializes the download.

  • mode='expert' gives access to adjusted and QC variables.

  • Enable the cache (cache=True) when iterating on the same dataset.

Next chapter: how the Dataset is laid out internally.