argopy is a Python library for accessing Argo program data. It abstracts the GDAC plumbing (FTP, raw NetCDF) and returns a tidy xarray.Dataset.
Three selection methods on DataFetcher:

| Method | Use case |
|---|---|
| `.float(WMO)` | A specific float, full history. |
| `.region(box)` | Every profile inside a space-time box. |
| `.profile(WMO, CYC)` | Specific profiles (cycle numbers) of a float. |

For large regions, `DataFetcher(parallel=True)` splits the request into chunks that are downloaded in parallel.
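To see why chunking helps with large regions, here is a hypothetical, stdlib-only sketch that cuts a box into equal-width longitude slices, each of which could be requested independently and concatenated afterwards. The helper name `split_box_by_longitude` is ours, not argopy's, and argopy's own chunking may also split along time or latitude.

```python
def split_box_by_longitude(box, n_chunks):
    """Split an argopy-style box into n_chunks sub-boxes of equal longitude width.

    Hypothetical helper for illustration. Box order:
    [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, date_min, date_max]
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    lon_min, lon_max = box[0], box[1]
    step = (lon_max - lon_min) / n_chunks
    chunks = []
    for i in range(n_chunks):
        lo = lon_min + i * step
        # Use the exact upper bound for the last slice to avoid float drift
        hi = lon_max if i == n_chunks - 1 else lon_min + (i + 1) * step
        # Latitude, pressure and date bounds are copied unchanged
        chunks.append([lo, hi] + list(box[2:]))
    return chunks


box = [-45, -25, -40, -30, 0, 2000, '2020-01-01', '2020-03-31']
for sub_box in split_box_by_longitude(box, 4):
    print(sub_box)
```

Adjacent slices share a boundary longitude, so if both bounds are treated as inclusive, a profile sitting exactly on a boundary could come back twice; de-duplicating after concatenation avoids that.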
Three data modes: R (real-time), A (real-time adjusted), and D (delayed-mode, calibrated by a PI). More on this in chapter 02.
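The usual Argo convention is that real-time ('R') profiles leave the `_ADJUSTED` fields filled with missing values, while 'A' and 'D' profiles carry their best values there. The helper below is a stdlib-only sketch of that rule; `preferred_variable` is a hypothetical name, not an argopy function, and argopy's standard mode is meant to make this choice for you.

```python
# Meaning of the one-letter Argo DATA_MODE flag
DATA_MODES = {
    "R": "real-time",
    "A": "real-time adjusted",
    "D": "delayed-mode",
}


def preferred_variable(param, data_mode):
    """Return the variable to read for `param` (e.g. 'TEMP') given a profile's DATA_MODE.

    'R' profiles: the raw variable. 'A' and 'D' profiles: the *_ADJUSTED variable.
    """
    mode = data_mode.strip().upper()
    if mode not in DATA_MODES:
        raise ValueError(f"unknown DATA_MODE: {data_mode!r}")
    return param if mode == "R" else f"{param}_ADJUSTED"


for mode in "RAD":
    print(mode, DATA_MODES[mode], "->", preferred_variable("TEMP", mode))
```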
```python
%run _style.py
from argopy import DataFetcher
import argopy
print('argopy', argopy.__version__)
```

```
argopy 1.4.0
```
## Fetching a float by WMO
Every Argo float has a unique identifier: the WMO number. You can find it in the official catalogue or on the OceanOPS map.
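Before sending a request it can be handy to check that an identifier at least looks like a WMO number. This sketch assumes Argo WMO numbers are seven-digit integers (like 5905141); `is_valid_wmo` is a hypothetical helper that checks the format only, not whether the float exists.

```python
def is_valid_wmo(wmo):
    """Rough format check for an Argo float WMO number (int or str)."""
    text = str(wmo).strip()
    # Only ASCII digits, exactly seven of them, with no leading zero
    return len(text) == 7 and text.isascii() and text.isdigit() and text[0] != "0"


for candidate in [5905141, "5905141", "590514", 5905141.0]:
    print(repr(candidate), is_valid_wmo(candidate))
```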
We’ll work with float 5905141, deployed in the South Atlantic (between 32°S and 36°S, roughly 25°W to 47°W), which has recorded over 300 profiles: good coverage to show the Brazil-Malvinas confluence.
```python
ds = DataFetcher(src='erddap', mode='standard').float(5905141).to_xarray()
ds
```

What you get is an xarray.Dataset with all profiles concatenated along a single N_POINTS dimension (the argopy default). In the next chapter we reshape it into the classic N_PROF × N_LEVELS layout.
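Conceptually, going from N_POINTS to N_PROF × N_LEVELS means grouping points by profile and padding shorter profiles so the result is rectangular. argopy has its own converter (`ds.argo.point2profile()`); the stdlib-only sketch below, under the hypothetical name `points_to_profiles`, only illustrates the idea on toy values. It assumes a single float, so the cycle number alone identifies a profile; a real dataset would also need PLATFORM_NUMBER and DIRECTION.

```python
import math
from collections import defaultdict


def points_to_profiles(cycle_numbers, pressures, values):
    """Regroup flat (N_POINTS) sequences into a NaN-padded N_PROF x N_LEVELS grid."""
    grouped = defaultdict(list)
    for cyc, pres, val in zip(cycle_numbers, pressures, values):
        grouped[cyc].append((pres, val))

    cycles = sorted(grouped)
    n_levels = max((len(grouped[c]) for c in cycles), default=0)
    grid = []
    for c in cycles:
        # Order each profile by pressure, shallow to deep
        column = [val for _, val in sorted(grouped[c], key=lambda pv: pv[0])]
        # Pad with NaN so every profile has n_levels entries
        grid.append(column + [math.nan] * (n_levels - len(column)))
    return cycles, grid


# Toy values, not real float data
cyc = [1, 1, 1, 2, 2]
pres = [5.0, 10.0, 20.0, 5.0, 10.0]
temp = [24.1, 23.8, 22.5, 24.3, 24.0]
cycles, grid = points_to_profiles(cyc, pres, temp)
print(cycles)  # [1, 2]
print(grid)    # [[24.1, 23.8, 22.5], [24.3, 24.0, nan]]
```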
## Fetching a space-time box
For regional analyses it’s convenient to ask for every profile that fell inside a bounding box. The box is a list in this order:

```python
[lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, date_min, date_max]
```

Example: the open South Atlantic, first quarter of 2020, down to 2000 dbar:
```python
box = [-45, -25, -40, -30, 0, 2000, '2020-01-01', '2020-03-31']
ds_box = DataFetcher(src='erddap', mode='standard').region(box).to_xarray()
# Dataset.sizes maps dimension names to lengths; Dataset.dims will return only names in future xarray
print(f'{ds_box.sizes["N_POINTS"]:,} points downloaded')
ds_box
```

```
127,274 points downloaded
```
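Because the box is purely positional, it is easy to swap a minimum and a maximum without noticing. A small validation step catches that before any download. `check_box` is a hypothetical helper; it handles only the 8-element form used here and assumes longitudes in [-180, 180] and ISO dates.

```python
from datetime import date


def check_box(box):
    """Validate an 8-element argopy-style box and return it unchanged.

    Order: [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, date_min, date_max]
    """
    if len(box) != 8:
        raise ValueError(f"expected 8 elements, got {len(box)}")
    lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, d_min, d_max = box
    if not (-180 <= lon_min < lon_max <= 180):
        raise ValueError("longitudes must satisfy -180 <= lon_min < lon_max <= 180")
    if not (-90 <= lat_min < lat_max <= 90):
        raise ValueError("latitudes must satisfy -90 <= lat_min < lat_max <= 90")
    if not (0 <= pres_min < pres_max):
        raise ValueError("pressures must satisfy 0 <= pres_min < pres_max")
    # date.fromisoformat raises ValueError for strings that are not ISO dates
    if date.fromisoformat(d_min) >= date.fromisoformat(d_max):
        raise ValueError("date_min must be earlier than date_max")
    return box


check_box([-45, -25, -40, -30, 0, 2000, '2020-01-01', '2020-03-31'])
try:
    # Latitudes given in the wrong order
    check_box([-45, -25, -30, -40, 0, 2000, '2020-01-01', '2020-03-31'])
except ValueError as err:
    print("rejected:", err)
```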
## Expert vs standard mode
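DataFetcher supports several user modes; the two used in this tutorial are:

- `standard` (default): returns `TEMP`, `PSAL` and `PRES`, already adjusted when adjustments are available. Good for direct scientific analysis.
- `expert`: returns all variables, including the `_ADJUSTED`, `_QC`, `_ADJUSTED_QC` and `_ADJUSTED_ERROR` ones. Useful for QC inspection and advanced workflows.

For research work we recommend expert mode, because it lets you check QC flags and adjustment errors yourself (we discuss this in chapter 02).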
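With expert-mode data, a typical first step is to mask values by their QC flags. In the Argo flag scale, 1 means good and 2 probably good, and keeping only those two is a common (but not universal) choice. The sketch below works on plain lists with toy values; `filter_by_qc` is a hypothetical helper, and in practice you would apply the same logic to pairs such as `TEMP_ADJUSTED` / `TEMP_ADJUSTED_QC`.

```python
import math

# QC flags treated as usable in this sketch (1 = good, 2 = probably good)
GOOD_QC = {"1", "2"}


def filter_by_qc(values, qc_flags, good=GOOD_QC):
    """Replace values whose QC flag is not in `good` with NaN.

    Flags may arrive as bytes (b'1'), str ('1') or int (1); all are normalised to str.
    """
    if len(values) != len(qc_flags):
        raise ValueError("values and qc_flags must have the same length")
    cleaned = []
    for value, flag in zip(values, qc_flags):
        flag = flag.decode() if isinstance(flag, bytes) else str(flag)
        cleaned.append(value if flag.strip() in good else math.nan)
    return cleaned


# Toy values, not real float data
temp_adjusted = [24.1, 23.8, 30.5, 22.5]
temp_adjusted_qc = [b"1", b"1", b"4", b"2"]
print(filter_by_qc(temp_adjusted, temp_adjusted_qc))  # [24.1, 23.8, nan, 22.5]
```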
```python
ds_expert = DataFetcher(src='erddap', mode='expert').float(5905141).to_xarray()
list(ds_expert.data_vars)
```

```
[np.str_('CONFIG_MISSION_NUMBER'),
 np.str_('CYCLE_NUMBER'),
 np.str_('DATA_MODE'),
 np.str_('DIRECTION'),
 np.str_('PLATFORM_NUMBER'),
 np.str_('POSITION_QC'),
 np.str_('PRES'),
 np.str_('PRES_ADJUSTED'),
 np.str_('PRES_ADJUSTED_ERROR'),
 np.str_('PRES_ADJUSTED_QC'),
 np.str_('PRES_QC'),
 np.str_('PSAL'),
 np.str_('PSAL_ADJUSTED'),
 np.str_('PSAL_ADJUSTED_ERROR'),
 np.str_('PSAL_ADJUSTED_QC'),
 np.str_('PSAL_QC'),
 np.str_('TEMP'),
 np.str_('TEMP_ADJUSTED'),
 np.str_('TEMP_ADJUSTED_ERROR'),
 np.str_('TEMP_ADJUSTED_QC'),
 np.str_('TEMP_QC'),
 np.str_('TIME_QC'),
 np.str_('VERTICAL_SAMPLING_SCHEME')]
```

## Local cache
Caching is off by default. If you’re iterating on the same dataset, enable it with `cache=True`; `set_options(cachedir=...)` controls where the downloaded files are stored:
```python
from argopy import set_options
# Store cached files next to the notebook instead of the default location
set_options(cachedir='./.argopy_cache')
ds_cached = DataFetcher(src='erddap', cache=True).float(5905141).to_xarray()
dict(ds_cached.sizes)
```

```
{'N_POINTS': 174628}
```

## Summary
- `DataFetcher(src='erddap')` is the entry point.
- `.float(WMO)`, `.region(box)` and `.profile(WMO, CYC)` are the three main selection methods.
- `.to_xarray()` materializes the download.
- `mode='expert'` gives access to adjusted and QC variables.
- Enable the cache if you are iterating on the same data.
Next chapter: how the Dataset is laid out internally.