By default, argopy returns data in “point-cloud” format: a single N_POINTS dimension holding every measurement, with each profile stored as a run of consecutive points. This is efficient, but it is not the classical Argo layout.
For science, reshape it to N_PROF × N_LEVELS:
- N_PROF: number of profiles (= cycles).
- N_LEVELS: pressure levels within each profile (shorter profiles are padded with NaN).
argopy does this with .argo.point2profile().
```python
%run _style.py  # local notebook styling helper, not needed for the analysis
from argopy import DataFetcher
import argopy

# Fetch every measurement of float 5905141 from ERDDAP, in expert mode
ds_point = DataFetcher(src='erddap', mode='expert').float(5905141).to_xarray()

# Dataset.sizes maps dimension names to lengths (Dataset.dims is changing to return names only)
print('point-cloud:', dict(ds_point.sizes))
# point-cloud: {'N_POINTS': 176145}
```
```python
# Reshape the point cloud into the classical N_PROF x N_LEVELS layout
ds = ds_point.argo.point2profile()
print('profile format:', dict(ds.sizes))
# profile format: {'N_PROF': 314, 'N_LEVELS': 562}
ds  # display the dataset in the notebook
```
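`point2profile()` does the reshape for you. The idea behind it can be illustrated with a toy NumPy sketch (this is not argopy's actual implementation): each point gets a row (its profile) and a column (its rank within that profile), and profiles shorter than the longest one are padded with NaN. The function `points_to_profiles` and the toy arrays are hypothetical, and the sketch assumes points are grouped by profile and sorted by pressure, as in the point-cloud layout described above.

```python
import numpy as np

# Toy point cloud (hypothetical): 7 points belonging to 3 profiles
prof_id = np.array([0, 0, 0, 1, 1, 2, 2])
pres = np.array([5.0, 10.0, 20.0, 5.0, 10.0, 5.0, 10.0])

def points_to_profiles(prof_id, values):
    """Scatter 1-D point values into a (N_PROF, N_LEVELS) array padded with NaN.

    Assumes points are grouped by profile id, in increasing id order.
    """
    profiles, counts = np.unique(prof_id, return_counts=True)
    n_prof, n_levels = len(profiles), counts.max()
    out = np.full((n_prof, n_levels), np.nan)
    # Index of the first point of each profile, repeated for every point of that profile
    starts = np.repeat(np.cumsum(counts) - counts, counts)
    # Level index of each point: 0, 1, 2, ... restarting at every new profile
    level = np.arange(len(values)) - starts
    # Row index of each point
    row = np.searchsorted(profiles, prof_id)
    out[row, level] = values
    return out

grid = points_to_profiles(prof_id, pres)
print(grid.shape)
print(grid)
```

The padding explains the numbers above: 314 × 562 = 176,468 cells for 176,145 points, so 323 cells of the profile layout are NaN fill.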
## Key variables
Besides the coordinates (LATITUDE, LONGITUDE, TIME), the dataset carries these variable families:
| Family | Meaning |
|---|---|
| TEMP, PSAL, PRES | Raw variables (real-time). |
| TEMP_ADJUSTED, PSAL_ADJUSTED, PRES_ADJUSTED | Adjusted variables (preliminary adjustment in mode 'A', delayed-mode calibration in mode 'D'). |
| `*_QC`, `*_ADJUSTED_QC` | Quality flags, values '0' to '9' (compared as strings in the code below). |
| `*_ADJUSTED_ERROR` | Estimated error of the adjusted value (mainly filled in delayed mode). |
| DATA_MODE | Per profile: 'R' (real-time), 'A' (real-time + adjustments), 'D' (delayed-mode). |
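Because the families are distinguished by name suffix, they can be sorted programmatically. A minimal sketch, assuming the naming pattern in the table; `family` and the toy name list are hypothetical:

```python
from collections import defaultdict

# Hypothetical variable names, mimicking an expert-mode Argo dataset
names = [
    "PRES", "PRES_QC", "PRES_ADJUSTED", "PRES_ADJUSTED_QC", "PRES_ADJUSTED_ERROR",
    "TEMP", "TEMP_QC", "TEMP_ADJUSTED", "TEMP_ADJUSTED_QC", "TEMP_ADJUSTED_ERROR",
    "DATA_MODE", "CYCLE_NUMBER",
]

RAW = {"TEMP", "PSAL", "PRES"}

def family(name):
    """Assign a variable name to a family of the table, based on its suffix."""
    if name.endswith("_ERROR"):
        return "error"
    if name.endswith("_QC"):  # covers both *_QC and *_ADJUSTED_QC
        return "qc"
    if name.endswith("_ADJUSTED"):
        return "adjusted"
    if name in RAW:
        return "raw"
    if name == "DATA_MODE":
        return "data_mode"
    return "other"

groups = defaultdict(list)
for n in names:
    groups[family(n)].append(n)

for fam, members in groups.items():
    print(f"{fam:10s} {members}")
```

With the fetched data, `list(ds.data_vars)` would replace the toy list.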
## Data modes: R / A / D
- R (real-time): what the float transmits directly, with automatic QC only. Available within hours.
- A (adjusted): same as R, with a preliminary automatic adjustment stored in `*_ADJUSTED`.
- D (delayed-mode): a PI has reviewed the profile, calibrated the sensor against nearby CTD data and applied fine adjustments. Available 6 to 12 months later.
For rigorous science, always use `*_ADJUSTED` when DATA_MODE is 'D' or 'A', and the raw variable only when it is 'R'.
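DATA_MODE has one value per profile, while TEMP has one per profile and level, so applying the rule means broadcasting the per-profile choice across levels. A toy NumPy sketch with hypothetical values (the NaNs stand in for the empty adjusted fields of a real-time profile):

```python
import numpy as np

# Toy data (hypothetical): 3 profiles x 2 levels
data_mode = np.array(["R", "D", "A"])                    # shape (N_PROF,)
temp = np.array([[10.0, 9.0], [11.0, 8.0], [12.0, 7.0]])  # raw, shape (N_PROF, N_LEVELS)
temp_adjusted = np.array([[np.nan, np.nan], [11.1, 8.1], [12.2, 7.2]])

# Per-profile test, given a trailing axis so it broadcasts over levels
use_raw = (data_mode == "R")[:, None]                    # shape (N_PROF, 1)
best = np.where(use_raw, temp, temp_adjusted)
print(best)
```

xarray performs the same broadcasting automatically, by matching dimension names instead of axis positions.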
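```python
import numpy as np

# DATA_MODE has one value per profile: count profiles in each mode
modes = np.array([str(m) for m in ds.DATA_MODE.values])
unique, counts = np.unique(modes, return_counts=True)
for m, c in zip(unique, counts):
    print(f' {m}: {c} profiles')
# D: 314 profiles
```

All 314 profiles of this float are in delayed mode, so here every cleaned value will come from the `*_ADJUSTED` variables.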
## QC flags
Each measured value (TEMP, PSAL, PRES) has its own QC flag. Canonical table:
| Flag | Meaning |
|---|---|
| '0' | No QC performed. |
| '1' | Good. Data is good. |
| '2' | Probably good. |
| '3' | Probably bad. Use with caution. |
| '4' | Bad. Discard. |
| '5' | Changed (a value was corrected). |
| '8' | Estimated (interpolated). |
| '9' | Missing value. |
Standard practice: keep flags ‘1’ and ‘2’ (good + probably good).
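Before masking, it can help to see how the flags are distributed and how much data the '1'/'2' rule keeps. A minimal sketch on toy flags; `qc_summary`, `QC_MEANING` and the flag values are hypothetical:

```python
import numpy as np

# Meanings from the QC table above
QC_MEANING = {"0": "no QC performed", "1": "good", "2": "probably good",
              "3": "probably bad", "4": "bad", "5": "changed",
              "8": "estimated", "9": "missing value"}

# Toy flags (hypothetical) for 10 values
flags = np.array(["1", "1", "1", "2", "1", "4", "3", "1", "9", "1"])

def qc_summary(flags, good=("1", "2")):
    """Return per-flag counts and the fraction of values kept by the `good` rule."""
    as_str = flags.astype(str)
    values, counts = np.unique(as_str, return_counts=True)
    table = {v: int(c) for v, c in zip(values, counts)}
    kept = float(np.isin(as_str, good).mean())
    return table, kept

table, kept = qc_summary(flags)
for flag, n in table.items():
    print(f"{flag} ({QC_MEANING.get(flag, 'unused')}): {n}")
print(f"kept by ('1', '2'): {kept:.0%}")
```

On the real dataset, something like `temp_qc.values.ravel()` could be passed instead; padded levels would then show up as an extra entry.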
We pick the adjusted or raw variable per profile (according to DATA_MODE), then mask out every value whose flag is not '1' or '2':
```python
import xarray as xr

# Use the ADJUSTED variable when DATA_MODE != 'R', otherwise the raw one
def merge_adjusted(ds, var):
    """Return `var`, with the ADJUSTED values wherever DATA_MODE != 'R'."""
    # One boolean per profile, broadcast by xarray over N_LEVELS
    mode_is_R = (ds.DATA_MODE.astype(str) == 'R')
    return xr.where(mode_is_R, ds[var], ds[f'{var}_ADJUSTED'])

temp = merge_adjusted(ds, 'TEMP')
psal = merge_adjusted(ds, 'PSAL')
pres = merge_adjusted(ds, 'PRES')
```
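```python
def mask_qc(var, qc, good=('1', '2')):
    """Set to NaN every value of `var` whose QC flag is not in `good`."""
    qc_str = qc.astype(str)            # compare flags as strings, whatever their stored dtype
    mask = qc_str.isin(list(good))     # True where the flag is acceptable
    return var.where(mask)
```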
```python
# Select the QC flags that match the values chosen by merge_adjusted
mode_is_R = (ds.DATA_MODE.astype(str) == 'R')
temp_qc = xr.where(mode_is_R, ds.TEMP_QC, ds.TEMP_ADJUSTED_QC)
psal_qc = xr.where(mode_is_R, ds.PSAL_QC, ds.PSAL_ADJUSTED_QC)
pres_qc = xr.where(mode_is_R, ds.PRES_QC, ds.PRES_ADJUSTED_QC)

temp_clean = mask_qc(temp, temp_qc)
psal_clean = mask_qc(psal, psal_qc)
pres_clean = mask_qc(pres, pres_qc)

print('temp shape:', temp_clean.shape)
print('% valid temperature:', f'{float(temp_clean.notnull().mean())*100:.1f}%')
print('% valid salinity:   ', f'{float(psal_clean.notnull().mean())*100:.1f}%')
# temp shape: (314, 562)
# % valid temperature: 99.6%
# % valid salinity:    99.6%
```
This pair (merge_adjusted + mask_qc) is the pattern you’ll use all the time. Worth keeping handy.
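The two steps, plus the matching QC selection, can also be folded into a single call. A minimal sketch, assuming the variable names used above; `clean_variable` is a hypothetical helper and the toy dataset only exists to check its behaviour:

```python
import numpy as np
import xarray as xr

def clean_variable(ds, var, good=("1", "2")):
    """Hypothetical one-call helper: per profile, take raw values and flags when
    DATA_MODE == 'R' (adjusted ones otherwise), then keep only flags in `good`."""
    use_raw = ds.DATA_MODE.astype(str) == "R"          # dims: (N_PROF,)
    values = xr.where(use_raw, ds[var], ds[f"{var}_ADJUSTED"])
    qc = xr.where(use_raw, ds[f"{var}_QC"], ds[f"{var}_ADJUSTED_QC"])
    return values.where(qc.astype(str).isin(list(good)))

# Toy dataset (hypothetical values): 2 profiles x 3 levels
toy = xr.Dataset(
    {
        "DATA_MODE": ("N_PROF", ["R", "D"]),
        "TEMP": (("N_PROF", "N_LEVELS"), [[10.0, 9.0, 8.0], [11.0, 10.0, 9.0]]),
        "TEMP_QC": (("N_PROF", "N_LEVELS"), [["1", "4", "2"], ["1", "1", "1"]]),
        "TEMP_ADJUSTED": (("N_PROF", "N_LEVELS"), [[np.nan] * 3, [11.1, 10.1, 9.1]]),
        "TEMP_ADJUSTED_QC": (("N_PROF", "N_LEVELS"), [[" "] * 3, ["1", "3", "2"]]),
    }
)

toy_temp = clean_variable(toy, "TEMP")
print(toy_temp.values)
```

In the toy case the real-time profile keeps its raw values except the level flagged '4', and the delayed-mode profile keeps its adjusted values except the level flagged '3'.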
To keep the code below clean, we wrap it all into a single “clean” Dataset:
```python
# Gather the cleaned variables in one Dataset, keeping the per-profile coordinates
ds_clean = xr.Dataset({
    'TEMP': temp_clean,
    'PSAL': psal_clean,
    'PRES': pres_clean,
}, coords={
    'LATITUDE': ds.LATITUDE,
    'LONGITUDE': ds.LONGITUDE,
    'TIME': ds.TIME,
    'CYCLE_NUMBER': ds.CYCLE_NUMBER,
})
ds_clean
```

## Summary
- argopy returns the `N_POINTS` format by default. Use `.argo.point2profile()` to get `N_PROF × N_LEVELS`.
- Decide per profile whether to use the raw variable or `_ADJUSTED`, based on `DATA_MODE`.
- Mask by QC: keep flags `'1'` and `'2'`.
- Wrap this in helper functions (`merge_adjusted`, `mask_qc`) and save the clean dataset once.
Next: T/S profiles and T-S diagrams.