xarray.DataArray.groupby

DataArray.groupby(group=None, *, squeeze=False, restore_coord_dims=False, eagerly_compute_group=None, **groupers)

Returns a DataArrayGroupBy object for performing grouped operations.
- Parameters:
  - group (str, DataArray, IndexVariable, sequence of Hashable, or mapping of Hashable to Grouper) – Array whose unique values should be used to group this array. If a Hashable, must be the name of a coordinate contained in this dataarray. If a dictionary, must map an existing variable name to a Grouper instance.
  - squeeze (False) – This argument is deprecated.
  - restore_coord_dims (bool, default: False) – If True, also restore the dimension order of multi-dimensional coordinates.
  - eagerly_compute_group (bool, optional) – This argument is deprecated.
  - **groupers (Mapping of str to Grouper or Resampler) – Mapping from the name of a variable to group by to a Grouper or Resampler object. One of group or groupers must be provided. Only a single grouper is allowed at present.
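As a sketch of the mapping form of group described above (a dictionary from an existing variable name to a Grouper instance), assuming a recent xarray where xarray.groupers is available; the data here is made up for illustration:

```python
import numpy as np
import xarray as xr
from xarray.groupers import UniqueGrouper

# Hypothetical small array with a string coordinate along "x".
da = xr.DataArray(
    np.arange(4.0),
    dims="x",
    coords={"letters": ("x", list("abab"))},
)

# The `group` argument as a mapping: variable name -> Grouper instance.
result = da.groupby({"letters": UniqueGrouper()}).mean()
print(result.values)  # one mean per unique letter
```

This is equivalent to the keyword form `da.groupby(letters=UniqueGrouper())` shown in the examples below.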
- Returns:
  grouped (DataArrayGroupBy) – A DataArrayGroupBy object patterned after pandas.GroupBy that can be iterated over in the form of (unique_value, grouped_array) pairs.
Examples
Calculate daily anomalies for daily data:
>>> da = xr.DataArray(
...     np.linspace(0, 1826, num=1827),
...     coords=[pd.date_range("2000-01-01", "2004-12-31", freq="D")],
...     dims="time",
... )
>>> da
<xarray.DataArray (time: 1827)> Size: 15kB
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.824e+03, 1.825e+03,
       1.826e+03], shape=(1827,))
Coordinates:
  * time     (time) datetime64[ns] 15kB 2000-01-01 2000-01-02 ... 2004-12-31
>>> da.groupby("time.dayofyear") - da.groupby("time.dayofyear").mean("time")
<xarray.DataArray (time: 1827)> Size: 15kB
array([-730.8, -730.8, -730.8, ...,  730.2,  730.2,  730.5], shape=(1827,))
Coordinates:
  * time       (time) datetime64[ns] 15kB 2000-01-01 2000-01-02 ... 2004-12-31
    dayofyear  (time) int64 15kB 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366
Use a Grouper object to be more explicit:

>>> da.coords["dayofyear"] = da.time.dt.dayofyear
>>> da.groupby(dayofyear=xr.groupers.UniqueGrouper()).mean()
<xarray.DataArray (dayofyear: 366)> Size: 3kB
array([ 730.8,  731.8,  732.8, ..., 1093.8, 1094.8, 1095.5])
Coordinates:
  * dayofyear  (dayofyear) int64 3kB 1 2 3 4 5 6 7 ... 361 362 363 364 365 366
>>> da = xr.DataArray(
...     data=np.arange(12).reshape((4, 3)),
...     dims=("x", "y"),
...     coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
... )
Grouping by a single variable is easy
>>> da.groupby("letters")
<DataArrayGroupBy, grouped over 1 grouper(s), 2 groups in total:
    'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'>
Execute a reduction
>>> da.groupby("letters").sum()
<xarray.DataArray (letters: 2, y: 3)> Size: 48B
array([[ 9, 11, 13],
       [ 9, 11, 13]])
Coordinates:
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
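Reductions are not the only option: grouped transformations can be expressed with map, which applies a function to each group and recombines the results. A sketch using the same array as above (the function passed to map is illustrative, subtracting each group's minimum):

```python
import numpy as np
import xarray as xr

# Same array as in the surrounding examples.
da = xr.DataArray(
    np.arange(12).reshape((4, 3)),
    dims=("x", "y"),
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)

# Apply a function per group; the recombined result keeps the input shape.
out = da.groupby("letters").map(lambda g: g - g.min())
print(out.shape)  # (4, 3)
```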
Grouping by multiple variables
>>> da.groupby(["letters", "x"])
<DataArrayGroupBy, grouped over 2 grouper(s), 8 groups in total:
    'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'
    'x': UniqueGrouper('x'), 4/4 groups with labels 10, 20, 30, 40>
Use Grouper objects to express more complicated GroupBy operations
>>> from xarray.groupers import BinGrouper, UniqueGrouper
>>>
>>> da.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()
<xarray.DataArray (x_bins: 2, letters: 2, y: 3)> Size: 96B
array([[[ 0.,  1.,  2.],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [ 3.,  4.,  5.]]])
Coordinates:
  * x_bins   (x_bins) interval[int64, right] 32B (5, 15] (15, 25]
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
See also

- GroupBy: Group and Bin Data
  User guide explanation of how to group and bin data.
- Computational Patterns
  Tutorial on groupby() for windowed computation.
- Grouped Computations
  Tutorial on groupby() demonstrating reductions, transformation, and comparison with resample().
- pandas.DataFrame.groupby
- DataArray.groupby_bins
- Dataset.groupby
- core.groupby.DataArrayGroupBy
- DataArray.coarsen
- Dataset.resample
- DataArray.resample