xarray.DataArray.groupby

DataArray.groupby(group=None, *, squeeze=False, restore_coord_dims=False, eagerly_compute_group=None, **groupers)

Returns a DataArrayGroupBy object for performing grouped operations.

Parameters:
  • group (str or DataArray or IndexVariable or sequence of hashable or mapping of hashable to Grouper) – Array whose unique values should be used to group this array. If a Hashable, must be the name of a coordinate contained in this dataarray. If a dictionary, must map an existing variable name to a Grouper instance.

  • squeeze (False) – This argument is deprecated.

  • restore_coord_dims (bool, default: False) – If True, also restore the dimension order of multi-dimensional coordinates.

  • eagerly_compute_group (bool, optional) – This argument is deprecated.

  • **groupers (Mapping of str to Grouper or Resampler) – Mapping from the name of a variable to group by to a Grouper or Resampler object. Either group or **groupers must be provided.

Returns:

grouped (DataArrayGroupBy) – A DataArrayGroupBy object patterned after pandas.GroupBy that can be iterated over in the form of (unique_value, grouped_array) pairs.
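The returned object can be iterated directly to walk the groups. A minimal sketch with illustrative data (the variable and coordinate names here are made up, not part of the signature above):

```python
import numpy as np
import xarray as xr

# Illustrative data: a 1-D array with a non-dimension "labels" coordinate.
da = xr.DataArray(
    np.arange(6),
    dims="x",
    coords={"labels": ("x", ["a", "a", "b", "b", "b", "a"])},
)

# Iteration yields (unique_value, grouped_array) pairs, as with pandas.GroupBy.
for label, group in da.groupby("labels"):
    print(label, group.values)
```

Each `group` is itself a DataArray holding only the elements whose coordinate value equals `label`.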

Examples

Calculate daily anomalies for daily data:

>>> da = xr.DataArray(
...     np.linspace(0, 1826, num=1827),
...     coords=[pd.date_range("2000-01-01", "2004-12-31", freq="D")],
...     dims="time",
... )
>>> da
<xarray.DataArray (time: 1827)> Size: 15kB
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.824e+03, 1.825e+03,
       1.826e+03], shape=(1827,))
Coordinates:
  * time     (time) datetime64[ns] 15kB 2000-01-01 2000-01-02 ... 2004-12-31
>>> da.groupby("time.dayofyear") - da.groupby("time.dayofyear").mean("time")
<xarray.DataArray (time: 1827)> Size: 15kB
array([-730.8, -730.8, -730.8, ...,  730.2,  730.2,  730.5], shape=(1827,))
Coordinates:
  * time       (time) datetime64[ns] 15kB 2000-01-01 2000-01-02 ... 2004-12-31
    dayofyear  (time) int64 15kB 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366

Use a Grouper object to be more explicit

>>> da.coords["dayofyear"] = da.time.dt.dayofyear
>>> da.groupby(dayofyear=xr.groupers.UniqueGrouper()).mean()
<xarray.DataArray (dayofyear: 366)> Size: 3kB
array([ 730.8,  731.8,  732.8, ..., 1093.8, 1094.8, 1095.5])
Coordinates:
  * dayofyear  (dayofyear) int64 3kB 1 2 3 4 5 6 7 ... 361 362 363 364 365 366
>>> da = xr.DataArray(
...     data=np.arange(12).reshape((4, 3)),
...     dims=("x", "y"),
...     coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
... )

Grouping by a single variable is easy

>>> da.groupby("letters")
<DataArrayGroupBy, grouped over 1 grouper(s), 2 groups in total:
    'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'>

Execute a reduction

>>> da.groupby("letters").sum()
<xarray.DataArray (letters: 2, y: 3)> Size: 48B
array([[ 9, 11, 13],
       [ 9, 11, 13]])
Coordinates:
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y

Grouping by multiple variables

>>> da.groupby(["letters", "x"])
<DataArrayGroupBy, grouped over 2 grouper(s), 8 groups in total:
    'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'
    'x': UniqueGrouper('x'), 4/4 groups with labels 10, 20, 30, 40>

Use Grouper objects to express more complicated GroupBy operations

>>> from xarray.groupers import BinGrouper, UniqueGrouper
>>>
>>> da.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()
<xarray.DataArray (x_bins: 2, letters: 2, y: 3)> Size: 96B
array([[[ 0.,  1.,  2.],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [ 3.,  4.,  5.]]])
Coordinates:
  * x_bins   (x_bins) interval[int64, right] 32B (5, 15] (15, 25]
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
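BinGrouper forwards pandas.cut-style keywords; for instance, a `labels` argument replaces the Interval coordinates with readable names. A sketch, assuming the pandas.cut keywords apply:

```python
import numpy as np
import xarray as xr
from xarray.groupers import BinGrouper

da = xr.DataArray(
    data=np.arange(12).reshape((4, 3)),
    dims=("x", "y"),
    coords={"x": [10, 20, 30, 40]},
)

# `labels` names the two bins (5, 25] and (25, 45], so the resulting
# x_bins coordinate holds "low"/"high" instead of Interval objects.
binned = da.groupby(x=BinGrouper(bins=[5, 25, 45], labels=["low", "high"])).sum()
```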

See also

GroupBy: Group and Bin Data

Users guide explanation of how to group and bin data.

Computational Patterns

Tutorial on Groupby() for windowed computation

Grouped Computations

Tutorial on Groupby() demonstrating reductions, transformation and comparison with resample()

pandas.DataFrame.groupby

DataArray.groupby_bins

Dataset.groupby

core.groupby.DataArrayGroupBy

DataArray.coarsen

Dataset.resample

DataArray.resample