xarray.DataArray.groupby

DataArray.groupby(group=None, *, squeeze=False, restore_coord_dims=False, eagerly_compute_group=None, **groupers)

Returns a DataArrayGroupBy object for performing grouped operations.

Parameters:
  • group (str or DataArray or IndexVariable or sequence of hashable or mapping of hashable to Grouper) – Array whose unique values should be used to group this array. If a Hashable, must be the name of a coordinate contained in this dataarray. If a dictionary, must map an existing variable name to a Grouper instance.

  • squeeze (False) – This argument is deprecated.

  • restore_coord_dims (bool, default: False) – If True, also restore the dimension order of multi-dimensional coordinates.

  • eagerly_compute_group (bool, optional) – This argument is deprecated.

  • **groupers (Mapping of str to Grouper or Resampler) – Mapping from the name of a variable to group by to a Grouper or Resampler object. Either group or **groupers must be provided.

Returns:

grouped (DataArrayGroupBy) – A DataArrayGroupBy object patterned after pandas.GroupBy that can be iterated over in the form of (unique_value, grouped_array) pairs.
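The returned object can be iterated directly to walk the groups. A minimal sketch with illustrative data (the variable and coordinate names here are made up, not part of the signature above):

```python
import numpy as np
import xarray as xr

# Illustrative data: a 1-D array with a non-dimension "labels" coordinate.
da = xr.DataArray(
    np.arange(6),
    dims="x",
    coords={"labels": ("x", ["a", "a", "b", "b", "b", "a"])},
)

# Iteration yields (unique_value, grouped_array) pairs, as with pandas.GroupBy.
for label, group in da.groupby("labels"):
    print(label, group.values)
```

Each `group` is itself a DataArray holding only the elements whose coordinate value equals `label`.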

Examples

Calculate daily anomalies for daily data:

>>> da = xr.DataArray(
...     np.linspace(0, 1826, num=1827),
...     coords=[pd.date_range("2000-01-01", "2004-12-31", freq="D")],
...     dims="time",
... )
>>> da
<xarray.DataArray (time: 1827)> Size: 15kB
array([0.000e+00, 1.000e+00, 2.000e+00, ..., 1.824e+03, 1.825e+03,
       1.826e+03], shape=(1827,))
Coordinates:
  * time     (time) datetime64[ns] 15kB 2000-01-01 2000-01-02 ... 2004-12-31
>>> da.groupby("time.dayofyear") - da.groupby("time.dayofyear").mean("time")
<xarray.DataArray (time: 1827)> Size: 15kB
array([-730.8, -730.8, -730.8, ...,  730.2,  730.2,  730.5], shape=(1827,))
Coordinates:
  * time       (time) datetime64[ns] 15kB 2000-01-01 2000-01-02 ... 2004-12-31
    dayofyear  (time) int64 15kB 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366

Use a Grouper object to be more explicit

>>> da.coords["dayofyear"] = da.time.dt.dayofyear
>>> da.groupby(dayofyear=xr.groupers.UniqueGrouper()).mean()
<xarray.DataArray (dayofyear: 366)> Size: 3kB
array([ 730.8,  731.8,  732.8, ..., 1093.8, 1094.8, 1095.5])
Coordinates:
  * dayofyear  (dayofyear) int64 3kB 1 2 3 4 5 6 7 ... 361 362 363 364 365 366
>>> da = xr.DataArray(
...     data=np.arange(12).reshape((4, 3)),
...     dims=("x", "y"),
...     coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
... )

Grouping by a single variable is easy

>>> da.groupby("letters")
<DataArrayGroupBy, grouped over 1 grouper(s), 2 groups in total:
    'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'>

Execute a reduction

>>> da.groupby("letters").sum()
<xarray.DataArray (letters: 2, y: 3)> Size: 48B
array([[ 9, 11, 13],
       [ 9, 11, 13]])
Coordinates:
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y

Grouping by multiple variables

>>> da.groupby(["letters", "x"])
<DataArrayGroupBy, grouped over 2 grouper(s), 8 groups in total:
    'letters': UniqueGrouper('letters'), 2/2 groups with labels 'a', 'b'
    'x': UniqueGrouper('x'), 4/4 groups with labels 10, 20, 30, 40>

Use Grouper objects to express more complicated GroupBy operations

>>> from xarray.groupers import BinGrouper, UniqueGrouper
>>>
>>> da.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()
<xarray.DataArray (x_bins: 2, letters: 2, y: 3)> Size: 96B
array([[[ 0.,  1.,  2.],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [ 3.,  4.,  5.]]])
Coordinates:
  * x_bins   (x_bins) interval[int64, right] 32B (5, 15] (15, 25]
  * letters  (letters) object 16B 'a' 'b'
Dimensions without coordinates: y
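BinGrouper forwards pandas.cut-style keywords; for instance, a `labels` argument replaces the Interval coordinates with readable names. A sketch, assuming the pandas.cut keywords apply:

```python
import numpy as np
import xarray as xr
from xarray.groupers import BinGrouper

da = xr.DataArray(
    data=np.arange(12).reshape((4, 3)),
    dims=("x", "y"),
    coords={"x": [10, 20, 30, 40]},
)

# `labels` names the two bins (5, 25] and (25, 45], so the resulting
# x_bins coordinate holds "low"/"high" instead of Interval objects.
binned = da.groupby(x=BinGrouper(bins=[5, 25, 45], labels=["low", "high"])).sum()
```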

See also

GroupBy: Group and Bin Data

Users guide explanation of how to group and bin data.

Computational Patterns

Tutorial on Groupby() for windowed computation

Grouped Computations

Tutorial on Groupby() demonstrating reductions, transformation and comparison with resample()

pandas.DataFrame.groupby

DataArray.groupby_bins

Dataset.groupby

core.groupby.DataArrayGroupBy

DataArray.coarsen

Dataset.resample

DataArray.resample