DataFrameGroupBy.aggregate(func=None, *args, engine=None, engine_kwargs=None, **kwargs)
Aggregate using one or more operations over the specified axis.
Parameters
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.
Accepted combinations are:
- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names or list of such.
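For illustration, each of these forms can be passed to agg. This sketch assumes the DataFrame df from the Examples section below and numpy imported as np; the variable names are arbitrary:
>>> gb = df.groupby('A')
>>> r1 = gb.agg(np.sum)                      # a plain function
>>> r2 = gb.agg('mean')                      # a string function name
>>> r3 = gb.agg([np.sum, 'mean'])            # a list of functions and/or names
>>> r4 = gb.agg({'B': 'min', 'C': np.sum})   # a dict of column label -> function(s)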
Can also accept a Numba JIT function with engine='numba' specified.
If the 'numba' engine is chosen, the function must be a user-defined function with values and index as the first and second arguments respectively in the function signature. Each group's index will be passed to the user-defined function and optionally available for use (see the sketch below).
Changed in version 1.1.0.
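A minimal sketch of such a user-defined function, assuming numba is installed and reusing the df from the Examples section below (the name user_mean and the aggregation it computes are illustrative only, not part of the pandas API):
>>> import numpy as np
>>> def user_mean(values, index):
...     # values: the group's data as a numpy array
...     # index: the group's index as a numpy array (required by the signature, unused here)
...     return np.mean(values)
>>> df.groupby('A').agg(user_mean, engine='numba')  # doctest: +SKIP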
*args
Positional arguments to pass to func.
engine : str, default None
'cython' : Runs the function through C-extensions from cython.
'numba' : Runs the function through JIT compiled code from numba.
None : Defaults to 'cython' or the global setting compute.use_numba.
New in version 1.1.0.
engine_kwargs : dict, default None
For 'cython' engine, there are no accepted engine_kwargs.
For 'numba' engine, the engine can accept nopython, nogil and parallel dictionary keys. The values must either be True or False. The default engine_kwargs for the 'numba' engine is {'nopython': True, 'nogil': False, 'parallel': False} and will be applied to the function.
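As an illustrative sketch (again requiring numba and reusing the user_mean function sketched above), the defaults can be overridden by passing engine_kwargs explicitly:
>>> df.groupby('A').agg(
...     user_mean,
...     engine='numba',
...     engine_kwargs={'nopython': True, 'nogil': True, 'parallel': False},
... )  # doctest: +SKIP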
**kwargs
Keyword arguments to be passed into func.
See also
DataFrame.groupby.apply
DataFrame.groupby.transform
DataFrame.aggregate
Notes
When using engine='numba', there will be no "fall back" behavior internally. The group data and group index will be passed as numpy arrays to the JITed user-defined function, and no alternative execution attempts will be tried.
Examples
>>> df = pd.DataFrame(
...     {
...         "A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860],
...     }
... )
>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860
The aggregation is for each column.
>>> df.groupby('A').agg('min')
   B         C
A
1  1  0.227877
2  3 -0.562860
Multiple aggregations
>>> df.groupby('A').agg(['min', 'max'])
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767
Select a column for aggregation
>>> df.groupby('A').B.agg(['min', 'max'])
   min  max
A
1    1    2
2    3    4
Different aggregations per column
>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907
To control the output names with different aggregations per column, pandas supports “named aggregation”
>>> df.groupby("A").agg( ... b_min=pd.NamedAgg(column="B", aggfunc="min"), ... c_sum=pd.NamedAgg(column="C", aggfunc="sum")) b_min c_sum A 1 1 0.590715 2 3 0.704907
- The keywords are the output column names.
- The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
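Plain (column, aggfunc) tuples are also accepted in place of pandas.NamedAgg, giving the same result as the example above:
>>> df.groupby("A").agg(b_min=("B", "min"), c_sum=("C", "sum"))
   b_min     c_sum
A
1      1  0.590715
2      3  0.704907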
See Named aggregation for more.