What is the difference between Group By and Pivot Table

# What is the difference between Group By and Pivot Table

>[!cite] What is the difference between `groupby` and `pivot_table`?
>
>**`pivot_table` = `groupby` + `unstack`**
>**`groupby` = `pivot_table` + `stack`**
>
>In particular, if the `columns` parameter of `pivot_table()` is not used, then `groupby()` and `pivot_table()` both produce the same result (if the same aggregator function is used).

```python
import pandas as pd

# sample
df = pd.DataFrame({"a": [1,1,1,2,2,2], "b": [1,1,2,2,3,3], "c": [0,0.5,1,1,2,2]})

# example
gb = df.groupby(['a','b'])[['c']].sum()
pt = df.pivot_table(index=['a','b'], values=['c'], aggfunc='sum')

# equality test
gb.equals(pt)  # True
```

---

In general, if we check the [source code](https://github.com/pandas-dev/pandas/blob/main/pandas/core/reshape/pivot.py), `pivot_table()` internally calls `__internal_pivot_table()`.
This function creates a single flat list out of `index` and `columns` and calls `groupby()` with this list as the grouper. Then, after aggregation, it calls `unstack()` on the list of columns. If `columns` is never passed, there is nothing to unstack, so `groupby` and `pivot_table` trivially produce the same output. A demonstration:

```python
gb = (
    df
    .groupby(['a','b'])[['c']].sum()
    .unstack(['b'])
)
pt = df.pivot_table(index=['a'], columns=['b'], values=['c'], aggfunc='sum')

gb.equals(pt)  # True
```

As `stack()` is the inverse operation of `unstack()`, the following holds as well:

```python
(
    df
    .pivot_table(index=['a'], columns=['b'], values=['c'], aggfunc='sum')
    .stack(['b'])
    .equals(
        df.groupby(['a','b'])[['c']].sum()
    )
)  # True
```

However, there is a performance difference between the two methods. In short, `pivot_table()` is slower than `groupby().agg().unstack()`. You can [read more about it in this answer](https://stackoverflow.com/a/74048672/19123103).

**TL;DR:** `pivot_table` loops over `aggfunc` no matter what is passed to it, while `groupby` first checks whether a Cython-optimized implementation is available and loops only if it is not.

## Source

- [cottontail](https://stackoverflow.com/a/72933069/20647829) (stackoverflow)
- [cottontail](https://stackoverflow.com/questions/44229489/pandas-performance-pivot-table-vs-groupby/74048672#74048672) (stackoverflow)
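
## Timing sketch

The performance claim above can be checked locally with a rough timing sketch. The synthetic frame `big`, its size, and the helper names `via_pivot_table` and `via_groupby` are assumptions made for illustration and do not come from the cited answers. Absolute timings depend on the pandas version and the machine, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: sizes and column names are illustrative only.
rng = np.random.default_rng(0)
n = 50_000
big = pd.DataFrame({
    "a": rng.integers(0, 100, n),
    "b": rng.integers(0, 10, n),
    "c": rng.random(n),
})


def via_pivot_table(frame):
    # Route 1: let pivot_table do the grouping and reshaping.
    return frame.pivot_table(index="a", columns="b", values="c", aggfunc="sum")


def via_groupby(frame):
    # Route 2: group explicitly, aggregate, then unstack the column key.
    return frame.groupby(["a", "b"])["c"].sum().unstack("b")


# Confirm both routes agree on the values before comparing speed.
same = np.allclose(
    via_pivot_table(big).to_numpy(),
    via_groupby(big).to_numpy(),
    equal_nan=True,
)

# Time a few repetitions of each route.
t_pivot = timeit.timeit(lambda: via_pivot_table(big), number=3)
t_groupby = timeit.timeit(lambda: via_groupby(big), number=3)

print(f"same values: {same}")
print(f"pivot_table: {t_pivot:.3f}s, groupby + unstack: {t_groupby:.3f}s")
```

Checking that the two results match first keeps the comparison honest: the timings are only meaningful if both routes compute the same table.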