11.7 C
London
Sunday, May 19, 2024
HomePandas in PythonGeneral Functions in PythonPandas: Create Frequency Table Based on Multiple Columns

Pandas: Create Frequency Table Based on Multiple Columns

Related stories

Learn About Opening an Automobile Repair Shop in India

Starting a car repair shop is quite a good...

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

  Unlocking the Power: Embracing the Benefits of Tax-Free Investing For...

Income Splitting in Canada for 2023

  Income Splitting in Canada for 2023 The federal government’s expanded...

Can I Deduct Home Office Expenses on my Tax Return 2023?

Can I Deduct Home Office Expenses on my Tax...

Canadian Tax – Personal Tax Deadline 2022

  Canadian Tax – Personal Tax Deadline 2022 Resources and Tools...

You can use the following basic syntax to create a frequency table in pandas based on multiple columns:

df.value_counts(['column1', 'column2'])

The following example shows how to use this syntax in practice.

Example: Create Frequency Table in Pandas Based on Multiple Columns

Suppose we have the following pandas DataFrame that contains information on team name, position, and points scored by various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position' : ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [24, 33, 20, 15, 16, 16, 29, 25]})

#view DataFrame
print(df)

  team position  points
0    A        G      24
1    A        G      33
2    A        G      20
3    A        F      15
4    B        G      16
5    B        G      16
6    B        F      29
7    B        F      25

We can use the value_counts() function to create a frequency table that shows the occurrence of each combination of values in the team and position columns:

#count frequency of values in team and position columns
df.value_counts(['team', 'position'])

team  position
A     G           3
B     F           2
      G           2
A     F           1
dtype: int64

From the results we can see:

  • There are 3 occurrences of team A and position G
  • There are 2 occurrences of team B and position F
  • There are 2 occurrences of team B and position G
  • There is 1 occurrence of team A and position F

Note that we can use reset_index() to return a DataFrame as a result instead:

#count frequency of values in team and position columns and return DataFrame
df.value_counts(['team', 'position']).reset_index()

        team	position  0
0	A	G	  3
1	B	F	  2
2	B	G	  2
3	A	F	  1

We can use the rename() function to rename the column that contains the counts:

#get frequency of values in team and position column and rename count column
df.value_counts(['team', 'position']).reset_index().rename(columns={0:'count'})

        team	position  count
0	A	G	  3
1	B	F	  2
2	B	G	  2
3	A	F	  1

The end result is a DataFrame that contains the frequency of each unique combination of values in the team and position columns.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Use GroupBy and Value Counts
Pandas: How to Use GroupBy with Bin Counts
Pandas: How to Count Values in Column with Condition

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Latest stories