Sunday, May 19, 2024

Pandas: How to Sort DataFrame Based on String Column

You can use the following methods to sort the rows of a pandas DataFrame based on the values in a particular string column:

Method 1: Sort by String Column (when column only contains characters)

df = df.sort_values('my_string_column')

Method 2: Sort by String Column (when column contains characters and digits)

#create 'sort' column that contains digits from 'my_string_column'
df['sort'] = df['my_string_column'].str.extract(r'(\d+)', expand=False).astype(int)

#sort rows based on digits in 'sort' column
df = df.sort_values('sort')

The following examples show how to use each method in practice.

Example 1: Sort by String Column (when column only contains characters)

Suppose we have the following pandas DataFrame that contains information about the sales of various products at some grocery store:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'product': ['Apples', 'Oranges', 'Bananas', 'Lettuce', 'Beans'],
                   'sales': [18, 22, 19, 14, 29]})

#view DataFrame
print(df)

   product  sales
0   Apples     18
1  Oranges     22
2  Bananas     19
3  Lettuce     14
4    Beans     29

We can use the following syntax to sort the rows of the DataFrame based on the strings in the product column:

#sort rows from A to Z based on string in 'product' column
df = df.sort_values('product')

#view updated DataFrame
print(df)

   product  sales
0   Apples     18
2  Bananas     19
4    Beans     29
3  Lettuce     14
1  Oranges     22

Notice that the rows are now sorted from A to Z based on the strings in the product column.

If you’d like to instead sort from Z to A, simply add the argument ascending=False:

#sort rows from Z to A based on string in 'product' column
df = df.sort_values('product', ascending=False)

#view updated DataFrame
print(df)

   product  sales
1  Oranges     22
3  Lettuce     14
4    Beans     29
2  Bananas     19
0   Apples     18

Notice that the rows are now sorted from Z to A based on the strings in the product column.
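One caveat when a column mixes uppercase and lowercase strings: the default sort is case-sensitive, so uppercase letters come before lowercase ones. The sketch below uses made-up product names (not the data above) and the key argument of sort_values(), which assumes pandas 1.1.0 or later, to compare the two orderings:

```python
import pandas as pd

# Hypothetical data with mixed-case product names (for illustration only)
df_mixed = pd.DataFrame({'product': ['apples', 'Oranges', 'bananas', 'Lettuce'],
                         'sales': [18, 22, 19, 14]})

# Default sort is case-sensitive: uppercase letters sort before lowercase
default_order = df_mixed.sort_values('product')

# key receives the column as a Series; lowercasing it gives a case-insensitive sort
case_insensitive = df_mixed.sort_values('product', key=lambda col: col.str.lower())

print(default_order)
print(case_insensitive)
```

The key function only affects the ordering; the values stored in the product column are left unchanged.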

Example 2: Sort by String Column (when column contains characters and digits)

Suppose we have the following pandas DataFrame that contains information about the sales of various products at some grocery store:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'product': ['A3', 'A5', 'A22', 'A50', 'A2', 'A7', 'A9', 'A13'],
                   'sales': [18, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  product  sales
0      A3     18
1      A5     22
2     A22     19
3     A50     14
4      A2     14
5      A7     11
6      A9     20
7     A13     28

Notice that the strings in the product column contain both characters and digits.

If we attempt to sort the rows of the DataFrame using the values in the product column, the strings are compared character by character, so they will not be sorted in the correct order based on the digits (for example, A13 comes before A2):

import pandas as pd

#sort rows based on strings in 'product' column
df = df.sort_values('product')

#view updated DataFrame
print(df)

  product  sales
7     A13     28
4      A2     14
2     A22     19
0      A3     18
1      A5     22
3     A50     14
5      A7     11
6      A9     20
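The ordering above follows from how strings are compared: one character at a time, from left to right. A minimal sketch in plain Python, using the same labels, shows the effect. The key function here is only an illustration and assumes every label is a single letter followed by digits:

```python
labels = ['A3', 'A5', 'A22', 'A50', 'A2', 'A7', 'A9', 'A13']

# 'A13' sorts before 'A2' because the second character '1' is less than '2'
a13_before_a2 = 'A13' < 'A2'

# Plain string sort: same order that sort_values('product') produced
lexicographic = sorted(labels)

# Illustrative key: convert everything after the first character to an integer
numeric = sorted(labels, key=lambda s: int(s[1:]))

print(a13_before_a2)
print(lexicographic)
print(numeric)
```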

Instead, we must create a new temporary column called sort that contains only the digits from the product column, then sort by the values in the sort column, then drop the column entirely:

import pandas as pd

#create new 'sort' column that contains digits from 'product' column
df['sort'] = df['product'].str.extract(r'(\d+)', expand=False).astype(int)

#sort rows based on digits in 'sort' column
df = df.sort_values('sort')

#drop 'sort' column
df = df.drop('sort', axis=1)

#view updated DataFrame
print(df)

  product  sales
4      A2     14
0      A3     18
1      A5     22
5      A7     11
6      A9     20
7     A13     28
2     A22     19
3     A50     14

Notice that the rows are now sorted by the strings in the product column and the digits are sorted in the correct order.
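As an alternative sketch, the same result can be reached without a temporary column by passing a key function to sort_values(). This assumes pandas 1.1.0 or later (where the key argument is available) and that every value in the product column contains at least one digit:

```python
import pandas as pd

# Same DataFrame as in Example 2
df = pd.DataFrame({'product': ['A3', 'A5', 'A22', 'A50', 'A2', 'A7', 'A9', 'A13'],
                   'sales': [18, 22, 19, 14, 14, 11, 20, 28]})

# key receives the 'product' column as a Series and must return
# a Series of the same length; here it returns the digits as integers
sorted_df = df.sort_values(
    'product',
    key=lambda col: col.str.extract(r'(\d+)', expand=False).astype(int),
)

print(sorted_df)
```

With this approach there is no helper column to create and drop, although the temporary-column version can be easier to inspect when debugging.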

Note: The complete documentation for the pandas sort_values() function is available in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Sort by Date
Pandas: How to Sort Columns by Name
Pandas: How to Sort by Both Index and Column
