Friday, May 10, 2024
How to Extract Number from String in Pandas

You can use the following basic syntax to extract numbers from a string in pandas:

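df['my_column'].str.extract(r'(\d+)')

This particular syntax will extract the first sequence of digits from each string in a column called my_column in a pandas DataFrame.

Note: When using a regular expression, \d represents “any digit” and + stands for “one or more.” The parentheses define the capture group that str.extract() returns, and the r prefix makes the pattern a raw string so that the backslash is passed to the regex engine unchanged.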
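As a quick check of how the pattern behaves, here is a minimal sketch on a small Series. The sample values are hypothetical and chosen only for illustration: one string contains two numbers and one contains none.

```python
import pandas as pd

# Hypothetical sample values, not part of the example dataset
s = pd.Series(['A33', 'item 7 of 12', 'none'])

# r'(\d+)' captures only the first run of digits in each string;
# strings with no digits produce a missing value
first_numbers = s.str.extract(r'(\d+)')
print(first_numbers)
```

Only the first match is returned for each string, so 'item 7 of 12' yields 7, and the string with no digits yields NaN.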

The following example shows how to use this function in practice.

Example: Extract Number from String in Pandas

Suppose we have the following pandas DataFrame that contains information about the sales of various products:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'product': ['A33', 'B34', 'A22', 'A50', 'C200', 'D7', 'A9', 'A13'],
                   'sales': [18, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  product  sales
0     A33     18
1     B34     22
2     A22     19
3     A50     14
4    C200     14
5      D7     11
6      A9     20
7     A13     28

Suppose we would like to extract the number from each string in the product column.

We can use the following syntax to do so:

#extract numbers from strings in 'product' column
df['product'].str.extract(r'(\d+)')

     0
0   33
1   34
2   22
3   50
4  200
5    7
6    9
7   13

The result is a DataFrame that contains only the numbers from each row in the product column.

For example:

  • The pattern extracts 33 from the string A33 in the first row.
  • The pattern extracts 34 from the string B34 in the second row.
  • The pattern extracts 22 from the string A22 in the third row.

And so on.
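If a Series is more convenient than a one-column DataFrame, str.extract() accepts an expand argument. The following is a minimal sketch using a shortened version of the product column; the variable name numbers is only illustrative.

```python
import pandas as pd

# Shortened version of the example data
df = pd.DataFrame({'product': ['A33', 'B34', 'C200']})

# With a single capture group, expand=False returns a Series
# instead of a DataFrame with one column named 0
numbers = df['product'].str.extract(r'(\d+)', expand=False)
print(numbers)
```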

If you’d like, you can also store these numerical values in a new column in the DataFrame:

#extract numbers from strings in 'product' column and store them in new column
df['product_numbers'] = df['product'].str.extract(r'(\d+)')

#view updated DataFrame
print(df)

  product  sales product_numbers
0     A33     18              33
1     B34     22              34
2     A22     19              22
3     A50     14              50
4    C200     14             200
5      D7     11               7
6      A9     20               9
7     A13     28              13

The new column called product_numbers contains only the numbers from each string in the product column.
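Note that str.extract() returns the matched digits as strings, so the new column cannot be used in arithmetic until it is converted. The sketch below shows one possible way to do this with pd.to_numeric(), using a shortened version of the example data; whether the conversion is needed depends on how the column will be used.

```python
import pandas as pd

# Shortened version of the example data
df = pd.DataFrame({'product': ['A33', 'B34', 'C200', 'D7']})

# Extract the digits as a Series of strings, then convert to numbers;
# errors='coerce' turns anything that cannot be parsed into NaN
extracted = df['product'].str.extract(r'(\d+)', expand=False)
df['product_numbers'] = pd.to_numeric(extracted, errors='coerce')

print(df.dtypes)
print(df['product_numbers'].sum())
```

If some strings contain no digits, the converted column will hold NaN for those rows and will be stored as floating point rather than integer.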

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Sort DataFrame Based on String Column
Pandas: How to Remove Specific Characters from Strings
Pandas: Search for String in All Columns of DataFrame
