11.7 C
London
Sunday, May 19, 2024
HomePandas in PythonGeneral Functions in PythonPandas: Search for String in All Columns of DataFrame

Pandas: Search for String in All Columns of DataFrame

Related stories

Learn About Opening an Automobile Repair Shop in India

Starting a car repair shop is quite a good...

Unlocking the Power: Embracing the Benefits of Tax-Free Investing

  Unlocking the Power: Embracing the Benefits of Tax-Free Investing For...

Income Splitting in Canada for 2023

  Income Splitting in Canada for 2023 The federal government’s expanded...

Can I Deduct Home Office Expenses on my Tax Return 2023?

Can I Deduct Home Office Expenses on my Tax...

Canadian Tax – Personal Tax Deadline 2022

  Canadian Tax – Personal Tax Deadline 2022 Resources and Tools...

You can use the following syntax to search for a particular string in each column of a pandas DataFrame and filter for rows that contain the string in at least one column:

#define filter
mask = np.column_stack([df[col].str.contains(r"my_string", na=False) for col in df])

#filter for rows where any column contains 'my_string'
df.loc[mask.any(axis=1)]

The following example shows how to use this syntax in practice.

Example: Search for String in All Columns of Pandas DataFrame

Suppose we have the following pandas DataFrame that contains information about the first role and second role of various basketball players on a team:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'first_role': ['P Guard', 'P Guard', 'S Guard', 'S Forward',
                                  'P Forward', 'Center', 'Center', 'Center'],
                   'second_role': ['S Guard', 'S Guard', 'Forward', 'S Guard',
                                   'S Guard', 'S Forward', 'P Forward', 'P Forward']})

#view DataFrame
print(df)

  player first_role second_role
0      A    P Guard     S Guard
1      B    P Guard     S Guard
2      C    S Guard     Forward
3      D  S Forward     S Guard
4      E  P Forward     S Guard
5      F     Center   S Forward
6      G     Center   P Forward
7      H     Center   P Forward

The following code shows how to filter the pandas DataFrame for rows where the string “Guard” occurs in any column:

import numpy as np

#define filter
mask = np.column_stack([df[col].str.contains(r"Guard", na=False) for col in df])

#filter for rows where any column contains 'Guard'
df.loc[mask.any(axis=1)]

        player	first_role  second_role
0	A	P Guard	    S Guard
1	B	P Guard	    S Guard
2	C	S Guard	    Forward
3	D	S Forward   S Guard
4	E	P Forward   S Guard

Notice that each row in the resulting DataFrame contains the string “Guard” in at least one column.

You could also filter for rows where one of several strings occurs in at least one column by using the “OR” ( | ) operator in pandas.

For example, the following code shows how to filter for rows where either “P Guard” or “Center” occurs in at least one column:

import numpy as np

#define filter
mask = np.column_stack([df[col].str.contains(r"P Guard|Center", na=False) for col in df])

#filter for rows where any column contains 'P Guard' or 'Center'
df.loc[mask.any(axis=1)]

        player	first_role  second_role
0	A	P Guard	    S Guard
1	B	P Guard	    S Guard
5	F	Center	    S Forward
6	G	Center	    P Forward
7	H	Center	    P Forward

Notice that each row in the resulting DataFrame contains “P Guard” or “Center” in at least one column.

Note: It’s important to include the argument na=False within the contains() function or else you will encounter an error if NaN values are present in the DataFrame.

Additional Resources

The following tutorials explain how to perform other common filtering operations in pandas:

How to Filter a Pandas DataFrame by Column Values
How to Filter Pandas DataFrame Rows by Date
How to Filter a Pandas DataFrame on Multiple Conditions

Subscribe

- Never miss a story with notifications

- Gain full access to our premium content

- Browse free from up to 5 devices at once

Latest stories