Monday, May 20, 2024

Pandas: Import CSV with Different Number of Columns per Row

You can use the following basic syntax to import a CSV file into pandas when there are a different number of columns per row:

df = pd.read_csv('uneven_data.csv', header=None, names=range(4))

The value passed to range() should be the number of columns in the widest row, that is, the row with the most fields.

The following example shows how to use this syntax in practice.

Example: Import CSV into Pandas with Different Number of Columns per Row

Suppose we have the following CSV file called uneven_data.csv:

A,22
B,16,10,12
C,25,10
D,14,2,7
E,20,4

Notice that the rows do not all have the same number of columns.

If we attempt to use the read_csv() function to import this CSV file into a pandas DataFrame, we’ll receive an error:

import pandas as pd

#attempt to import CSV file with differing number of columns per row
df = pd.read_csv('uneven_data.csv', header=None)

ParserError: Error tokenizing data. C error: Expected 2 fields in line 2, saw 4

We receive a ParserError that tells us pandas expected 2 fields (since this was the number of columns in the first row) but it saw 4.
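To reproduce this behavior without a file on disk, here is a minimal sketch that catches the exception. The variable csv_text is a hypothetical in-memory stand-in for uneven_data.csv, with rows inferred from the DataFrame output shown later in this tutorial, and error_message is just an illustrative name.

```python
import io
import pandas as pd

# Assumed contents of uneven_data.csv, held in memory so the sketch is self-contained
csv_text = "A,22\nB,16,10,12\nC,25,10\nD,14,2,7\nE,20,4\n"

try:
    pd.read_csv(io.StringIO(csv_text), header=None)
    error_message = None
except pd.errors.ParserError as err:
    # Raised when a later row has more fields than pandas inferred from the first row
    error_message = str(err)

print(error_message)
```

Catching pd.errors.ParserError like this can be useful when a script needs to fall back to a different way of reading the file.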

Note that the error only reports the first row that is too wide. In this file that row has 4 fields, which is also the max number of columns in any row.

Thus, we can import the CSV file and supply a value of range(4) to the names argument:

import pandas as pd

#import CSV file with differing number of columns per row
df = pd.read_csv('uneven_data.csv', header=None, names=range(4))

#view DataFrame
print(df)

   0   1     2     3
0  A  22   NaN   NaN
1  B  16  10.0  12.0
2  C  25  10.0   NaN
3  D  14   2.0   7.0
4  E  20   4.0   NaN

Notice that we’re able to successfully import the CSV file into a pandas DataFrame without any errors since we explicitly told pandas to expect 4 columns.
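If the widest row is not known in advance, one option is to count the fields in a first pass and pass the result to range(). The sketch below assumes a comma delimiter and a file small enough to read twice. The names csv_text and max_cols are hypothetical, and csv_text stands in for uneven_data.csv with rows inferred from the output above.

```python
import csv
import io
import pandas as pd

# Assumed contents of uneven_data.csv, held in memory so the sketch is self-contained
csv_text = "A,22\nB,16,10,12\nC,25,10\nD,14,2,7\nE,20,4\n"

# First pass: count the fields in each row with the standard-library csv reader
max_cols = max(len(row) for row in csv.reader(io.StringIO(csv_text)))

# Second pass: give pandas one column name per field in the widest row
df = pd.read_csv(io.StringIO(csv_text), header=None, names=range(max_cols))

print(max_cols)
print(df)
```

With a real file, the first pass would read from open('uneven_data.csv', newline='') instead of io.StringIO(csv_text).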

By default, pandas fills in any missing values in each row with NaN.

If you’d like the missing values to instead appear as zero, you can use the fillna() function as follows:

#fill NaN values with zeros
df_new = df.fillna(0)

#view new DataFrame
print(df_new)

   0   1     2     3
0  A  22   0.0   0.0
1  B  16  10.0  12.0
2  C  25  10.0   0.0
3  D  14   2.0   7.0
4  E  20   4.0   0.0

Each NaN value in the DataFrame has now been replaced with a zero.
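The zeros appear as 0.0 because the columns that contained NaN were read as floats. If whole numbers are preferred, one possible follow-up is to cast those columns after filling. This is a sketch that assumes every remaining value in columns 2 and 3 is a whole number. The names csv_text and df_int are hypothetical.

```python
import io
import pandas as pd

# Assumed contents of uneven_data.csv, held in memory so the sketch is self-contained
csv_text = "A,22\nB,16,10,12\nC,25,10\nD,14,2,7\nE,20,4\n"
df = pd.read_csv(io.StringIO(csv_text), header=None, names=range(4))

# Replace NaN with 0, then cast the two float columns to 64-bit integers
df_int = df.fillna(0).astype({2: "int64", 3: "int64"})

print(df_int)
```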

Note: You can find the complete documentation for the read_csv() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in Python:

Pandas: How to Skip Rows when Reading CSV File
Pandas: How to Append Data to Existing CSV File
Pandas: How to Specify dtypes when Importing CSV File
Pandas: Set Column Names when Importing CSV File
