Sunday, May 19, 2024

Pandas: How to Specify dtypes when Importing CSV File


You can use the following basic syntax to specify the dtype of each column in a DataFrame when importing a CSV file into pandas:

df = pd.read_csv('my_data.csv',
                 dtype = {'col1': str, 'col2': float, 'col3': int})

The dtype argument specifies the data type that each column should have when importing the CSV file into a pandas DataFrame.
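The dtype argument also accepts a single type instead of a dictionary. In that case pandas applies the same type to every column. Here is a minimal sketch. It uses a small in-memory CSV with hypothetical columns x and y, read through io.StringIO, so no file on disk is needed:

```python
import io

import pandas as pd

# Hypothetical CSV contents, held in memory instead of in a file
csv_text = "x,y\n1,2\n3,4\n"

# A single type (rather than a dict) is applied to every column
df = pd.read_csv(io.StringIO(csv_text), dtype=float)

print(df.dtypes)
```

Both columns come back as float64, even though the raw values look like integers.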

The following example shows how to use this syntax in practice.

Example: Specify dtypes when Importing CSV File into Pandas

Suppose we have a CSV file called basketball_data.csv with three columns: team, points, and rebounds.
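The file's contents are not reproduced in this text. To follow along, you can use the sketch below, which writes a basketball_data.csv whose rows are inferred from the printed DataFrame in this example. Treat the exact contents as an assumption rather than the original file:

```python
import pandas as pd

# Rows inferred from the printed output in this example (an assumption,
# since the original file is not shown)
sample = pd.DataFrame({
    "team": ["A", "B", "C", "D", "E", "F"],
    "points": [22, 14, 29, 30, 22, 31],
    "rebounds": [10, 9, 6, 2, 9, 10],
})

# Write without the index so the file contains only the three columns
sample.to_csv("basketball_data.csv", index=False)
```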

If we import the CSV file using the read_csv() function, pandas will attempt to identify the data type for each column automatically:

import pandas as pd

#import CSV file
df = pd.read_csv('basketball_data.csv')

#view resulting DataFrame
print(df)

  team  points  rebounds
0    A      22        10
1    B      14         9
2    C      29         6
3    D      30         2
4    E      22         9
5    F      31        10

#view data type of each column
print(df.dtypes)

team        object
points       int64
rebounds     int64
dtype: object

From the output we can see that the columns in the DataFrame have the following data types:

  • team: object
  • points: int64
  • rebounds: int64
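Automatic inference is usually convenient, but it can discard information. A common case is an identifier with leading zeros, which pandas reads as an integer unless told otherwise. The following sketch uses hypothetical city and zip_code columns:

```python
import io

import pandas as pd

# Hypothetical data: zip codes with leading zeros
csv_text = "city,zip_code\nBoston,02134\nNew York,10001\n"

# Inferred: zip_code becomes an integer column and the leading zero is lost
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred["zip_code"].tolist())  # [2134, 10001]

# Specified: zip_code is kept as text, so the leading zero survives
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"zip_code": str})
print(as_text["zip_code"].tolist())  # ['02134', '10001']
```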

However, we can use the dtype argument within the read_csv() function to specify the data types that each column should have:

import pandas as pd

#import CSV file and specify dtype of each column
df = pd.read_csv('basketball_data.csv',
                 dtype = {'team': str, 'points': float, 'rebounds': int})

#view resulting DataFrame
print(df)

  team  points  rebounds
0    A    22.0        10
1    B    14.0         9
2    C    29.0         6
3    D    30.0         2
4    E    22.0         9
5    F    31.0        10

#view data type of each column
print(df.dtypes)

team         object
points      float64
rebounds      int32
dtype: object

From the output we can see that the columns in the DataFrame have the following data types:

  • team: object
  • points: float64
  • rebounds: int32

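These data types correspond to the ones we specified using the dtype argument. str is stored as object, float as float64, and int as the platform's default integer type, which was int32 on the machine that produced this output.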
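The int32 result for rebounds comes from how pandas translates Python's built-in int. It uses NumPy's default integer, whose width depends on the platform: int32 on Windows with NumPy versions before 2.0, and int64 on most other systems. If you need a specific width regardless of platform, you can name it explicitly with a string such as 'int64'. The sketch below uses the basketball columns but reads an in-memory CSV containing only the first two rows, so it runs on its own:

```python
import io

import pandas as pd

# Same columns as basketball_data.csv; only two rows, for brevity
csv_text = "team,points,rebounds\nA,22,10\nB,14,9\n"

# Strings such as "int64" name an exact width, unlike the built-in int
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"team": str, "points": "float64", "rebounds": "int64"},
)

print(df.dtypes)
```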

Note that in this example, we specified the dtype for each column in the DataFrame.

However, you can choose to specify the dtype for only specific columns and let pandas infer the dtype for the remaining columns.
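For example, the following sketch fixes only points as float and lets pandas infer team and rebounds. As before, it uses an in-memory CSV with the first two rows of the example data:

```python
import io

import pandas as pd

csv_text = "team,points,rebounds\nA,22,10\nB,14,9\n"

# Only 'points' is specified; pandas infers the other columns
df = pd.read_csv(io.StringIO(csv_text), dtype={"points": float})

print(df.dtypes)
```

Here points comes back as float64, while rebounds is still inferred as int64.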

Note: You can find the complete documentation for the read_csv() function in the official pandas documentation.
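One situation where a dtype request fails is an int column that contains missing values. NumPy integer types cannot represent a missing value, so read_csv() raises a ValueError. The nullable 'Int64' type (capital I) accepts missing entries instead. The following sketch uses hypothetical data in which the second row has no rebounds value:

```python
import io

import pandas as pd

# Hypothetical data: the second row has no rebounds value
csv_text = "team,rebounds\nA,10\nB,\nC,6\n"

try:
    pd.read_csv(io.StringIO(csv_text), dtype={"rebounds": int})
except ValueError as err:
    # NumPy integer columns cannot hold a missing value
    print("ValueError:", err)

# The nullable integer type stores the gap as <NA>
df = pd.read_csv(io.StringIO(csv_text), dtype={"rebounds": "Int64"})
print(df["rebounds"].tolist())
```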

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Skip Rows when Reading CSV File
Pandas: How to Append Data to Existing CSV File
Pandas: How to Read CSV File Without Headers
Pandas: How to Set Column Names when Importing CSV File
