If you want to dive deeper into converting datatypes in Pandas columns, we’ve covered that extensively elsewhere, but for string-to-int conversions, this is the post for you.
In this post, we’ll focus only on converting string values to the int data type. It isn’t particularly hard, but it does require that every string is already formatted as a valid integer.
Before we start, we need a dataset to play around with. The snippet below imports a sample dataset and converts one of its columns to a string datatype for us to manipulate.
```python
import pandas as pd

file_name = "https://people.sc.fsu.edu/~jburkardt/data/csv/homes.csv"
df = pd.read_csv(file_name)
# Convert the integer Sell column to strings so we have something to convert back
df['Sell'] = df['Sell'].astype(str)
```
You can check the datatypes in your DataFrame with the dtypes attribute (df.dtypes, with no parentheses, since it is an attribute rather than a function).
Here we see that our Sell column is now an object datatype, indicating that it holds strings. In the next two sections, we’ll show you how to convert it back to an integer.
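If the homes.csv URL isn't reachable, a small hand-built DataFrame works just as well for following along. The sketch below uses made-up Sell values (an assumption, not the real dataset) and repeats the string conversion. Depending on your pandas version, the converted column may report a dedicated string dtype instead of object. Either way, every value is now a Python string.

```python
import pandas as pd

# Made-up stand-in for the homes.csv data, for illustration only
df = pd.DataFrame({"Sell": [142, 175, 129]})

# Convert the integer column to strings
df["Sell"] = df["Sell"].astype(str)

# dtypes is an attribute, so no parentheses
print(df.dtypes)
print(df["Sell"].tolist())
```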
Convert to int with astype()
The first option for converting the strings back to integers is the astype() method.
```python
df['Sell'] = df['Sell'].astype(int)
```
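Here is a minimal, self-contained sketch of how astype(int) behaves on clean and on dirty input. The sample values are made up. The exact integer dtype can vary with platform and NumPy version, which is why the comment says "typically int64". A single unparseable entry makes the whole conversion fail with a ValueError.

```python
import pandas as pd

# Made-up sample values stored as strings
df = pd.DataFrame({"Sell": ["142", "175", "129"]})

# astype(int) parses each string into an integer
df["Sell"] = df["Sell"].astype(int)
print(df.dtypes)  # Sell is now an integer dtype (typically int64)

# One value that isn't a valid integer makes astype() fail for the whole column
bad = pd.Series(["142", "?", "129"])
conversion_failed = False
try:
    bad.astype(int)
except ValueError as exc:
    conversion_failed = True
    print("astype(int) failed:", exc)
```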
Convert to int with to_numeric()
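The to_numeric() function is designed specifically for converting values into a numeric type, and it infers which one for you. It returns an integer dtype when every string holds a whole number and a float dtype when any of them has a decimal part.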
```python
df['Sell'] = pd.to_numeric(df['Sell'])
```
Checking df.dtypes again shows that the Sell column is back to an integer datatype (int64 on most platforms).
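To illustrate the type inference described above, the sketch below runs to_numeric() on two made-up series. One contains only whole numbers and the other contains a decimal value. It also uses the function's downcast parameter, which requests the smallest integer dtype that can hold the values. Here that is int16, because 142 and 175 do not fit in int8.

```python
import pandas as pd

ints = pd.Series(["142", "175", "129"])
decimals = pd.Series(["142.5", "175", "129"])

# Every string is a whole number, so to_numeric infers an integer dtype
as_int = pd.to_numeric(ints)
# One value has a fractional part, so the result is a float dtype
as_float = pd.to_numeric(decimals)

# downcast="integer" picks the smallest signed integer dtype that fits
small = pd.to_numeric(ints, downcast="integer")

print(as_int.dtype, as_float.dtype, small.dtype)
```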
Summary
Both approaches are quick and easy to apply, but only when your string data is already clean, with every value a valid integer. If, for instance, your column contains a “?” or some other non-numeric character, both methods raise a ValueError. You will first need to replace those entries with NaN values or drop the affected rows entirely.
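One way to do that cleanup is a minimal sketch built on to_numeric() with errors="coerce", which turns anything unparseable into NaN instead of raising. The sample values are made up. Because NaN is a float, the coerced column comes back as a float dtype. From there, you can drop the failed rows and cast to int, or keep every row by using pandas' nullable Int64 dtype, which stores missing entries as <NA>.

```python
import pandas as pd

raw = pd.Series(["142", "?", "129"])

# Unparseable strings become NaN; the result is a float dtype because of NaN
coerced = pd.to_numeric(raw, errors="coerce")

# Option 1: eliminate the rows that failed to parse, then cast to int
dropped = coerced.dropna().astype(int)

# Option 2: keep every row and use the nullable Int64 dtype (missing values become <NA>)
nullable = coerced.astype("Int64")

print(dropped.tolist())
print(nullable.tolist())
```

Option 1 matches "eliminate that row of data completely". Option 2 keeps the column integer-typed without losing rows.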
For the code we used in this post, check out our GitHub repository.