Analysts frequently need to change the data type of a Pandas DataFrame column or Series. Common reasons include:
- The data type of a column is inferred incorrectly when a data source is first loaded
- Data types need to change in order to perform some specific operation or transformation of the data
- Transformed data is automatically stored in a DataFrame in the wrong data type during an operation
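The third scenario is easy to reproduce. The sketch below uses a small hypothetical Series rather than any dataset from this article. When an operation introduces a missing value into an integer column, pandas stores the result as float64, because the default int64 dtype cannot hold NaN.

```python
import pandas as pd

# Hypothetical integer data; pandas infers the int64 dtype
counts = pd.Series([3, 5, 7], index=["a", "b", "c"])
print(counts.dtype)  # int64

# Reindexing adds a label with no value, so pandas fills it with NaN
# and silently upcasts the whole Series to float64
expanded = counts.reindex(["a", "b", "c", "d"])
print(expanded.dtype)  # float64
print(expanded.tolist())  # [3.0, 5.0, 7.0, nan]
```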
The data types that Pandas assigns often need to be changed in any of the scenarios above. Here, we’ll cover the three most common approaches: .astype(), pandas.to_numeric(), and pandas.to_datetime().
The quickest way to convert a column to a given data type is to call the .astype() method on the column and assign the result back to the same column.
In the example below we use sample data from FSU to show how a DataFrame’s column data types can be changed.
Once the data is loaded into the DataFrame, we can list the data type of every column with the .dtypes attribute. This shows that the Sell column is stored as an integer (int64).
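The FSU file itself is not reproduced here. This sketch therefore assumes a small hypothetical DataFrame with a whole-number Sell column and a made-up Taxes column, with illustrative values only. It shows what .dtypes reports.

```python
import pandas as pd

# Hypothetical stand-in for the FSU sample data; values are illustrative only
df = pd.DataFrame({
    "Sell": [100, 200, 300],
    "Taxes": [1000, 2000, 3000],
})

# .dtypes returns a Series mapping each column name to its data type
print(df.dtypes)

# A single column's type can also be checked directly
print(df["Sell"].dtype)  # int64
```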
Now that we know the data types in our DataFrame, we can convert the column to a float or string data type by calling .astype() with float or str as the target type:
df['Sell'] = df['Sell'].astype(float)
The output of this conversion shows that every integer value in the Sell column is now a float: each value gains a trailing decimal (for example, 142 becomes 142.0).
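The snippet above shows only the float case. This minimal sketch covers both targets on a hypothetical stand-in for the Sell column. Note that .astype() returns a new Series rather than modifying the column in place, which is why the result is assigned back.

```python
import pandas as pd

# Hypothetical integer column standing in for the Sell data
df = pd.DataFrame({"Sell": [100, 200, 300]})

# astype returns a new Series, so the result is assigned back to the column
df["Sell"] = df["Sell"].astype(float)
print(df["Sell"].tolist())  # [100.0, 200.0, 300.0]
print(df["Sell"].dtype)     # float64

# Converting to str turns each value into its text representation
as_text = df["Sell"].astype(str)
print(as_text.tolist())     # ['100.0', '200.0', '300.0']
```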
.astype() handles the majority of type conversions on DataFrame columns. Pandas also provides specialized functions for Series objects, such as pandas.to_numeric() and pandas.to_datetime(). Below is a quick example of to_numeric() applied to our ‘Sell’ series to turn it back into an integer type. Because the column already holds floats, to_numeric() on its own would leave it as float64, so we pass downcast='integer' to request the smallest integer type that can hold the values:
df['Sell'] = pd.to_numeric(df['Sell'], downcast='integer')
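to_numeric() is most useful when numbers arrive as text. This minimal sketch uses a hypothetical Series of strings that contains one entry that cannot be parsed. With errors="coerce", that entry becomes NaN instead of raising an error. With downcast="integer", whole-number floats are converted to a compact integer dtype.

```python
import pandas as pd

raw = pd.Series(["100", "200", "n/a"])  # hypothetical text data

# The default errors="raise" would fail on "n/a"; "coerce" turns it into NaN
numbers = pd.to_numeric(raw, errors="coerce")
print(numbers.tolist())  # [100.0, 200.0, nan]

# Whole-number floats can be downcast to the smallest integer dtype that fits
whole = pd.to_numeric(pd.Series([100.0, 200.0]), downcast="integer")
print(whole.dtype)
```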
To show the functionality of to_datetime(), a specialized function for converting values to datetimes, we import an open data set of daily air temperatures. After loading the file, we see that the Date column has the object data type. For more advanced analysis, it is better to convert this column to a datetime type.
file_name = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
df_ts = pd.read_csv(file_name)
df_ts.dtypes
df_ts["Date"] = pd.to_datetime(df_ts["Date"])  # convert the column and assign it back
df_ts.dtypes
The to_datetime() call changed the Date column’s data type from object to datetime64[ns]. The data is now ready for time series analysis.
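The same conversion can be checked offline. This sketch uses a small hypothetical inline CSV that assumes the same Date and Temp columns as the temperature file, with made-up values. Once the column is a datetime, the .dt accessor and date-based indexing become available.

```python
import io

import pandas as pd

# Hypothetical inline stand-in for the temperature CSV; values are made up
csv_text = """Date,Temp
1981-01-01,10.0
1981-01-02,11.5
1981-01-03,12.0
"""
df_ts = pd.read_csv(io.StringIO(csv_text))
print(df_ts["Date"].dtype)  # dates are read as text at this point

# Convert the Date column and assign it back to the DataFrame
df_ts["Date"] = pd.to_datetime(df_ts["Date"], format="%Y-%m-%d")

# The .dt accessor exposes date components of a datetime column
print(df_ts["Date"].dt.day.tolist())  # [1, 2, 3]

# With a DatetimeIndex, rows can be selected with date strings (inclusive)
recent = df_ts.set_index("Date").loc["1981-01-02":]
print(recent)
```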
We’ve covered the basics of changing data types in Pandas. We can now quickly identify the data type of a given column and convert it to the type a new operation requires. astype() is the most general approach, since it works across all the data types available in Pandas. The specialized functions for numeric and datetime conversion add options for their specific cases, such as coercing unparseable values.