Solving Pandas Error – ValueError: cannot convert float NaN to integer

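ValueError: cannot convert float NaN to integer is an error that occurs when you try to convert a missing value (NaN) to an integer. pandas stores missing values as NaN, which is a floating-point value with no integer equivalent. Converting a column that contains NaN with int() or astype(int) therefore fails.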

In this article I am going to help you solve this error and understand the root of the problem. I also present other possible solutions that may work if the main solution does not work for you.

Explaining the Error : ValueError: cannot convert float NaN to integer

First of all, let us reproduce the error by running the script below.

                                                                       #
import pandas as pd

df = pd.read_csv('dragons.csv')  # assumes column 'x' has missing values
df[['x']] = df[['x']].astype(int)  # fails as soon as it meets a NaN
                                                                       #

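This is the error message we get. Recent pandas versions word the same failure from astype(int) as "Cannot convert non-finite values (NA or inf) to integer". Make sure you get this error before you start testing the solutions I have prepared for you.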

                                                                       #
ValueError: cannot convert float NaN to integer
                                                                       #
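The script above depends on a dragons.csv file that is not shown here. The following minimal sketch reproduces the same root cause without any file. It uses a small in-memory column named x with one missing value, and the data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dragons.csv: column 'x' has one missing value.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0]})

# NaN is a float, so the column's dtype is float64, not an integer type.
print(df["x"].dtype)  # float64

# Python's built-in int() has no integer value to give NaN.
int_error = None
try:
    int(float("nan"))
except ValueError as exc:
    int_error = str(exc)
print(int_error)  # cannot convert float NaN to integer

# astype(int) fails for the same reason; its wording depends on the pandas version.
astype_error = None
try:
    df[["x"]].astype(int)
except ValueError as exc:
    astype_error = str(exc)
print(astype_error)
```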

Below are several solutions that I have tested and that worked for me.

Solution 1 : convert to the nullable Int32 type.

First, understand that NumPy's int type cannot represent a missing value. pandas' nullable "Int32" type (note the capital I) can, and it stores missing entries as <NA>. Convert the column to float first and then to "Int32", just like in the example below.

                                                                       #
# "Int32" (capital I) is pandas' nullable integer type; NaN becomes <NA>
df['column_name'] = df['column_name'].astype(float).astype("Int32")
                                                                       #
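Here is a self-contained sketch of Solution 1. It assumes a made-up column called column_name with one missing value. Note the capital I in "Int32": the lowercase "int32" is NumPy's type and fails on NaN just like int does.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value, stored as float64.
df = pd.DataFrame({"column_name": [10.0, np.nan, 30.0]})

# Cast to float (a no-op here), then to pandas' nullable "Int32" dtype.
df["column_name"] = df["column_name"].astype(float).astype("Int32")

print(df["column_name"].dtype)  # Int32
print(df["column_name"].isna().tolist())  # [False, True, False]
```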

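If your numbers are very large, some precision may be lost in the intermediate float step, and you cannot avoid that with this approach. Values above 2,147,483,647 also do not fit in Int32, but the 64-bit "Int64" type covers a wider range.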
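To make the precision remark concrete: a float64 stores integers exactly only up to 2**53, and "Int32" tops out at 2,147,483,647. The sketch below uses made-up values to show both limits. It also shows the nullable "Int64" type as one way to handle larger numbers.

```python
import numpy as np
import pandas as pd

# float64 has a 53-bit significand, so 2**53 + 1 rounds to 2**53.
big = 2**53 + 1
exact = float(big) == big
print(exact)  # False

# The largest value the 32-bit Int32 type can hold.
int32_max = int(np.iinfo(np.int32).max)
print(int32_max)  # 2147483647

# Made-up values: one too large for Int32, plus a missing entry.
s = pd.Series([3_000_000_000.0, np.nan])
converted = s.astype("Int64")
print(converted)
```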

If this solution does not work for you, please try the solution below.

Solution 2 : clean the column step by step.

The second solution takes a few more steps. Start by identifying the rows with NaN values using boolean indexing.

Here is how you can do that with one line of code.

                                                                       #
# show every row where column 'x' is missing (NaN)
print(df[df['x'].isnull()])
                                                                       #

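Then add the following line. It converts the column to a numeric type, and with errors='coerce' any value that cannot be parsed as a number becomes NaN instead of raising an error.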

                                                                       #
# convert to numbers; values that cannot be parsed become NaN
df['x'] = pd.to_numeric(df['x'], errors='coerce')
                                                                       #
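The effect of errors='coerce' is easiest to see on a column that was read as text. This sketch assumes a hypothetical column x holding strings, one unparseable entry and one missing entry.

```python
import pandas as pd

# Hypothetical raw column read as strings, with one bad entry and one missing one.
df = pd.DataFrame({"x": ["1", "2", "unknown", None]})

# Anything that cannot be parsed as a number becomes NaN instead of raising.
df["x"] = pd.to_numeric(df["x"], errors="coerce")

print(df["x"].dtype)  # float64
print(df["x"].isnull().sum())  # 2
```

The column ends up as float64 because it still contains NaN, so you still cannot cast it to int at this point.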

The next step is to remove all rows that have NaN in column x, then convert the remaining values to int.

                                                                       #
# drop the rows where 'x' is NaN, then cast what is left to int
df = df.dropna(subset=['x'])
df['x'] = df['x'].astype(int)
                                                                       #
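Here are the three steps of Solution 2 together in a self-contained sketch on a small made-up DataFrame (the name column and all values are hypothetical). This approach deletes every row whose x is missing or unparseable. Use it only when losing those rows is acceptable.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing x, row 2 holds a non-numeric string.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "x": [1, np.nan, "oops", 4]})

# 1. Inspect the rows where x is missing.
print(df[df["x"].isnull()])

# 2. Turn unparseable values into NaN as well.
df["x"] = pd.to_numeric(df["x"], errors="coerce")

# 3. Drop every row whose x is NaN, then cast the rest to int.
df = df.dropna(subset=["x"])
df["x"] = df["x"].astype(int)

print(df)
```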

If this fix does not work for you, please try the solution below.

Solution 3 : subset the dataframe by notnull() values.

If the solutions above did not work, another option is to use .loc to select only the rows where x is not null, take just the column ‘x’ from them, and apply int to those values. You can do this with the command below, assuming x is a float column.

                                                                       #
# apply int() only where 'x' is not null; NaN rows are left as they are
df.loc[df['x'].notnull(), 'x'] = df.loc[df['x'].notnull(), 'x'].apply(int)
                                                                       #
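The sketch below uses made-up values to show what Solution 3 does, and it computes the mask once instead of twice. It also shows a limitation worth knowing. The NaN values stay in the column, so the dtype remains float64: int() truncates each value toward zero, but the results are stored back as floats.

```python
import numpy as np
import pandas as pd

# Hypothetical float column with a missing value.
df = pd.DataFrame({"x": [1.7, np.nan, -2.5]})

# Apply int() only to the non-missing values; the NaN row is never touched.
mask = df["x"].notnull()
df.loc[mask, "x"] = df.loc[mask, "x"].apply(int)

print(df["x"].tolist())  # [1.0, nan, -2.0]
print(df["x"].dtype)  # float64
```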

I hope the solutions above fixed your problem. Good luck with the scripts to come.

Summing up

These are all the solutions I could find for this problem, and I hope one of them worked for you. Cheers, and keep coding! If you want to learn more about Python, please check out the Python Documentation: https://docs.python.org/3/