Solving Python Error – ‘utf-8’ codec can’t decode byte 0xa0 in position : invalid start byte

‘utf-8’ codec can’t decode byte 0xa0 in position : invalid start byte is an error that Python raises when it tries to read data as UTF-8 but the bytes are not valid UTF-8, usually because the file was saved in a different encoding.

In this article we explain why the error appears and show several ways to fix it.

Explaining the Error : ‘utf-8’ codec can’t decode byte 0xa0 in position : invalid start byte

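The error happens when Python decodes a file as UTF-8 (the default in pd.read_csv) but the file was saved in another encoding. It is most common with .csv and .txt files exported from spreadsheet programs or older Windows tools. The byte 0xa0 is a non-breaking space in Windows-1252 and Latin-1. In UTF-8 it can only appear as a continuation byte inside a multi-byte character, never at the start of one, which is why the message says "invalid start byte".

Below is an example of the error message. Make sure the message you see looks similar to this one.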

                                                                       #
'utf-8' codec can't decode byte 0xa0 in position 4276: invalid start byte
                                                                       #
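
To see where the message comes from, the error can be reproduced without any file at all. The snippet below is a minimal sketch: the byte string is an illustrative sample (not taken from a real dataset) of the kind of text a Windows-1252 editor might save.

```python
# Illustrative bytes: "price", then 0xa0, then "10".
raw = b"price\xa010"

try:
    raw.decode("utf-8")
    message = None
except UnicodeDecodeError as exc:
    # 0xa0 is a continuation byte in UTF-8, so it cannot begin a character.
    message = str(exc)

print(message)

# The same bytes decode cleanly as Windows-1252,
# where 0xa0 is a non-breaking space (U+00A0).
text = raw.decode("windows-1252")
print(repr(text))
```

The position in the message is the zero-based offset of the offending byte, so it tells you where in the file to look.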

Below are several solutions that I have tested and that worked for me.

Solution 1 : use encoding in pd.read_csv

The easiest solution is to pass an explicit encoding to pd.read_csv, as in the line below. Windows-1254 is the Turkish Windows code page.

                                                                       #
import pandas as pd
df = pd.read_csv("text.txt", encoding='windows-1254')
                                                                       #

If the above does not work, try Windows-1252, the Western European Windows code page.

                                                                       #
ds = pd.read_csv('/Dataset/test.csv', encoding='windows-1252') 
                                                                       #
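
Instead of editing the encoding by hand and re-running, the candidates can be tried in order. The sketch below uses only the standard library. `detect_encoding` is a hypothetical helper written for this article, not a pandas or Python function, and the candidate list and sample bytes are assumptions.

```python
import tempfile
from pathlib import Path

# Hypothetical helper: returns the first candidate that decodes the whole file.
def detect_encoding(path, candidates=("utf-8", "windows-1252", "windows-1254", "latin1")):
    data = Path(path).read_bytes()
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes, try the next
        return name
    return None

# Demo on a throwaway file containing bytes that are not valid UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_bytes(b"name,price\ncaf\xe9,10\xa0EUR\n")
    found = detect_encoding(sample)

print(found)
```

The returned name can then be passed on, for example pd.read_csv(path, encoding=found). Decoding without an error does not prove the encoding is the right one, so check a few non-ASCII values in the result.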

If this solution does not work, try the next fix.

Solution 2 : use encoding=‘windows-1252’ in open

The second solution applies when you read the file with Python's built-in open function: pass the encoding explicitly, as in the example below.

                                                                       #
open('txt.tsv', encoding='windows-1252')
                                                                       #
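
If the same file is read often, it may be simpler to convert it to UTF-8 once so that later reads need no encoding argument. This is a minimal sketch: `convert_to_utf8` and the file names are made up for illustration, and it assumes the source really is Windows-1252.

```python
import tempfile
from pathlib import Path

# Hypothetical helper: re-save a text file as UTF-8.
def convert_to_utf8(src, dst, source_encoding="windows-1252"):
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=source_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)

# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "txt.tsv"
    dst = Path(tmp) / "txt_utf8.tsv"
    src.write_bytes(b"a\tb\xa0c\r\n")
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)
```

After conversion the non-breaking space is stored as the two UTF-8 bytes 0xc2 0xa0, and the file opens with the default UTF-8 settings.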

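If this solution does not work, the third solution might do the trick.

Solution 3 : use encoding=‘ANSI’ in pd.read_csv

The third solution is to try encoding with ‘ANSI’, as in the example below. Note that Python only recognises ‘ANSI’ on Windows, where it is an alias for the mbcs codec (the system's ANSI code page). On Linux and macOS it fails with an unknown encoding error.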

                                                                       #
df = pd.read_csv('Text.csv',encoding='ANSI')
                                                                       #

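If the above does not work, try encoding with ‘latin1’. In the example below, missing is assumed to be a list you have defined yourself with the values that should be treated as NaN.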

                                                                       #
df=pd.read_csv("../CSV_FILE.csv",na_values=missing, encoding='latin1')
                                                                       #
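
The reason ‘latin1’ works as a last resort is that it maps every possible byte to a character, so it never raises a decode error. The sketch below, using illustrative bytes, shows the trade-off: the read succeeds, but some characters may come out wrong if the file was really Windows-1252.

```python
# latin1 accepts all 256 byte values.
all_bytes = bytes(range(256))
decoded_all = all_bytes.decode("latin1")

# Illustrative bytes: 0x80, 0xa0, 0x81.
raw = bytes([0x80, 0xA0, 0x81])
as_latin1 = raw.decode("latin1")  # succeeds, 0x80 becomes a control character

# Windows-1252 leaves a few bytes (such as 0x81) undefined, so it can fail.
try:
    raw.decode("windows-1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# Where both succeed they can disagree: 0x80 is the euro sign in Windows-1252.
euro = b"\x80".decode("windows-1252")

print(len(decoded_all), cp1252_ok, euro)
```

So if ‘latin1’ is the only encoding that loads the file, it is worth checking symbols such as the euro sign and curly quotes in the resulting data.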

I hope this guide solved your problem, thank you for reading.


Summing-up

We have reached the end of this article. I hope it has been helpful and that it solved your problem. Coding can be hard when confusing errors keep popping up.

Thank you for reading. Keep learning and keep coding, cheers. If you want to learn more about Python, please check out the Python Documentation : https://docs.python.org/3/