UnicodeEncodeError: ‘charmap’ codec can’t encode – character maps to <undefined>, print function in Python

The error “UnicodeEncodeError: ‘charmap’ codec can’t encode – character maps to <undefined>, print function” occurs a lot in Python. The first time I encountered it, I realised it was not going to be easy to solve, and I was right. I did solve it, though, after a lot of trial and error.

In this article I’m going to explain the problem, present the solution that worked for me, and present other possible solutions that I stumbled upon in my error-solving journey.

Explaining : UnicodeEncodeError: ‘charmap’ codec can’t encode – character maps to <undefined>, print function in Python

First of all, let us define the problem. In my case I was writing a program in Python 3.3 whose goal was to send data to a webpage using the POST method.

This is the exact error I got after running the code:

                                                                       #
UnicodeEncodeError: 'charmap' codec can't encode character '\u2014' in position 10248: character maps to <undefined>
                                                                       #

The page is a UTF-8 encoded document, and I receive it as a bytes object returned by the HTTPResponse .read() method.

Everything was fine in the IDLE GUI for Windows until I switched to the Windows console.

The print function handles the U+2014 character (em dash) included in the returned page very well in the IDLE GUI, but not in the Windows console, because the console code page (cp850 in my case) has no mapping for that character.
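To see the failure in isolation, here is a minimal sketch that tries to encode the em dash with cp850 and then with UTF-8. It assumes the console code page is cp850, as in my case; the variable names are only illustrative.

```python
# Minimal reproduction: U+2014 has no entry in the cp850 character map,
# so encoding it with the default 'strict' error handler raises.
em_dash = "\u2014"

try:
    em_dash.encode("cp850")
    failure_reason = None
except UnicodeEncodeError as exc:
    failure_reason = exc.reason
    print("cp850 failed:", failure_reason)

# UTF-8 can represent every Unicode character, so this always succeeds.
utf8_bytes = em_dash.encode("utf-8")
print("utf-8 bytes:", utf8_bytes)
```

This is the same thing print does behind the scenes when it writes text to a console whose encoding is cp850.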

I tried everything to no avail until I finally solved the error. Below I present how I did it, along with all the other possible solutions I found.

Solution 1 : replace the faulty character “—” with the “?” character

After a lot of trial and error I came up with this solution: replace the faulty character “—” with the “?” character. The code is below:

                                                                       #
print(data.decode('utf-8').encode('cp850','replace').decode('cp850'))
                                                                       #

This solution worked, but I do not like it: the code is messy and it hardcodes the cp850 code page, so it is not robust and does not work for all cases. I need a solution where my code does not depend on the encoding of the output interface and works for most use cases, which brings us to the other possible solutions below.
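As a first step in that direction, the same replace trick can be written without hardcoding cp850. This is only a sketch: the helper name to_console_safe is my own invention, and it assumes sys.stdout.encoding reports the real console encoding (it falls back to ASCII when that is missing).

```python
import sys


def to_console_safe(text, encoding=None):
    """Return text with every character the target encoding lacks replaced by '?'."""
    # Use the console's own encoding unless the caller names one explicitly.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


# Explicit encoding here only to make the example reproducible.
print(to_console_safe("before \u2014 after", "cp850"))
```

In real code you would call to_console_safe(data.decode('utf-8')) and let the helper pick up the console encoding by itself.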

Solution 2 : Reset the output encoding globally

We should change the I/O encoding at the beginning of the program, as in the code below. The first block is for Python 3 and the second one is for Python 2 :

                                                                       #
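import codecs
import sys

# Python 3: wrap the underlying byte streams so that all output is encoded as cp850
if sys.stdout.encoding != 'cp850':
  sys.stdout = codecs.getwriter('cp850')(sys.stdout.buffer, 'strict')
if sys.stderr.encoding != 'cp850':
  sys.stderr = codecs.getwriter('cp850')(sys.stderr.buffer, 'strict')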
                                                                       #

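You can change the error handler (for example 'replace' instead of 'strict', so that characters missing from the code page are printed as “?” instead of raising the error), and you can also set different encodings.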
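On newer interpreters there is a shorter way to get the same effect. The sketch below assumes Python 3.7 or later, where text streams have a reconfigure() method; the helper name make_lenient and the in-memory fake console are only there for illustration.

```python
import io
import sys


def make_lenient(stream):
    """Switch a text stream to errors='replace' if it supports reconfigure()."""
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(errors="replace")
    return stream


# Demonstration on an in-memory stream that mimics a cp850 console.
raw = io.BytesIO()
fake_console = io.TextIOWrapper(raw, encoding="cp850", errors="strict")
make_lenient(fake_console)
fake_console.write("a \u2014 b")
fake_console.flush()
encoded = raw.getvalue()  # the em dash was written as '?'

# In a real program the helper would be applied once at startup.
make_lenient(sys.stdout)
make_lenient(sys.stderr)
```

Unlike the codecs.getwriter approach, this keeps the original stream object and its encoding, and only changes what happens to characters that cannot be encoded.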

In Python 2, any data, text or input must be correctly converted to unicode first, since outputting byte strings of unspecified encoding directly will not work :

                                                                       #
# -*- coding: utf-8 -*-
# Python 2 only
import sys
import codecs
sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
print u"Stöcker"                # working
print "Stöcker".decode("utf-8") # working
                                                                       #

Solution 3 : encode(‘utf-8’)

This is another way to solve UnicodeEncodeError: ‘charmap’ codec can’t encode – character maps to <undefined>, print function in Python.

For a lot of people, including me, this is the easiest solution. Depending on your code it may or may not work, but it is worth a try, right ?

In this case we will replace the first line of code below with the second one :

                                                                       #
print("Process lines, file_name command_line %s\n"% command_line)
                                                                       #
                                                                       #
print("Process lines, file_name command_line %s\n"% command_line.encode('utf-8'))  
                                                                       #
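Before relying on this, it is worth knowing what it actually prints on Python 3. There, encode() returns a bytes object, and formatting it with %s inserts its b'...' representation rather than the original text. The sketch below uses a made-up command_line value to show this.

```python
# Hypothetical value, only for illustration.
command_line = "copy a.txt \u2014 b.txt"

# On Python 3, %s formats the bytes object using its b'...' representation.
formatted = "command_line %s" % command_line.encode("utf-8")
print(formatted)

# The result is pure ASCII, which is why the console no longer complains.
is_ascii_only = all(ord(ch) < 128 for ch in formatted)
```

So the error goes away because the non-ASCII characters are shown as escapes such as \xe2\x80\x94, not because the console learned to display them.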

Below are some other honorary solutions that worked for some people.

Solution 4 : The Unicode Console Package

You can use print(repr(data)) to inspect the data safely. More generally, always print Unicode text when you want to display text in Python, and never hardcode the character encoding of your environment. To print Unicode to the Windows console, you can use the “win-unicode-console” package.
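Here is a small sketch of the inspection idea, using a made-up payload. repr() of a bytes object shows non-ASCII bytes as escapes, and for text that is already decoded the built-in ascii() does the same job.

```python
# Hypothetical sample payload, standing in for the bytes returned by .read().
data = "caf\u00e9 \u2014 bar".encode("utf-8")

# repr() of bytes contains only ASCII characters, so any console can show it.
raw_view = repr(data)
print(raw_view)

# ascii() escapes the non-ASCII characters of an already decoded string.
text_view = ascii(data.decode("utf-8"))
print(text_view)
```

Both outputs are meant for debugging, since the reader sees escape sequences instead of the real characters.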

Solution 5 : chcp 65001

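This solution only works if you are using the Windows command line to print your data; if you are not, it is not going to work. Run the following command in the console before starting Python to switch the console code page to 65001, which is UTF-8 :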

                                                                       #
chcp 65001
                                                                       #
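To check whether a given encoding can handle your text at all, a small test like the one below may help. It is a sketch with a helper name of my own (can_encode); in practice you would pass sys.stdout.encoding as the encoding to see what your console reports after changing the code page.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given encoding without errors."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # LookupError covers encoding names this Python build does not know.
        return False
    return True


print(can_encode("\u2014", "cp850"))   # the old code page
print(can_encode("\u2014", "utf-8"))   # what code page 65001 corresponds to
```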

Summing-up

We have arrived at the end of this quest to solve this annoying error. I hope that sharing my experience helped, and that the other solutions helped too. If you like this website, support us on Kofi and keep browsing. Thank you, and good luck with your Python journey. Cheers.

If you want to learn more about Python, please check out the Python Documentation : https://docs.python.org/3/