Re: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 10442: character maps to <undefined>

Terry Reedy Sat, 20 Oct 2018 10:27:02 -0700

On 10/20/2018 8:24 AM, pjmcle...@gmail.com wrote:

On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote:

I have a decode error of some sort:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 83064: invalid start byte
*****************
It seems to refer to this line of my code:
***********
data = f.read()
***************
which is part of this block of code:
********************
from os.path import join

# Read content of files
for path in files:
    with open(join("docs", path), encoding="utf-8") as f:
    # with open(join("docs", path)) as f:
        data = f.read()
        process_data(data)
***********************************************

Would the fix be this?
**********************
data = f.read(), decoding = "utf-8"  # OR
data = f.read(), decoding = "ascii"  # is this the right fix, or the previous one, or are both wrong??

Both statements are invalid syntax. The encoding is set in the open() call.
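A minimal sketch of where the encoding belongs, assuming an ordinary text file on disk. The helper read_text and the demo file are hypothetical. The codec is a keyword argument to open(). On a text file, f.read() accepts only an optional size, so it has no slot for an encoding.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole file as text (hypothetical helper).

    The encoding is given to open(); f.read() only takes an optional size.
    """
    with open(path, encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical demo file containing one non-ASCII character, written as UTF-8.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write("café\n".encode("utf-8"))
    try:
        print(repr(read_text(tmp.name)))                      # decoded as UTF-8
        print(repr(read_text(tmp.name, encoding="latin-1")))  # wrong codec, but no error
    finally:
        os.remove(tmp.name)
```

The second call shows why a missing exception does not prove the encoding is right. Latin-1 silently turns the two UTF-8 bytes of "é" into two unrelated characters.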

What you need to find out: is 0xb0 a one-byte error, or is 'utf-8' the wrong encoding? Things I might do:

1. Change the encoding in open() to 'ascii' and see whether the exception message still refers to position 83064, or to a non-ASCII character earlier in the file. The latter would mean that at least one earlier non-ASCII sequence was decoded as UTF-8 without complaint. That would suggest that 'utf-8' might be correct and that the 0xb0 byte is an error.

2. In the latter case, add errors='handler' to open(), where 'handler' is something other than the default 'strict'. Look in the docs or see help(open) for alternatives.

3. In open(), replace encoding='utf-8' with mode='rb' so that f.read() returns bytes instead of a text string. Then print, say, data[83000:83200] to see the context of the non-ASCII byte.

4. Change the encoding in open() to 'latin-1'. Because latin-1 maps every byte value to a character, the file will then be read as text without error, even if latin-1 is the wrong encoding.
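The four checks above can be rolled into one helper. This is a rough sketch under some assumptions. It reads the file once in 'rb' mode and applies bytes.decode(), rather than reopening the file with different encodings. The names first_bad_position, diagnose and diagnose_file are hypothetical. The choice of the 'replace' handler is only an example: 'ignore', 'backslashreplace' and 'surrogateescape' are other non-strict handlers. The step 1 comparison is a heuristic, not proof.

```python
def first_bad_position(raw: bytes, encoding: str):
    """Return the offset of the first byte `encoding` cannot decode, or None."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def diagnose(raw: bytes, window: int = 100) -> dict:
    """Run the four checks on raw file bytes (f.read() in 'rb' mode)."""
    ascii_pos = first_bad_position(raw, "ascii")
    utf8_pos = first_bad_position(raw, "utf-8")
    report = {"ascii_pos": ascii_pos, "utf8_pos": utf8_pos}

    # Step 1: if ASCII fails earlier than UTF-8, some earlier non-ASCII bytes
    # decoded cleanly as UTF-8, so UTF-8 is plausible and the byte at
    # utf8_pos may be a stray error (a heuristic only).
    report["utf8_plausible"] = (
        ascii_pos is not None and utf8_pos is not None and ascii_pos < utf8_pos
    )

    if utf8_pos is not None:
        # Step 3: the raw bytes around the failure, for inspection.
        start = max(0, utf8_pos - window)
        report["context"] = raw[start:utf8_pos + window]
        # Step 2: decode anyway with a non-strict handler ("replace" is one choice).
        report["replaced"] = raw.decode("utf-8", errors="replace")

    # Step 4: latin-1 maps every byte 0x00-0xFF to a character, so this never raises.
    report["latin1"] = raw.decode("latin-1")
    return report


def diagnose_file(path, window: int = 100) -> dict:
    # mode="rb": nothing is decoded, f.read() returns bytes.
    with open(path, mode="rb") as f:
        return diagnose(f.read(), window)


if __name__ == "__main__":
    # Hypothetical sample: a valid UTF-8 "é", then a stray 0xb0 byte.
    sample = "café ".encode("utf-8") + b"\xb0" + b" tail"
    for key, value in diagnose(sample, window=20).items():
        print(f"{key}: {value!r}")
```

In the sample, ASCII fails at offset 3 (the first byte of "é") and UTF-8 fails only at offset 6 (the 0xb0). That is the step 1 pattern pointing to a stray byte rather than a wrong codec. For a file written entirely in latin-1, both offsets usually coincide.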




--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list
