On Saturday, October 20, 2018 at 1:23:50 PM UTC-4, Terry Reedy wrote:
> On 10/20/2018 8:24 AM, pjmcle...@gmail.com wrote:
> > On Saturday, October 13, 2018 at 7:24:14 PM UTC-4, MRAB wrote:
> >
> > i have a sort of decode error
> > UnicodeDecodeError; 'utf-8' can't decode byte 0xb0 in position 83064:
> > invalid start byte
> > *****************
> > and it seems to refer to my code line:
> > ***********
> > data = f.read()
> > ***************
> > which is part of this block of code
> > ********************
> > # Read content of files
> > for path in files:
> >     with open(join("docs", path), encoding="utf-8") as f:
> >     #with open(join("docs", path)) as f:
> >         data = f.read()
> >         process_data(data)
> > ***********************************************
> >
> > would the solution fix be this?
> > **********************
> > data = f.read(), decoding = "utf-8" #OR
> > data = f.read(), decoding = "ascii" # is this the right fix or previous or
> > both wrong??
>
> Both statements are invalid syntax. The encoding is set in the open
> statement.
>
> What you need to find out: is '0xb0' a one-byte error or is 'utf-8' the
> wrong encoding? Things I might do:
>
> 1. Change the encoding in open() to 'ascii' and see if the exception
> message still refers to position 83064 or if there is a non-ascii
> character earlier in the file. The latter would mean that there is at
> least one earlier non-ascii sequence that was decoded as utf-8. This
> would suggest that 'utf-8' might be correct and that the '0xb0' byte is
> an error.
>
> 2. In the latter case, add "errors='handler'", where 'handler' is
> something other than the default 'strict'. Look in the doc or see
> help(open) for alternatives.
>
> 3. In open(), replace "encoding='utf-8'" with "mode='rb'" so that
> f.read() creates data as bytes instead of a text string. Then print,
> say, data[83000:83200] to see the context of the non-ascii byte.
>
> 4. Change the encoding in open() to 'latin-1'. The file will then be
> read as text without error, even if latin-1 is the wrong encoding.
>
> --
> Terry Jan Reedy
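
For reference, steps 2 to 4 quoted above can be tried on a small throwaway file before touching the real data. This is only a sketch: the file name sample.txt and its contents are made up for illustration (one stray 0xb0 byte in otherwise ASCII text), and errors="replace" is just one of the handlers that help(open) lists.

```python
import os
import tempfile

# Hypothetical sample data: ASCII text with a single 0xb0 byte in it.
raw = b"temperature: 21\xb0C\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # made-up file name
    with open(path, "wb") as f:
        f.write(raw)

    # Step 3: read the file as bytes and look at the bytes around the problem.
    with open(path, mode="rb") as f:
        data = f.read()
    pos = data.index(b"\xb0")
    context = data[max(0, pos - 10):pos + 10]

    # Step 2: keep utf-8 but use a non-strict error handler.
    # "replace" substitutes U+FFFD for each undecodable byte.
    with open(path, encoding="utf-8", errors="replace") as f:
        replaced = f.read()

    # Step 4: latin-1 maps every byte to a character, so it never raises,
    # whether or not it is the right encoding for the file.
    with open(path, encoding="latin-1") as f:
        latin = f.read()

print("offending byte at position", pos)
print("context:", context)
print("utf-8 with errors='replace':", repr(replaced))
print("latin-1:", repr(latin))
```

For the real file, the slice data[83000:83200] suggested in step 3 plays the role of context here.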
Hello Terry,

Just want to add that I also tried changing the encoding in Notepad++ from ANSI to UTF-8, and then used encoding utf-8 in my file, but I got the same error. Then I tried encoding ascii in my file and it worked. So encoding ascii and latin-1 both work. I'm not sure why utf-8 gives an error when that's the widest encoding, the one that includes all characters, right?

Thanks,
Jessica
--
https://mail.python.org/mailman/listinfo/python-list
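
On the question of why utf-8 can fail where latin-1 does not, a small sketch may help. It decodes a lone 0xb0 byte (the value from the traceback) with several codecs. The choice of cp1252 is an assumption: Notepad++'s "ANSI" usually means the Windows code page, but the thread does not say which one applies here. UTF-8 can represent every character, but only as specific byte sequences, and 0xb0 on its own is not one of them.

```python
# A lone 0xb0 byte, the value named in the traceback.
byte = b"\xb0"

results = {}
for enc in ("utf-8", "ascii", "latin-1", "cp1252"):
    try:
        results[enc] = byte.decode(enc)
    except UnicodeDecodeError as exc:
        # exc.reason is the short explanation shown at the end of the traceback.
        results[enc] = "error: " + exc.reason

for enc, outcome in results.items():
    print(f"{enc:8} -> {outcome}")

# In UTF-8 the degree sign (U+00B0) is stored as two bytes, not as 0xb0 alone.
utf8_degree = "\u00b0".encode("utf-8")
print(utf8_degree)
```

Note that in this sketch ascii also rejects a raw 0xb0 byte, so if ascii worked on the real file, the file may have been re-saved in between and no longer contain that byte. I can't tell which from the thread alone.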