On 18/08/2022 03.33, Stefan Ram wrote:
> Tobiah <t...@tobiah.org> writes:
>> I get data from various sources; client emails, spreadsheets, and
>> data from web applications.  I find that I can do
>> some_string.decode('latin1')
>
>   Strings have no "decode" method. ("bytes" objects do.)
>
>> to get unicode that I can use with xlsxwriter,
>> or put <meta charset="latin1"> in the header of a web page to display
>> European characters correctly.
>
> |You should always use the UTF-8 character encoding. (Remember
> |that this means you also need to save your content as UTF-8.)
> World Wide Web Consortium (W3C) (2014)
>
>> am using data from the wild.  It's frustrating that I have to play
>> a guessing game to figure out how to use incoming text.  I'm just wondering
>
>   You can let Python guess the encoding of a file.
>
> def encoding_of( name ):
>     path = pathlib.Path( name )
>     for encoding in( "utf_8", "cp1252", "latin_1" ):
>         try:
>             with path.open( encoding=encoding, errors="strict" )as file:
>                 text = file.read()
>             return encoding
>         except UnicodeDecodeError:
>             pass
>     return None
>
>> if there are any thoughts.  What if we just globally decided to use utf-8?
>> Could that ever happen?
>
>   That decision has been made long ago.

Unfortunately, much of our data was collected long before then, and as
we've discovered, the OP is still living in Python 2 times.

What if the path "name" (above) is itself not in UTF-8? E.g. the OP's
Montréal in Latin-1, as Montréal.txt or Montréal.rpt.

-- 
Regards,
=dn
-- 
https://mail.python.org/mailman/listinfo/python-list
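
PS: on the file-name question, here is a minimal sketch of what I understand Python 3 to do on a POSIX system with a UTF-8 locale (PEP 383): bytes in a file name that are not valid UTF-8 are smuggled through as lone surrogates via the "surrogateescape" error handler, so the name still round-trips to the original bytes. The helper readable_name() and its candidate list are my own hypothetical illustration, not anything from the quoted post, and the explicit decode/encode calls stand in for what os.fsdecode()/os.fsencode() would do on such a system.

```python
# Hypothetical example: a file name that was written in Latin-1, not UTF-8.
raw_name = "Montréal.txt".encode("latin_1")    # b'Montr\xe9al.txt'

# Assumption: POSIX with a UTF-8 locale. Undecodable bytes become lone
# surrogates instead of raising an error.
as_str = raw_name.decode("utf_8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes, so the
# string can still be used to open the file.
round_trip = as_str.encode("utf_8", errors="surrogateescape")


def readable_name(raw, candidates=("utf_8", "cp1252", "latin_1")):
    """Guess a displayable form of a bytes file name (hypothetical helper).

    Tries each candidate encoding strictly, in order. latin_1 accepts
    every byte value, so it acts as the catch-all at the end.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding, errors="strict")
        except UnicodeDecodeError:
            continue
    return None


print(ascii(as_str))              # shows the escaped surrogate
print(readable_name(round_trip))  # the guessed display form
```

The same caveat applies as for file contents: this is still a guessing game, and a name that happens to be valid in an earlier candidate encoding will be accepted there even if it was written in a later one.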