On 2017-04-12 02:29, Steve D'Aprano wrote:
> >> In 2017, unless you are reading from old legacy files created
> >> using a non-Unicode encoding, you should just use UTF-8.
> >
> > Thanks for your opinion. My opinion differs.
>
> What would you suggest then, if not UTF-8?
>
> My personal favourite legacy encoding is MacRoman, but I wouldn't
> recommend anyone use it except to interoperate with legacy Mac
> applications and/or data from the 80s and 90s.
>
> What's your recommendation? "Anything but ASCII"?
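For the legacy-data case mentioned above, converting old MacRoman bytes to UTF-8 in Python 3 could look roughly like the sketch below. It relies on Python's built-in `mac_roman` codec; the helper name `macroman_to_utf8` is hypothetical, and real files would normally be read with `open(path, encoding="mac_roman")` rather than handled as in-memory bytes.

```python
def macroman_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: decode legacy MacRoman bytes, re-encode as UTF-8."""
    text = data.decode("mac_roman")  # bytes -> str using the MacRoman table
    return text.encode("utf-8")      # str -> UTF-8 bytes


legacy = "café".encode("mac_roman")
print(legacy)  # b'caf\x8e'  (MacRoman stores "é" as the single byte 0x8E)

# The same bytes are not valid UTF-8, which is why the source
# encoding of legacy files has to be known up front.
try:
    legacy.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")

print(macroman_to_utf8(legacy))  # b'caf\xc3\xa9'
```

Once converted, the data can be handled with plain UTF-8 from then on, which is the point of the quoted advice.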
Heh, how about "Unicode as ASCII-compatible-Python-strings"? ;-)

Got this from Peter Otten a while back in response to my request
for functionality along these lines:

http://www.mail-archive.com/python-list@python.org/msg420100.html

-tkc

$ cat codecs_mynamereplace.py
# -*- coding: utf-8 -*-
import codecs
import unicodedata

try:
    # Python 3.5+ ships a built-in "namereplace" error handler
    codecs.namereplace_errors
except AttributeError:
    # Older Pythons (2.7, 3.4): register an equivalent handler ourselves
    print("using mynamereplace")
    def mynamereplace(exc):
        # Replace each unencodable character with a \N{NAME} escape and
        # tell the codec to resume encoding just past the offending run
        return u"".join(
            "\\N{%s}" % unicodedata.name(c)
            for c in exc.object[exc.start:exc.end]
        ), exc.end
    codecs.register_error("namereplace", mynamereplace)

print(u"mañana".encode("ascii", "namereplace").decode())

$ python3.5 codecs_mynamereplace.py
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python3.4 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python2.7 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana

-- 
https://mail.python.org/mailman/listinfo/python-list
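Because the output above is made of ordinary Python `\N{...}` escapes, it can in principle be turned back into the original text. A minimal Python 3.5+ sketch of that round trip follows, using the built-in `namereplace` handler and the `unicode_escape` codec; the function names `to_ascii_names` and `from_ascii_names` are hypothetical, and the reverse step assumes the original text contained no backslashes of its own.

```python
def to_ascii_names(text: str) -> str:
    """Hypothetical helper: replace non-ASCII characters with \\N{NAME} escapes."""
    # "namereplace" is a built-in error handler from Python 3.5 onward.
    return text.encode("ascii", "namereplace").decode("ascii")


def from_ascii_names(escaped: str) -> str:
    """Hypothetical helper: turn \\N{NAME} escapes back into characters."""
    # Caveat: unicode_escape interprets every backslash sequence, so this is
    # only a faithful inverse if the original text had no backslashes.
    return escaped.encode("ascii").decode("unicode_escape")


escaped = to_ascii_names("mañana")
print(escaped)                    # ma\N{LATIN SMALL LETTER N WITH TILDE}ana
print(from_ascii_names(escaped))  # mañana
```

The escaped form stays pure ASCII, so it survives any ASCII-only channel, while the names keep it readable in a way that `backslashreplace` escapes such as `\xf1` do not.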