Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:

> With a few exceptions, /etc is filled with text files, not binary
> files, and half the executables on the system are text (Python, Perl,
> bash, sh, awk, etc.).

Our debate seems to stem from a different idea of what text is. To me,
text in the Python sense is a sequence of UCS-4 character code points.
The opposite of text is not necessarily binary.

Most of those "text" files under /etc expect ASCII. In many contexts,
they tolerate UTF-8 or Latin-3 or whatever, but it's a bit iffy (how are
extra-ASCII passwords encoded in the /etc/shadow?).

Also, the files under /etc, /var/log etc should not depend on the locale
since they are typically interpreted by daemons, which typically don't
possess locales.

> Relatively rare. Like, um, email, news, html, Unix config files,
> Windows ini files, source code in just about every language ever,
> SMSes, XML, JSON, YAML, instant messenger apps,

I would be especially wary of letting Python 3 interpret those files for
me. Python's [text] strings could be a wonderful tool on the inside of
my program, but I definitely would like to micromanage the I/O. Do I
obey the locale or not? That's too big (and painful) a question for
Python to answer on its own (and pretend like everything's under
control).

> word processors... even *graphic* applications invariably have a text
> tool.

Thing is, the serious text utilities like word processors probably need
lots of ancillary information so Python's [text] strings might be too
naive to represent even a single character.

>> More often, len(b'λ') is what I want.
>
> Oh really? Are you sure? What exactly is b'λ'?

That's something that ought to work in the UTF-8 paradise.
Unfortunately, Python only allows ASCII in bytes. ASCII only! In this
day and age! Even C is not so picky:

    #include <stdio.h>

    int main()
    {
        printf("Hyvää yötä\n");
        return 0;
    }


Marko
--
https://mail.python.org/mailman/listinfo/python-list
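
For what it's worth, here is a minimal sketch of the bytes-literal point
discussed above. It assumes Python 3 and that UTF-8 is an encoding you
have chosen yourself (rather than whatever the locale says). The sample
strings come from the post; the variable names and the in-memory stream
are only illustrative.

```python
import io

# Assumption: UTF-8 is the encoding we decided on, independent of the locale.
ENCODING = "utf-8"

text = "λ"                    # str: a single code point
data = text.encode(ENCODING)  # bytes: produced by an explicit encoding step

print(len(text))  # length in code points
print(len(data))  # length in bytes

# A bytes literal containing a non-ASCII character is rejected by the
# compiler, so b'λ' never gets as far as len().
try:
    compile("b'λ'", "<example>", "eval")
    literal_ok = True
except SyntaxError:
    literal_ok = False
print(literal_ok)

# Micromanaging the I/O: read raw bytes and decode by hand. An in-memory
# stream stands in for a file opened in binary mode.
raw = io.BytesIO("Hyvää yötä\n".encode(ENCODING))
line_bytes = raw.readline()
line = line_bytes.decode(ENCODING)
print(len(line_bytes), len(line))
```

The explicit encode/decode calls are the "micromanaging" part: nothing in
the sketch consults the locale, so the byte counts depend only on the
encoding named in the code.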