On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
<rantingrickjohn...@gmail.com> wrote:
> On Feb 11, 8:23 pm, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.
What this statement misunderstands, though, is that ASCII is itself an
encoding. Files contain bytes, and it's only what's external to those
bytes that gives them meaning. The famous "bush hid the facts" trick with
Windows Notepad (its detection heuristic guessed that a short ASCII file
was UTF-16 and displayed it as CJK characters) shows the folly of trying
to use internal evidence to infer meaning from bytes.

Everything that displays text to a human needs to translate bytes into
glyphs, and conceptually the usual way to do this is to go via
characters: bytes are decoded into characters, and characters are
rendered as glyphs. Pretending that it's all the same thing really means
pretending that one byte represents one character and that each
character is depicted by one glyph. And that's doomed to failure unless
everyone speaks English and never uses a foreign symbol, which rules out
mathematical notation too.

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
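To make the bytes-versus-characters point concrete, here is a minimal Python 3 sketch. The sample strings, the temporary file, and the assumption that the file was written as UTF-8 are all illustrative choices, not anything from the original question. It shows the same bytes decoding differently under different encodings (including the Notepad case), a single character taking a different number of bytes per encoding, and what "reading the file without worrying about encoding" actually amounts to once you pick one.

```python
import os
import tempfile

# The same 18 bytes, interpreted two ways.
data = "Bush hid the facts".encode("ascii")
as_ascii = data.decode("ascii")      # the text that was actually written
as_utf16 = data.decode("utf-16-le")  # 9 code units, all in the CJK range

# One character, different byte sequences depending on the encoding.
pound = "£"
pound_latin1 = pound.encode("latin-1")  # b'\xa3'
pound_utf8 = pound.encode("utf-8")      # b'\xc2\xa3'

# ASCII is an encoding too, and it simply has no byte for "£".
try:
    pound.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 bytes misread as Latin-1 give mojibake rather than an error.
mojibake = "café".encode("utf-8").decode("latin-1")  # 'cafÃ©'

# Reading a file: assume (hypothetically) it was saved as UTF-8.
text = "Price: £5 © 2012 ± ö"
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as f:
    f.write(text.encode("utf-8"))
    path = f.name

try:
    # Stating the real encoding reads the text back intact.
    with open(path, encoding="utf-8") as fh:
        correct = fh.read()
    # Insisting it is ASCII only "works" if errors are papered over:
    # each undecodable byte becomes U+FFFD, the replacement character.
    with open(path, encoding="ascii", errors="replace") as fh:
        lossy = fh.read()
finally:
    os.remove(path)

print(as_ascii)
print(as_utf16)
print(correct)
print(lossy)
```

Note that the "no thought" version still picks an encoding; it just picks the wrong one and silently throws information away. Leaving `encoding` out of `open()` doesn't avoid the choice either, since Python 3 then falls back to a platform- and locale-dependent default.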