On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:

> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <ericsnowcurren...@gmail.com> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.
>
> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?
The argument being made is that in Python 2, if you try to read a file that contains Unicode characters encoded with some unknown codec, you don't have to think about it. Sure, you get mojibake rubbish in your database, but that's the fault of people who insist on not being American. Or who spell Zoe with an umlaut.

In Python 3, if you try the same thing, you get an error. Fixing the error requires thought, and even if that is only a minuscule amount of thought, that's too much for some developers who are scared of Unicode. Hence the FUD that Python 3 is too hard because it makes you learn Unicode.

I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with Joel Spolsky on this one: if you're a programmer in 2003 who doesn't have at least a basic working knowledge of Unicode, you're the equivalent of a doctor who doesn't believe in germs.

http://www.joelonsoftware.com/articles/Unicode.html

Gaining a basic working knowledge of Unicode is not that hard. You don't need to be an expert, and it's just not that scary.

The use case given is:

"I have a file containing text. I can open it in an editor and see it's nearly all ASCII text, except for a few weird and bizarre characters like £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an error. What should I do that requires no thought?"

Obvious answers:

- Try decoding with UTF-8, and fall back to Latin-1 if that fails. Latin-1 maps every byte to a character, so even if you don't get the right characters, you'll get *something*.

- Use open(filename, encoding='ascii', errors='surrogateescape')
  (Or possibly errors='ignore'.)

--
Steven
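A minimal sketch of the "obvious answers" above, for anyone who wants to see them run. The sample bytes, the temporary file, and the helper names `decode_leniently` and `read_with_surrogateescape` are hypothetical, made up for illustration; the sample is assumed to be Latin-1, standing in for "some unknown codec".

```python
import os
import tempfile

# Hypothetical sample: mostly-ASCII text with a few non-ASCII characters,
# saved in an encoding the reader doesn't know (Latin-1 here).
SAMPLE = "Zoë paid £5 © 2012".encode("latin-1")


def decode_leniently(raw: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so this never
        # raises; bytes in some other encoding just come out as mojibake.
        return raw.decode("latin-1")


def read_with_surrogateescape(path: str) -> str:
    """Hypothetical helper: undecodable bytes become lone surrogates
    (U+DC80..U+DCFF) instead of raising UnicodeDecodeError."""
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        return f.read()


# Write the sample to a temporary file so the open() call can be shown.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

try:
    text = read_with_surrogateescape(path)
    # Encoding with the same error handler restores the original bytes.
    roundtrip = text.encode("ascii", errors="surrogateescape")
finally:
    os.remove(path)

lenient = decode_leniently(SAMPLE)                 # 'Zoë paid £5 © 2012'
ignored = SAMPLE.decode("ascii", errors="ignore")  # 'Zo paid 5  2012'
```

The design difference matters: `surrogateescape` loses nothing, since writing the text back out with the same error handler reproduces the file byte for byte, whereas `errors='ignore'` silently throws the non-ASCII bytes away.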