On Feb 23, 11:46 am, Joshua Judson Rosen <roz...@geekspace.com> wrote:
> Denis Kasak <denis.ka...@gmail.com> writes:
>
> > > > Python "assumes" ASCII and if the decoded/encoded text doesn't
> > > > fit that encoding it refuses to guess.
> > >
> > > Which is reasonable given that Python is a programming language where
> > > it's better to have a more conservative assumption about encodings so
> > > errors can be more quickly diagnosed. A newsreader however is a
> > > different beast, where it's better to make a less conservative
> > > assumption that's more likely to display messages correctly to the
> > > user. Assuming ISO 8859-1 in the absence of any specified encoding
> > > allows the message to be correctly displayed if the character set is
> > > either ISO 8859-1 or ASCII. Doing things the "pythonic" way and
> > > assuming ASCII only allows such messages to be displayed if ASCII is
> > > used.
> >
> > Reading this paragraph, I've begun thinking that we've misunderstood
> > each other. I agree that assuming ISO 8859-1 in the absence of
> > specification is a better guess than most (since it's more likely to
> > display the message correctly).
>
> So, yeah--back on the subject of programming in Python and supporting
> character sets beyond ASCII:
>
> If you have to make an assumption, I'd really think that it'd be
> better to use whatever the host OS's default is, if the host OS has
> such a thing--using an assumption of ISO 8859-1 works only in select
> regions on unix systems, and may fail even in those select regions on
> Windows, Mac OS, and other systems; without the OS considerations,
> just the regional constraints are likely to make an ISO-8859-1
> assumption result in /incorrect/ results anywhere eastward of central
> Europe. Is a user in Russia (or China, or Japan) *really* most likely
> to be using ISO 8859-1?
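
The fallback order being argued over can be sketched in a few lines of Python 3. This is a minimal, hypothetical helper (`decode_unlabelled` is an illustrative name, not anything from the thread or the standard library) for bytes that arrive with no declared charset: try strict ASCII first, then the host's preferred locale encoding as suggested above, and only then ISO 8859-1. The catch is that ISO 8859-1 maps all 256 byte values to code points, so that last step never raises; it just silently returns the wrong characters for text that was really KOI8-R, Shift_JIS, and so on.

```python
import locale


def decode_unlabelled(data: bytes) -> str:
    """Decode bytes that arrived with no declared charset.

    Hypothetical helper; the guessing order is an assumption for illustration:
      1. strict ASCII (Python's conservative, refuse-to-guess behaviour),
      2. the host's preferred locale encoding (the "host OS default" idea),
      3. ISO 8859-1, which maps every byte 0x00-0xFF to a code point.
    """
    candidates = ("ascii", locale.getpreferredencoding(False), "iso-8859-1")
    for encoding in candidates:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or the locale reports a codec Python doesn't know.
            continue
    # Not reached: ISO 8859-1 accepts any byte sequence.
    raise AssertionError("ISO 8859-1 decoding cannot fail")


if __name__ == "__main__":
    print(decode_unlabelled(b"plain ASCII text"))

    # Russian "hello" encoded in KOI8-R, a Cyrillic legacy charset.
    cyrillic = "Привет".encode("koi8-r")
    # Typically prints False (e.g. on a UTF-8 or ASCII host): the
    # ISO 8859-1 fallback does not raise, it returns the wrong text.
    print(decode_unlabelled(cyrillic) == "Привет")
```

On a UTF-8 host, for example, the KOI8-R bytes fail both the ASCII and UTF-8 attempts and come back as ISO 8859-1 mojibake rather than as an error, which is the failure mode a blanket ISO 8859-1 default risks outside Western Europe.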
>
> As a point of reference, here's what's in the man-pages that I have
> installed (note the /complete/ and conspicuous lack of references to
> even some notable eastern languages or character-sets, such as Chinese
> and Japanese, in the /entire/ ISO-8859 spectrum):
1. As a point of reference for what?

2. The ISO 8859 character sets were deliberately restricted to scripts
that would fit in 8 bits, so Chinese, Japanese, Korean and Vietnamese
aren't included. Note that Chinese and Japanese already each had
*multiple* legacy (i.e. non-Unicode) character sets ... they (and the
rest of the world) don't want/need yet another character set for each
language and never did want/need one.

-- 
http://mail.python.org/mailman/listinfo/python-list
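
Point 2 can be checked directly in Python 3. The sketch below is illustrative only: the helper name `encodable` and the sample strings are choices made for this example, not from the thread. It tests a few ISO 8859 parts against Japanese and Chinese text, then encodes the same text with several pre-Unicode CJK codecs that ship with Python (`shift_jis`, `euc_jp`, `iso2022_jp`, `gb2312`, `big5`).

```python
# ISO 8859 parts are single-byte charsets (at most 256 code points each),
# so CJK text cannot be represented in any of them.
ISO_8859_PARTS = ["iso-8859-%d" % n for n in (1, 2, 5, 7, 15)]


def encodable(text: str, codec: str) -> bool:
    """Return True if `text` can be encoded in `codec` without loss."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


japanese = "日本語"  # "Japanese (language)"
chinese = "中文"     # "Chinese (language)"

# Several legacy charsets already existed per language; the same
# characters get a different byte sequence in each one.
japanese_legacy = {c: japanese.encode(c) for c in ("shift_jis", "euc_jp", "iso2022_jp")}
chinese_legacy = {c: chinese.encode(c) for c in ("gb2312", "big5")}

if __name__ == "__main__":
    for part in ISO_8859_PARTS:
        print(part, encodable(japanese, part), encodable(chinese, part))
    for name, data in {**japanese_legacy, **chinese_legacy}.items():
        print(name, data.hex())
```

Every `encodable` check against an ISO 8859 part comes back False, while the three Japanese codecs produce three distinct byte strings for the same three characters. That is why guessing the charset of unlabelled CJK text is a much harder problem than choosing between ASCII and ISO 8859-1.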