Martin v. Löwis wrote:
> John Salerno wrote:
>> Robert Kern wrote:
>>
>>> http://www.joelonsoftware.com/articles/Unicode.html
>>
>> That was fascinating. Thank you. So as it turns out, Unicode and UTF-8
>> are not the same thing? Am I right to say that UTF-8 stores the first
>> 128 Unicode code points in a single byte, and then stores higher code
>> points in however many bytes they may need? If so, I guess I had been
>> misled by the '8' in the name, thinking that UTF-8 was another way of
>> storing characters in one byte (which would make it no different than
>> Latin-1, I suppose).
>
> That's all correct, except for the last parenthetical remark: using
> a single-byte character set isn't the same as using Latin-1. There
> are various single-byte character sets; they have names like Latin-2,
> Latin-5, Latin-15, KOI8-R, CP437, windows-1252, and so on.
>
> Regards,
> Martin

Oh, I just meant that Latin-1 was an example of a one-byte character set,
right? So UTF-8 would be no different from it if it worked the way I used
to think it did.
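
For anyone who wants to check this at the interpreter, here is a rough sketch (Python 3 assumed) that compares how many bytes a character takes in UTF-8 and in Latin-1, and shows that one byte value can decode to different characters under different single-byte character sets. The helper names byte_lengths and decode_single_byte are made up for this illustration; only str.encode, bytes.decode and the standard codec names come from Python itself.

```python
# Sample characters: ASCII "A", e-acute, the euro sign, and an emoji.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def byte_lengths(ch):
    """Return (utf8_length, latin1_length) for one character.

    latin1_length is None when Latin-1 cannot represent the character.
    """
    utf8_len = len(ch.encode("utf-8"))
    try:
        latin1_len = len(ch.encode("latin-1"))
    except UnicodeEncodeError:
        # Latin-1 only covers code points U+0000 through U+00FF.
        latin1_len = None
    return utf8_len, latin1_len


def decode_single_byte(value, codec):
    """Decode a single byte value (0-255) with the named codec."""
    return bytes([value]).decode(codec)


if __name__ == "__main__":
    for ch in SAMPLES:
        print("U+%04X" % ord(ch), byte_lengths(ch))

    # The same byte, 0xE9, read through three single-byte character sets.
    for codec in ("latin-1", "koi8-r", "cp437"):
        print(codec, repr(decode_single_byte(0xE9, codec)))
```

The first loop shows UTF-8 using one byte for the ASCII character and more bytes for the others, while Latin-1 either uses exactly one byte or fails. The second loop is Martin's point in practice: "one byte per character" does not pin down which character a given byte stands for.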