Dennis Lee Bieber wrote:
> On 12 Dec 2006 23:40:41 -0800, "kernel1983" <[EMAIL PROTECTED]>
> declaimed the following in gmane.comp.python.general:
>
> > and I tried unicode and utf-8
> > I tried to both use unicode&utf-8 head just like "\xEF\xBB\xBF" and not
> > to use
> >
> "unicode" is a term covering many sins. "utf-8" is a specification
> for encoding elements of specific unicode characters using 8-bit
> elements (I believe by using certain codes x00 to x7F alone as "normal",
> and then x80 to xFF to represent an "escape" to higher [16-bit] element
> sets).
>
> "\xEF\xBB\xBF" is just a byte string with no identifier of what
> encoding is in use (unless the first one or two are supposed to be
> BOM)... In the "Windows: Western" character set, it is equivalent to
> small-i-diaeresis/right-guillemot/upside-down? (ï»¿) In MS-DOS: Western
> Europe, those same bytes represent an
> acute-accent/double-down&left-box-drawing/solid-down&left
>
> I've not done any unicode work (iso-latin-1, or subset thereof, has
> done for me). I also don't know Mac's, so I don't know if the windowing
> API has specific calls for Unicode data... But you probably have to
> encode or decode that bytestring into some compatible unicode
> representation.
>

When you save a text file as UTF-8 in Notepad.exe (Windows), it puts the
bytestring "\xEF\xBB\xBF" at the start to indicate that the file is UTF-8
and not ANSI (i.e. an 8-bit code page). Those three bytes are the byte
order mark (BOM) character U+FEFF encoded in UTF-8; "\xFE\xFF" is the same
character as it appears in big-endian UTF-16.
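
For what it's worth, here is a minimal sketch of how those three bytes relate to the BOM and how you might strip them when reading such a file. It assumes a Python version that has the "utf-8-sig" codec (2.5 or later; the code is written in Python 3 syntax), and the sample text "café" is just made-up data standing in for your file contents.

```python
import codecs

# U+FEFF is the byte order mark; encoding it as UTF-8 yields the
# three bytes that Notepad writes at the start of the file.
bom_utf8 = "\ufeff".encode("utf-8")
assert bom_utf8 == b"\xef\xbb\xbf" == codecs.BOM_UTF8

# Made-up file contents: the BOM followed by some UTF-8 text.
raw = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
with_bom = raw.decode("utf-8")

# The "utf-8-sig" codec drops the BOM if it is present.
without_bom = raw.decode("utf-8-sig")

# The same three bytes misread as "Windows: Western" (cp1252).
as_cp1252 = codecs.BOM_UTF8.decode("cp1252")

# And misread as "MS-DOS: Western Europe" (cp850).
as_cp850 = codecs.BOM_UTF8.decode("cp850")
```

Decoding with "utf-8-sig" is harmless for files that have no BOM, so it may be the simpler choice when you cannot tell in advance whether the file came from Notepad.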