>> >>> u'\xb5'
>> u'\xb5'
>> >>> print u'\xb5'
>> µ
>
> Unicode literals are *in the source file*, which can only have one
> encoding (for a given source file).
>
>> (That last character shows up as a micron sign despite the fact that
>> my default encoding is ascii, so it seems to me that that unicode
>> string must somehow have picked up a latin-1 encoding.)
>
> I think latin-1 was the default without a coding cookie line. (May be
> utf-8 in 3.0).

It is, but that's irrelevant for the example. In the source u'\xb5',
all characters are ASCII (i.e. "letter u", "single quote",
"backslash", "letter x", "letter b", "digit 5"). As a consequence,
this source text has the same meaning in all supported source
encodings (as source encodings must be ASCII supersets).

The Unicode literal shown here does not get its interpretation from
Latin-1. Instead, it gets its interpretation directly from the Unicode
coded character set. The string is shorthand for u'\u00b5', which
denotes character U+00B5 (just as u'\u20ac' denotes U+20AC; the same
holds for any other u'\uXXXX').

HTH,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
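
For anyone who wants to check this at the interpreter, here is a minimal sketch. It assumes Python 3.3 or later (where the u prefix is accepted but optional), not the Python 2 session quoted above, and the variable names are only illustrative.

```python
import unicodedata

# The source text of the literal is pure ASCII, so every ASCII-superset
# source encoding reads these characters the same way.
source_text = r"u'\xb5'"
assert all(ord(ch) < 128 for ch in source_text)

# \xb5 and \u00b5 are two spellings of the same code point, U+00B5.
micro = u'\xb5'
assert micro == u'\u00b5'
assert ord(micro) == 0xB5
print(unicodedata.name(micro))   # MICRO SIGN

# The same rule applies to any \uXXXX escape, e.g. U+20AC.
euro = u'\u20ac'
assert ord(euro) == 0x20AC
print(unicodedata.name(euro))    # EURO SIGN

# An encoding only enters the picture when the string is turned into bytes.
print(micro.encode('latin-1'))   # b'\xb5'
print(micro.encode('utf-8'))     # b'\xc2\xb5'
```

The last two lines show that Latin-1 is just one possible byte encoding of U+00B5, chosen at output time; it plays no part in how the escape in the literal is interpreted.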