Re: What encoding does u'...' syntax use?

Terry Reedy Fri, 20 Feb 2009 13:36:41 -0800

Ron Garret wrote:

I would have thought that the answer would be: the default encoding (duh!) But empirically this appears not to be the case:
unicode('\xb5')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb5 in position 0: ordinal not in range(128)

The unicode function is usually used to decode bytes read from *external sources*, each of which can have its own encoding. So the function (actually, developer crew) refuses to guess and uses the ascii common subset.
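A minimal sketch of that refusal to guess, written for Python 3 (an assumption on my part; the session in this thread is Python 2, where unicode() plays the role that bytes.decode() plays here). The names raw, ascii_ok and text are only illustrative:

```python
# Python 3 sketch; the thread itself uses Python 2's unicode().
raw = b'\xb5'  # a single byte, as it might arrive from an external source

# Decoding as ascii fails, because 0xb5 is outside range(128).
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the encoding explicitly removes the guesswork.
text = raw.decode('latin-1')  # U+00B5 MICRO SIGN
```

The point is only that the caller, not the function, has to say which encoding the bytes are in.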

u'\xb5'

u'\xb5'

print u'\xb5'

µ

Unicode literals are *in the source file*, which can only have one encoding (for a given source file).

(That last character shows up as a micron sign despite the fact that my default encoding is ascii, so it seems to me that that unicode string must somehow have picked up a latin-1 encoding.)

I think latin-1 was the default without a coding cookie line. (May be utf-8 in 3.0).
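One way to poke at the latin-1 observation, again as a Python 3 sketch (an assumption; there every str literal is Unicode, much like u'...' in Python 2). It checks only what the \xb5 escape produces and does not settle what the default source encoding was. The variable names are illustrative:

```python
# Python 3 sketch: a str literal here behaves like u'...' in Python 2.
s = '\xb5'  # the escape names a code point, not a raw byte in the file

code_point = ord(s)  # 0xb5, i.e. U+00B5 MICRO SIGN

# The result matches what decoding the byte 0xb5 as latin-1 gives,
# since latin-1 maps bytes 0-255 onto the first 256 code points.
same_as_latin1 = (s == b'\xb5'.decode('latin-1'))

# The same character takes two bytes when encoded as UTF-8.
utf8_bytes = s.encode('utf-8')
```

So the string can look latin-1 encoded simply because the escape value and the latin-1 byte value coincide for this character.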


