"Jerry Hill" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <[EMAIL PROTECTED]> wrote:
If I say units=unicode("°"), I get
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
ordinal not in range(128)

If I try x=unicode.decode(x,'utf-8'). I get
TypeError: descriptor 'decode' requires a 'unicode' object but received
a 'str'

What is the correct way to interpret these symbols that come to me as a
string?

Part of it depends on where you're getting them from.  If they are in
your source code, just define them like this:

>>> units = u"°"
>>> print units
°
>>> print repr(units)
u'\xb0'

If they're coming from an external source, you have to know the
encoding they're being sent in.  Then you can decode them into
unicode, like this:

>>> units = "°"
>>> unicode_units = units.decode('Latin-1')
>>> print repr(unicode_units)
u'\xb0'
>>> print unicode_units
°
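
To make the dependence on the encoding concrete, here is a minimal sketch that works on explicit byte values rather than on whatever the terminal happens to send. The variable names are only illustrative, and the syntax assumes Python 2.6 or later (it also runs on 3.x). The 0xc2 in the original error message suggests the bytes in question were UTF-8, where the degree sign is the two bytes 0xc2 0xb0.

```python
# Hypothetical raw bytes for the degree sign, as two different sources might send it.
utf8_bytes = b"\xc2\xb0"    # degree sign encoded as UTF-8 (two bytes)
latin1_bytes = b"\xb0"      # degree sign encoded as Latin-1 (one byte)

# Decode each with the encoding it was actually written in.
from_utf8 = utf8_bytes.decode("utf-8")
from_latin1 = latin1_bytes.decode("latin-1")

# Both results are the same single code point, U+00B0.
same = (from_utf8 == from_latin1 == u"\xb0")

# The ASCII codec cannot handle any byte above 0x7f, which is the
# failure reported in the original question.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

The point of the sketch is that the bytes alone do not say which codec to use; that information has to come from the source of the data.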

--
Jerry


Even with source code you have to know the encoding. Python versions before 3.x assume source files are ASCII unless an encoding is declared:

test.py contains:
units = u"°"

>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xb0' in file test.py on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

The encoding of the source file can be declared:

# coding: latin-1
units = u"°"

>>> import test
>>> test.units
u'\xb0'
>>> print test.units
°
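
For the curious, the declaration is just a specially formatted comment near the top of the file. The following is a rough sketch of how such a comment could be recognised, using a pattern along the lines of the one PEP 263 describes. The function name find_declared_encoding is made up for this example, and the check is simplified to "look at the first two lines"; the interpreter's own logic has more rules than this.

```python
import re

# Pattern modelled on the one given in PEP 263 for an encoding declaration.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_declared_encoding(source_text):
    """Return the declared encoding name, or None if none is found.

    Hypothetical helper: only the first two lines are examined, which is
    a simplification of what the interpreter does.
    """
    for line in source_text.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
    return None


declared = find_declared_encoding("# coding: latin-1\nunits = 1\n")
emacs_style = find_declared_encoding("#!/usr/bin/python\n# -*- coding: utf-8 -*-\n")
missing = find_declared_encoding("units = 1\n")
```

Here declared is "latin-1", emacs_style is "utf-8", and missing is None. In the None case, pre-3.x Python falls back to ASCII as described above.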

Make sure to use the correct encoding! Here the file was saved in latin-1, but declared utf8:

# coding: utf8
units = u"°"

>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0: unexpected code byte
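
The same mismatch can be reproduced without any source file. This is a small sketch under the assumption that the file's bytes are the Latin-1 encoding of the degree sign; try_decode is a helper invented for the example, and the code assumes Python 2.6 or later (it also runs on 3.x).

```python
degree = u"\xb0"

# What ends up on disk when the file is saved as Latin-1: the single byte 0xb0.
saved_as_latin1 = degree.encode("latin-1")


def try_decode(data, encoding):
    """Hypothetical helper: return the decoded text, or None on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Decoding with the encoding the bytes were written in round-trips cleanly.
ok = try_decode(saved_as_latin1, "latin-1")

# 0xb0 on its own is a continuation byte in UTF-8, so it cannot start a
# character and the UTF-8 codec rejects it.
bad = try_decode(saved_as_latin1, "utf-8")
```

Here ok is u'\xb0' and bad is None, which mirrors the UnicodeDecodeError in the traceback above.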


--
Mark
