"Jerry Hill" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
On Mon, Jul 14, 2008 at 12:40 PM, Tim Cook <[EMAIL PROTECTED]>
wrote:
If I say units=unicode("°"), I get
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0:
ordinal not in range(128)
If I try x=unicode.decode(x,'utf-8'), I get
TypeError: descriptor 'decode' requires a 'unicode' object but received
a 'str'
What is the correct way to interpret these symbols that come to me as a
string?
Part of it depends on where you're getting them from. If they are in
your source code, just define them like this:
>>> units = u"°"
>>> print units
°
>>> print repr(units)
u'\xb0'
If they're coming from an external source, you have to know the
encoding they're being sent in. Then you can decode them into
unicode, like this:
>>> units = "°"
>>> unicode_units = units.decode('Latin-1')
>>> print repr(unicode_units)
u'\xb0'
>>> print unicode_units
°
--
Jerry
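For the byte 0xc2 in the original error message, here is a minimal sketch of
the decode step with the bytes written out explicitly. It assumes the incoming
data is UTF-8 (the 0xc2 byte suggests that, but it is an assumption) and
Python 2.6 or later for the b"" literal. The variable names are made up for
illustration.

```python
# Assumption: the external source sends the degree sign as UTF-8,
# i.e. the two bytes 0xc2 0xb0.
raw_units = b"\xc2\xb0"

# Call decode() on the byte string itself, naming the encoding it is in.
unicode_units = raw_units.decode("utf-8")
print(repr(unicode_units))

# Decoding the same bytes with the wrong encoding does not fail here:
# Latin-1 maps every byte to a character, so the result is silently wrong.
wrong_units = raw_units.decode("latin-1")
print(repr(wrong_units))
```

A wrong guess does not always raise an error, which is why the encoding has to
be known rather than tried.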
Even with source code you have to know the encoding. Before 3.x, Python
defaults to ASCII encoding for source files:
test.py contains:
units = u"°"
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xb0' in file test.py on line 1, but no
encoding declared; see http://www.python.org/peps/pep-0263.html for details
The encoding of the source file can be declared:
# coding: latin-1
units = u"°"
>>> import test
>>> test.units
u'\xb0'
>>> print test.units
°
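The declaration has to match what the editor actually wrote to disk, because
the same character is stored as different bytes in each encoding. A quick
check, assuming Python 2.6 or later (the names are only illustrative):

```python
degree = u"\xb0"  # the degree sign as a unicode object

latin1_bytes = degree.encode("latin-1")  # one byte: 0xb0
utf8_bytes = degree.encode("utf-8")      # two bytes: 0xc2 0xb0

print(repr(latin1_bytes))
print(repr(utf8_bytes))
```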
Make sure to use the correct encoding! Here the file was saved as latin-1
but declared as utf8:
# coding: utf8
units = u"°"
>>> import test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 0:
unexpected code byte
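The same failure can be reproduced without a source file by decoding the
Latin-1 byte by hand. This is only a sketch, assuming Python 2.6 or later.
The exact wording of the error message differs between Python versions, and
the names are hypothetical.

```python
latin1_bytes = b"\xb0"  # what a Latin-1 editor saves for the degree sign

try:
    # 0xb0 on its own is not a valid UTF-8 sequence, so this raises.
    latin1_bytes.decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError as exc:
    mismatch_detected = True
    print(exc)

# Decoding with the encoding the bytes were really written in succeeds.
units = latin1_bytes.decode("latin-1")
print(repr(units))
```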
--
Mark