Re: error when printing a UTF-8 string (python 2.6.2)

Peter Otten Wed, 21 Apr 2010 01:47:57 -0700

[email protected] wrote:

> Hello.
> 
> I read a string from an utf-8 file:
> 
> fichierLaTeX = codecs.open(sys.argv[1], "r", "utf-8")
> s = fichierLaTeX.read()
> fichierLaTeX.close()
> 
> I can then print the string without error with 'print s'.
> 
> Next I parse this string:
> 
> def parser(s):
>     i = 0
>     while i < len(s):
>         if s[i:i+1] == '\\':
>             i += 1
>             if s[i:i+1] == '\\':
>                 print "backslash"
>             elif s[i:i+1] == '%':
>                 print "pourcentage"
>             else:
>                 if estUnCaractere(s[i:i+1]):
>                     motcle = ""
>                     while estUnCaractere(s[i:i+1]):
>                         motcle += s[i:i+1]
>                         i += 1
>                     print "mot-clé '"+motcle+"'"
> 
> but when I run this code, I get this error:
> 
> Traceback (most recent call last):
>   File "./versOO.py", line 115, in <module>
>     parser(s)
>   File "./versOO.py", line 105, in parser
>     print "mot-clé '"+motcle+"'"
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
> position 6: ordinal not in range(128)
> 
> What must I do to solve this?


>>> "mot-clé" + "mot-clé"
'mot-cl\xc3\xa9mot-cl\xc3\xa9'

>>> u"mot-clé" + u"mot-clé"
u'mot-cl\xe9mot-cl\xe9'

>>> "mot-clé" + u"mot-clé"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

codecs.open().read() returns unicode, but your literals are all bytestrings.
When you mix unicode and str, Python tries to convert the bytestring to
unicode using the ascii codec, and that fails as soon as the bytestring
contains a non-ASCII character such as the "é" in "mot-clé".
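
To make that hidden step visible, here is a small sketch that does by hand
what Python 2 does implicitly when a str meets a unicode object: it decodes
the bytes with the ascii codec. The names raw, text, failed_at and combined
are made up for illustration and are not from your script, and the \u00e9
escapes stand in for the "é" so that the snippet does not depend on the
encoding of the source file.

```python
# -*- coding: utf-8 -*-
# Illustration only: raw, text, failed_at and combined are hypothetical names.

# The bytes that a Python 2 str literal "mot-clé" holds in a UTF-8 source file.
raw = u"mot-cl\u00e9".encode("utf-8")
# The same word as a unicode object, like the data returned by codecs.open().read().
text = u"mot-cl\u00e9"

# Python 2 evaluates raw + text by first decoding raw with the ascii codec.
# Doing that decoding explicitly reproduces the failure.
failed_at = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first byte that is not ASCII

# Decoding with the codec that actually matches the bytes succeeds,
# and the two unicode objects can then be concatenated.
combined = raw.decode("utf-8") + text
```

The failing index is the position of the first byte of the encoded "é",
which is the position reported in your traceback.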

Change your string literals to unicode by adding the u-prefix and you should 
be OK.
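
As a rough sketch of what that change looks like, assuming the data comes
from codecs.open() as in your script: format_keyword is a hypothetical
helper standing in for your print statement, and the temporary file and its
contents are invented just so the example can run on its own.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile


def format_keyword(motcle):
    # Hypothetical helper: both literals carry the u prefix, so the
    # concatenation involves only unicode objects and no implicit decoding.
    return u"mot-cl\u00e9 '" + motcle + u"'"


# Write a tiny UTF-8 sample file (invented content) and read it back
# the same way the original script reads its input.
path = os.path.join(tempfile.mkdtemp(), "sample.tex")
with codecs.open(path, "w", "utf-8") as f:
    f.write(u"\\d\u00e9but")
with codecs.open(path, "r", "utf-8") as f:
    s = f.read()

# Skip the leading backslash and format the remaining keyword.
message = format_keyword(s[1:])
```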

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
