f...@slick.airforce-one.org wrote:
> Hello.
>
> I read a string from an utf-8 file:
>
> fichierLaTeX = codecs.open(sys.argv[1], "r", "utf-8")
> s = fichierLaTeX.read()
> fichierLaTeX.close()
>
> I can then print the string without error with 'print s'.
>
> Next I parse this string:
>
> def parser(s):
>   i = 0
>   while i < len(s):
>     if s[i:i+1] == '\\':
>       i += 1
>       if s[i:i+1] == '\\':
>         print "backslash"
>       elif s[i:i+1] == '%':
>         print "pourcentage"
>       else:
>         if estUnCaractere(s[i:i+1]):
>           motcle = ""
>           while estUnCaractere(s[i:i+1]):
>             motcle += s[i:i+1]
>             i += 1
>           print "mot-clé '"+motcle+"'"
>
> but when I run this code, I get this error:
>
> Traceback (most recent call last):
>   File "./versOO.py", line 115, in <module>
>     parser(s)
>   File "./versOO.py", line 105, in parser
>     print "mot-clé '"+motcle+"'"
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
> position 6: ordinal not in range(128)
>
> What must I do to solve this?

>>> "mot-clé" + "mot-clé"
'mot-cl\xc3\xa9mot-cl\xc3\xa9'
>>> u"mot-clé" + u"mot-clé"
u'mot-cl\xe9mot-cl\xe9'
>>> "mot-clé" + u"mot-clé"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)

codecs.open().read() returns unicode, but your literals are all
bytestrings. When you mix unicode and str, Python tries to convert the
bytestring to unicode using the ascii codec, and that fails for any
non-ASCII character. Here the "é" in "mot-clé" is stored as the two
UTF-8 bytes 0xc3 0xa9, and 0xc3 at position 6 is the first byte the
ascii codec rejects.

Change your string literals to unicode by adding the u-prefix and you
should be OK.

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
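
For illustration, here is a minimal sketch of the failure and of the u-prefix fix described above. It is written so that it should behave the same on Python 2 and Python 3, which is why it does the ascii decode explicitly instead of relying on Python 2's implicit conversion. The value "section" is a hypothetical keyword standing in for whatever the parser collects in motcle; it is not taken from the original script.

```python
# -*- coding: utf-8 -*-
# Sketch only: meant to run unchanged on Python 2 and Python 3.

# The bytes a UTF-8 source file stores for the bytestring literal "mot-clé".
raw = b"mot-cl\xc3\xa9"

# When str and unicode are mixed, Python 2 decodes the bytestring with the
# ascii codec. Doing that decode explicitly reproduces the reported error.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the codec the bytes were actually written in succeeds.
decoded = raw.decode("utf-8")

# The fix: unicode literals on both sides, so no implicit decode happens.
motcle = u"section"  # hypothetical keyword, an assumption for this example
message = u"mot-clé '" + motcle + u"'"
```

The exception caught in ascii_error carries the same codec name and byte position as the traceback in the question, while message is built without any conversion because every operand is already unicode.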