Serge Orlov wrote:
> On 6/27/06, Dennis Benzinger <[EMAIL PROTECTED]> wrote:
>> Hi!
>>
>> The following program in an UTF-8 encoded file:
>>
>>
>> # -*- coding: UTF-8 -*-
>>
>> FIELDS = ("Fächer", )
>> FROZEN_FIELDS = frozenset(FIELDS)
>> FIELDS_SET = set(FIELDS)
>>
>> print u"Fächer" in FROZEN_FIELDS
>> print u"Fächer" in FIELDS_SET
>> print u"Fächer" in FIELDS
>>
>>
>> gives this output
>>
>>
>> False
>> False
>> Traceback (most recent call last):
>>   File "test.py", line 9, in ?
>>     print u"Fächer" in FIELDS
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
>> ordinal not in range(128)
>>
>>
>> Why do the first two print statements succeed and the third one fails
>> with an exception?
>
> Actually all three statements fail to produce correct result.

So this is a bug in Python?

> frozenset remove the exception?
>
> Because sets use hash algorithm to find matches, whereas the last
> statement directly compares a unicode string with a byte string. Byte
> strings can only contain ascii characters, that's why python raises an
> exception. The problem is very easy to fix: use unicode strings for
> all non-ascii strings.

No, byte strings contain characters which are at least 8-bit wide
<http://docs.python.org/ref/types.html>. But I don't understand what
Python is trying to decode and why the exception says something about the
ASCII codec, because my file is encoded with UTF-8.

Dennis
-- 
http://mail.python.org/mailman/listinfo/python-list
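
For reference, the message in the traceback can be reproduced in isolation. The following is only a minimal sketch, assuming that Python 2 implicitly decodes the byte string with its default codec (ASCII) when it has to compare it with a unicode string, and that the coding line only affects how unicode literals are read. It is written so that it also runs on Python 3; the names raw, ascii_error and decoded are illustrative and not part of the original program.

```python
# -*- coding: utf-8 -*-
# Bytes that the plain literal "Fächer" occupies in a UTF-8 encoded file.
# The \u00e4 escape stands for the a-umlaut.
raw = u"F\u00e4cher".encode("utf-8")

# Decoding those bytes with the ASCII codec fails on the first byte >= 0x80.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the codec the file was actually written in succeeds.
decoded = raw.decode("utf-8")

print(repr(raw))
print(ascii_error)
print(decoded == u"F\u00e4cher")
```

The second print shows the same "'ascii' codec can't decode byte 0xc3 in position 1" message as the traceback above. Under the stated assumption, this suggests that the decode step concerns the byte string in FIELDS rather than the encoding of the source file.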