On 6/27/06, Dennis Benzinger <[EMAIL PROTECTED]> wrote:
> Hi!
>
> The following program in an UTF-8 encoded file:
>
>
> # -*- coding: UTF-8 -*-
>
> FIELDS = ("Fächer", )
> FROZEN_FIELDS = frozenset(FIELDS)
> FIELDS_SET = set(FIELDS)
>
> print u"Fächer" in FROZEN_FIELDS
> print u"Fächer" in FIELDS_SET
> print u"Fächer" in FIELDS
>
>
> gives this output
>
>
> False
> False
> Traceback (most recent call last):
>   File "test.py", line 9, in ?
>     print u"Fächer" in FIELDS
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1:
> ordinal not in range(128)
>
>
> Why do the first two print statements succeed and the third one fails
> with an exception?
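Actually, all three statements fail to produce the correct result (the first two should print True).

> Why does the use of set/frozenset remove the exception?

Because sets use a hash lookup to find matches. The unicode string u"Fächer" and the UTF-8 byte string "Fächer" stored in the set have different hash values, so the lookup never finds a candidate to compare against and simply returns False. The last statement, by contrast, compares the unicode string directly with each byte string in the tuple. To make that comparison, Python implicitly decodes the byte string with the default 'ascii' codec, and the byte 0xc3 (the first byte of the UTF-8 encoding of "ä") is not valid ASCII; that's why Python raises an exception.

The problem is very easy to fix: use unicode strings for all non-ASCII strings, e.g. FIELDS = (u"Fächer", ).

-- 
http://mail.python.org/mailman/listinfo/python-list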
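A minimal sketch of that fix applied to the program from the question, assuming Python 2.6 or later (where `print(...)` with a single argument and the `b""` prefix are both accepted); the same code also runs on Python 3.3+. The `raw` variable is a hypothetical stand-in for byte data read from a file or socket, which has to be decoded with its actual encoding before it is compared with unicode strings.

```python
# -*- coding: UTF-8 -*-

# Every non-ASCII literal is a unicode string (u"..."), so all
# membership tests below compare unicode with unicode.
FIELDS = (u"Fächer", )
FROZEN_FIELDS = frozenset(FIELDS)
FIELDS_SET = set(FIELDS)

print(u"Fächer" in FROZEN_FIELDS)  # True
print(u"Fächer" in FIELDS_SET)     # True
print(u"Fächer" in FIELDS)         # True

# Hypothetical byte data, e.g. read from a UTF-8 file: these are the
# UTF-8 bytes of "Fächer". Decode explicitly instead of relying on the
# implicit 'ascii' conversion that raised the exception.
raw = b"F\xc3\xa4cher"
print(raw.decode("utf-8") in FIELDS_SET)  # True
```

Decoding at the boundary where bytes enter the program keeps everything inside it as unicode, so no comparison ever depends on the default codec.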