Bugs item #1772788, was opened at 2007-08-13 01:54
Message generated for change (Comment added) made by laukpe
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1772788&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Pekka Laukkanen (laukpe)
Assigned to: Nobody/Anonymous (nobody)
Summary: chr(128) in u'only ascii' -> TypeError with misleading msg

Initial Comment:
A test of the form "chr(x) in <string>" raises a TypeError if "x" is in the range 128-255 (i.e. non-ascii) and the string is unicode. This happens even if the unicode string contains only ascii data, as the example below demonstrates.

Python 2.5.1 (r251:54863, May 2 2007, 16:56:35)
[GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> chr(127) in 'hello'
False
>>> chr(128) in 'hello'
False
>>> chr(127) in u'hi'
False
>>> chr(128) in u'hi'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand

This can cause pretty nasty and hard-to-debug bugs in code using the "in <string>" form if e.g. user-provided data is converted to unicode internally. Most other string operations work nicely between normal and unicode strings, and I'd say simply returning False in this situation would be ok too. Issuing a warning like the one below might also be a good idea.

>>> chr(128) == u''
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

Finally, the error message is somewhat misleading since the left operand is definitely a string.

>>> type(chr(128))
<type 'str'>

A real-life example of code where this problem exists is telnetlib. I'll submit a separate bug about it, as that problem can obviously be fixed in the library itself.
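
The behaviour proposed in the initial comment (return False and issue a warning instead of raising) can be sketched as a small wrapper. This is only an illustration written for Python 3, where bytes plays the role that the byte string str played in Python 2.5; the name safe_contains is hypothetical and is not part of Python or telnetlib.

```python
import warnings


def safe_contains(needle: bytes, haystack: str) -> bool:
    """Return True if needle, decoded as ASCII, occurs in haystack.

    Hypothetical helper: sketches the "warn and return False" behaviour
    suggested in the report; it is not an existing API.
    """
    try:
        decoded = needle.decode("ascii")
    except UnicodeDecodeError:
        # A non-ASCII byte cannot match anything in the text, so warn
        # and report "not contained" instead of raising.
        warnings.warn(
            "failed to decode left operand as ASCII; "
            "interpreting it as not contained",
            UnicodeWarning,
            stacklevel=2,
        )
        return False
    return decoded in haystack


print(safe_contains(bytes([127]), "hi"))  # ASCII byte, not present
print(safe_contains(bytes([128]), "hi"))  # non-ASCII byte: warns
print(safe_contains(b"h", "hi"))          # ASCII byte, present
```

With such a wrapper the caller sees a UnicodeWarning rather than a TypeError whose message says nothing about encodings.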

----------------------------------------------------------------------

>Comment By: Pekka Laukkanen (laukpe)
Date: 2007-08-21 17:03

Message:
Logged In: YES
user_id=1379331
Originator: YES

Fredrik, you are obviously correct that most operations between normal and unicode strings don't work if the normal string contains non-ascii data. I still do think that a UnicodeWarning like you get from "chr(128) == u'foo'" would be nicer than an exception and prevent problems like the one in telnetlib [1]. If an exception is raised I don't care too much about its type but a better message would make debugging possible problems easier.

[1] https://sourceforge.net/tracker/index.php?func=detail&aid=1772794&group_id=5470&atid=105470

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2007-08-21 11:48

Message:
Logged In: YES
user_id=38376
Originator: NO

"Most other string operations work nicely between normal and unicode strings"

Nope. You *always* get errors if you mix Unicode with NON-ASCII data (unless you've messed up the system's default encoding, which is a bad thing to do if you care about portability). Some examples:

>>> chr(128) + u"foo"
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)
>>> u"foo".find(chr(128))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 0: ordinal not in range(128)

etc. If there's a bug here, it's that you get a TypeError instead of a ValueError subclass.

----------------------------------------------------------------------
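
The "ValueError subclass" in the last comment refers to UnicodeDecodeError, the error raised when a byte such as 0x80 is decoded as ASCII. A minimal sketch in Python 3, where the decoding has to be requested explicitly rather than happening implicitly as in Python 2.5 (decode_ascii_error is a hypothetical helper name used only for illustration):

```python
def decode_ascii_error(raw: bytes):
    """Return the exception raised by decoding raw as ASCII, or None.

    Hypothetical helper used only to inspect the exception type.
    """
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = decode_ascii_error(bytes([128]))
print(type(err).__name__)          # UnicodeDecodeError
print(isinstance(err, ValueError)) # True
print(isinstance(err, TypeError))  # False
```

Code that catches ValueError therefore handles the concatenation and find() failures quoted above, but not the TypeError raised by the "in" test.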