Bugs item #1331062, was opened at 2005-10-19 08:23
Message generated for change (Comment added) made by titty
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1331062&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Ralf Schmitt (titty)
Assigned to: M.-A. Lemburg (lemburg)
Summary: utf 7 codec broken

Initial Comment:
The following code doesn't work as expected:

$ cat t.py
#! /usr/bin/env python

s = 'Auguste and Louis Lumi\xe8re'
print repr(s)
u1 = s.decode('utf7')
print 'from utf7: %d %r' % (len(u1), u1)

u2 = u'Auguste and Louis Lumi\xe8re'
print ' u2: %d %r' % (len(u2), u2)

print 'u1==u2', u1==u2

e1 = u1.encode('utf8')
e2 = u2.encode('utf8')
print 'e1=%r' % e1
print 'e2=%r' % e2

unicode(e2, 'utf8')
unicode(e1, 'utf8')

$ python t.py
'Auguste and Louis Lumi\xe8re'
from utf7: 25 u'Auguste and Louis Lumi\xe8re'
 u2: 25 u'Auguste and Louis Lumi\xe8re'
u1==u2 False
e1='Auguste and Louis Lumi\xff\xbf\xbf\xa8re'
e2='Auguste and Louis Lumi\xc3\xa8re'
Traceback (most recent call last):
  File "t.py", line 19, in ?
    unicode(e1, 'utf8')
  File "/usr/local/lib/python2.4/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 22: unexpected code byte

----------------------------------------------------------------------

>Comment By: Ralf Schmitt (titty)
Date: 2005-10-19 11:29

Message:
Logged In: YES
user_id=17929

The problem *disappears* on FreeBSD if I configure *without*
--enable-unicode=ucs4. I guess this is also what the Debian
people are using, and it is not a compiler bug, since FreeBSD
uses gcc 2.95 and Debian gcc 4.0.x.

----------------------------------------------------------------------

Comment By: Sjoerd Mullender (sjoerd)
Date: 2005-10-19 11:17

Message:
Logged In: YES
user_id=43607

The definition of SPECIAL in unicodeobject.c is wrong.
It tests a character for > 127, but when characters are signed
and Py_UNICODE expands to a signed type, this doesn't do what was
intended.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-10-19 11:07

Message:
Logged In: YES
user_id=38388

I was testing on SuSE Linux 9.2. Sounds like a compiler bug.

Could you try compiling with optimization switched off on FreeBSD?

Thanks.

----------------------------------------------------------------------

Comment By: Ralf Schmitt (titty)
Date: 2005-10-19 10:58

Message:
Logged In: YES
user_id=17929

On Debian testing and FreeBSD 4.11 using Python 2.4.2,
'\xe8'.decode('utf7') succeeds...

Using the Windows version, I also get that error.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-10-19 10:30

Message:
Logged In: YES
user_id=38388

Hmm, running Python 2.4.2 I get:

>>> s = 'Auguste and Louis Lumi\xe8re'
>>> print repr(s)
'Auguste and Louis Lumi\xe8re'
>>> u1 = s.decode('utf7')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-22: unexpected special character

Which looks correct, as UTF-7 may not contain characters having the high bit set.
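The byte values in this report can be reproduced with plain arithmetic. The sketch below is written in Python 3 rather than the thread's 2.4 and models Sjoerd's diagnosis under two assumptions: that the decoder held each input byte in a signed C `char`, and that on a UCS-4 build the sign-extended value reached a UTF-8 encoder whose 4-byte branch truncates each output byte to 8 bits. All function names are hypothetical, and nothing here is CPython's actual source.

```python
# Python 3 model of the failure mode discussed in this thread.
# Function names are hypothetical; C semantics are emulated with masking.


def as_signed_char(byte):
    """Interpret a byte value 0..255 as a C char on a platform where char is signed."""
    return byte - 256 if byte >= 0x80 else byte


def high_bit_test_signed(byte):
    """The `c > 127` part of SPECIAL applied to a signed char: never true."""
    return as_signed_char(byte) > 127


def high_bit_test_unsigned(byte):
    """The same test on an unsigned value, which is what was intended."""
    return (byte & 0xFF) > 127


def widen_to_ucs4(byte):
    """Store the signed char in a 32-bit code unit: -24 becomes 0xFFFFFFE8."""
    return as_signed_char(byte) & 0xFFFFFFFF


def legacy_utf8_4byte(ch):
    """Assumed 4-byte branch of an old UTF-8 encoder, with each result
    truncated to 8 bits as storing into a C char would do."""
    return bytes([
        (0xF0 | (ch >> 18)) & 0xFF,
        0x80 | ((ch >> 12) & 0x3F),
        0x80 | ((ch >> 6) & 0x3F),
        0x80 | (ch & 0x3F),
    ])


if __name__ == "__main__":
    b = 0xE8  # Latin-1 byte for 'è'
    print(high_bit_test_signed(b), high_bit_test_unsigned(b))  # False True
    ch = widen_to_ucs4(b)
    print(hex(ch))                 # 0xffffffe8
    print(legacy_utf8_4byte(ch))   # b'\xff\xbf\xbf\xa8'  (the e1 bytes)
    print("\xe8".encode("utf-8"))  # b'\xc3\xa8'          (the e2 bytes)

    # Python 3's UTF-7 codec rejects the raw high-bit byte outright.
    try:
        b"Auguste and Louis Lumi\xe8re".decode("utf-7")
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)

    # A properly UTF-7-encoded string is pure ASCII and round-trips.
    good = "Auguste and Louis Lumi\xe8re".encode("utf-7")
    print(good, good.decode("utf-7") == "Auguste and Louis Lumi\xe8re")
```

Under those assumptions, 0xE8 slips past the `> 127` check as -24, is stored as 0xFFFFFFE8, and encodes to exactly the `e1` bytes from the initial report, while the correct U+00E8 yields the `e2` bytes.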