[issue5828] Invalid behavior of unicode.lower

2009-04-25, Martin v. Löwis
Martin v. Löwis added the comment:
> BTW, are the steps to regenerate the Unicode database documented
> somewhere?
I don't think so; your procedure looks right, though. Regenerating the database is often more difficult, in particular when we upgrade to a new version. Often, the new v

2009-04-25, Walter Dörwald
Changes by Walter Dörwald:
assignee: doerwalter -> loewis

2009-04-25, Walter Dörwald
Walter Dörwald added the comment:
BTW, are the steps to regenerate the Unicode database documented somewhere? What I did was:
cp /Volumes/ftp.unicode.org/Public/5.1.0/ucd/UnicodeData.txt .
cp /Volumes/ftp.unicode.org/Public/5.1.0/ucd/CompositionExclusions.txt .
cp /Volumes/ftp.unicode.org/Publi

2009-04-25, Walter Dörwald
Walter Dörwald added the comment:
Checked in: r71896 (py3k), r71897 (release30-maint)
resolution: -> fixed
status: open -> closed

2009-04-25, Walter Dörwald
Walter Dörwald added the comment:
Checked in: r71894 (trunk), r71895 (release26-maint)

2009-04-25, Martin v. Löwis
Martin v. Löwis added the comment:
Feel free to check it into trunk, and merge into the other three branches from there. If you don't want to do that, assign it back to me.
assignee: loewis -> doerwalter

2009-04-25, Walter Dörwald
Walter Dörwald added the comment:
I've merged your version of the patch with my changes to the test suite and regenerated the Unicode database. Attached is the resulting patch (diff4.txt).
Added file: http://bugs.python.org/file13768/diff4.txt

2009-04-25, Martin v. Löwis
Martin v. Löwis added the comment: I think the patch is incorrect; the bug is already in makeunicodedata.py. For U+1d79, it should set the lowercase letter to U+1d79. If you look at makeunicodedata.py, you see that the entire logic is bogus: when the column is absent, it should default it to th
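Martin's rule, that an absent case-mapping column should default to the character itself rather than end up as 0, can be modeled in a few lines of Python. This is a simplified, hypothetical sketch, not the actual makeunicodedata.py code: the function name `case_mappings`, the signed 16-bit offset range, and the tuple layout are assumptions for illustration. The title-case fallback to the uppercase mapping follows the UCD documentation (UAX #44) rather than anything stated in this thread.

```python
def case_mappings(char, upper_field, lower_field, title_field):
    """Hypothetical model of filling one character's simple case mappings.

    Empty upper/lower fields default to the character itself; an empty
    title field defaults to the uppercase mapping (per UAX #44).
    Returns (nodelta, upper, lower, title).
    """
    upper = int(upper_field, 16) if upper_field else char
    lower = int(lower_field, 16) if lower_field else char
    title = int(title_field, 16) if title_field else upper

    deltas = (upper - char, lower - char, title - char)
    # Assumption: offsets are stored in signed 16-bit slots.
    if all(-32768 <= d <= 32767 for d in deltas):
        return (False,) + deltas
    # Some offset is too large: store absolute code points instead.
    return (True, upper, lower, title)


# U+1D79 as an example: uppercase field A77D, lowercase field empty.
nodelta, upper, lower, title = case_mappings(0x1D79, "A77D", "", "A77D")
print(nodelta, hex(lower))  # True 0x1d79
```

The upper offset for this record (0xA77D - 0x1D79 = 35332) does not fit the assumed 16-bit range, so the absolute-value path is taken; with the defaulting rule, the stored lowercase value is U+1D79 itself instead of 0.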

2009-04-25, Walter Dörwald
Walter Dörwald added the comment:
Here is a third version of the patch. AFAICT the logic of the Unicode database is as follows:
* If NODELTA_MASK is not set, delta is an offset.
* If NODELTA_MASK is set and delta is != 0, delta is the upper/lower/title case character.
* If NODELTA_MASK is s
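The first two rules in Walter's list can be modeled as a small lookup function. This is a hypothetical Python sketch of the C-level logic, not the actual `_PyUnicode_ToLowercase` code; the flag value and the name `apply_case_value` are assumptions. Because the third rule is cut off in the archived message, the sketch does not model the "flag set, value 0" case at all and simply returns the stored value, which makes it easy to see how a 0 stored for an empty column surfaces as the reported '\x00'.

```python
# Hypothetical flag value; the real constant lives in CPython's
# unicodectype.c / makeunicodedata.py and may differ.
NODELTA_MASK = 0x100


def apply_case_value(ch, flags, value):
    """Map code point ch using a stored case value (first two rules only).

    The third rule is cut off in the archived message, so this sketch does
    not special-case value == 0 when NODELTA_MASK is set.
    """
    if not flags & NODELTA_MASK:
        # Rule 1: value is a signed offset from ch.
        return ch + value
    # Rule 2: value is the target character itself.
    return value


# Ordinary case: 'A' (U+0041) lowercases via an offset of +32.
print(hex(apply_case_value(0x41, 0, 32)))                     # 0x61
# NODELTA record whose empty lowercase column was stored as 0:
print(repr(chr(apply_case_value(0x1D79, NODELTA_MASK, 0))))   # '\x00'
# Same record after defaulting the column to the character itself:
print(hex(apply_case_value(0x1D79, NODELTA_MASK, 0x1D79)))    # 0x1d79
```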

2009-04-24, Terry J. Reedy
Terry J. Reedy added the comment:
Py3.0.1:
>>> '\u1d79'.lower()
'\x00'
I am guessing that this bug is in 2.7 and 3.1 as well.
nosy: +tjreedy
versions: +Python 2.7, Python 3.0, Python 3.1

2009-04-24, Walter Dörwald
Walter Dörwald added the comment:
Updated the patch (diff2.txt) as requested by Amaury.
Added file: http://bugs.python.org/file13759/diff2.txt

2009-04-24, Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment:
The same change should be applied to _PyUnicode_ToTitlecase as well.
nosy: +amaury.forgeotdarc

2009-04-24, Walter Dörwald
Walter Dörwald added the comment:
The following patch fixes the problem for me; however, it breaks the test suite. The change seems to have been introduced in r66362. Assigning to Martin.
assignee: -> loewis
nosy: +loewis
stage: -> patch review
Added file: http://bugs.python.org/fi

2009-04-24, Walter Dörwald
Walter Dörwald added the comment:
It *does* return u'\u1d79' for me on Python 2.5.2:
>>> u'\u1d79'.lower()
u'\u1d79'
>>> import sys
>>> sys.version
'2.5.2 (r252:60911, Apr 8 2008, 18:54:00) \n[GCC 3.3.5 (Debian 1:3.3.5-13)]'
However, on 2.6.2 it's broken:
>>> u'\u1d79'.lower()
u'\x00'
>>> imp
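To compare versions the way Walter does, a brute-force scan can flag any non-NUL character whose lowercase, uppercase or titlecase form contains U+0000 on the interpreter at hand. This is a hypothetical check written in Python 3 syntax (the thread's sessions use Python 2 `u''` literals); it is not the test that was added with the checked-in revisions, and the name `nul_case_mappings` is made up for illustration.

```python
import sys


def nul_case_mappings():
    """Return (code point, method name) pairs whose case mapping contains NUL."""
    bad = []
    for cp in range(1, sys.maxunicode + 1):
        ch = chr(cp)
        for name, mapped in (("lower", ch.lower()),
                             ("upper", ch.upper()),
                             ("title", ch.title())):
            if "\x00" in mapped:
                bad.append((cp, name))
    return bad


if __name__ == "__main__":
    problems = nul_case_mappings()
    for cp, name in problems:
        print("U+%04X.%s() contains U+0000" % (cp, name))
    print(len(problems), "problem(s) found")
```

A full scan of all code points takes a few seconds at most; an affected build would list U+1D79 under `lower`.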

2009-04-24, Jarek Sobieszek
New submission from Jarek Sobieszek:
u'\u1d79'.lower() returns u'\x00'. I think it should return u'\u1d79', at least according to my understanding of UnicodeData.txt (the lowercase field is empty).
components: Unicode
messages: 86400
nosy: jarek
severity: normal
status: open
title: I
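The "lowercase field" the reporter refers to is one of the semicolon-separated columns of UnicodeData.txt: zero-based fields 12, 13 and 14 hold the simple uppercase, lowercase and titlecase mappings, and an empty field means no mapping is given. The minimal sketch below splits one record and reads those three fields. The record string is modeled on the U+1D79 entry; treat its exact contents as an illustrative assumption rather than a verbatim quote from the Unicode 5.1 file. The helper name `case_fields` is hypothetical.

```python
# Illustrative record modeled on the UnicodeData.txt entry for U+1D79.
# Fields 12, 13 and 14 are the simple upper/lower/title case mappings.
RECORD = "1D79;LATIN SMALL LETTER INSULAR G;Ll;0;L;;;;;N;;;A77D;;A77D"


def case_fields(line):
    """Return (code point, upper, lower, title); None marks an empty field."""
    fields = line.split(";")
    char = int(fields[0], 16)

    def opt(i):
        # An empty column carries no mapping of its own.
        return int(fields[i], 16) if fields[i] else None

    return char, opt(12), opt(13), opt(14)


char, upper, lower, title = case_fields(RECORD)
print(hex(char), hex(upper), lower, hex(title))  # 0x1d79 0xa77d None 0xa77d
```

An empty lowercase field, as here, means the character has no separate lowercase form, so lowercasing should leave it unchanged.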