John Machin <sjmac...@users.sourceforge.net> added the comment:

Martin: """Considering this note, the simple titlecase of U+01C5 *is* U+01C4: the titlecase value is omitted, hence it is the same as uppercase, hence it is U+01C4."""
Perhaps we are looking at different files; in the Unicode 5.1 UnicodeData.txt that I downloaded (http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), the title field for U+01C5 is *NOT* omitted, it is set to 01C5. AFAICT the intention is that the four characters in question are their own titlecase, which is not altogether unexpected given their visual representation. Here's the record for U+01C5:

01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5

The note (which I hadn't noticed, and which explains the mention of ctype->upper in the _PyUnicode_ToTitlecase function) says that the titlecase value may be omitted if it is the same as the uppercase. FWIW there are *no* examples in the current (5.1) file where the title field is empty and the upper field is not empty.

ISTM the problem is that the default-to-uppercase rule was not implemented in Tools/unicode/makeunicodedata.py, where full information is available. This left no way in _PyUnicode_ToTitlecase of resolving the ambiguity of a zero value for ctype->title -- is it "no titlecase supplied so use uppercase" or is it "titlecase supplied, delta == 0, means ch.title() -> ch"?

----------
nosy: +sjmachin

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4971>
_______________________________________
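The ambiguity described in the comment above can be illustrated with a small sketch. It is a simplified model, not the actual code in Tools/unicode/makeunicodedata.py or _PyUnicode_ToTitlecase: the function names and the RECORD_NO_TITLE variant are hypothetical, and the only real data is the U+01C5 record quoted in the comment. It shows that storing a plain delta of 0 for an empty title field makes the two cases indistinguishable, whereas applying the default-to-uppercase rule while the raw fields are still available keeps them apart.

```python
# Field positions in a UnicodeData.txt record (0-based, ';'-separated).
UPPER, LOWER, TITLE = 12, 13, 14

# The real record for U+01C5, as quoted in the comment.
RECORD_01C5 = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;"
    "01C4;01C6;01C5"
)

# Hypothetical variant (not in the 5.1 file): same record with the
# title field left empty, to model "titlecase omitted".
RECORD_NO_TITLE = RECORD_01C5.rsplit(";", 1)[0] + ";"


def parse(record):
    """Return (code, upper, title); an empty field becomes None."""
    fields = record.split(";")
    code = int(fields[0], 16)
    upper = int(fields[UPPER], 16) if fields[UPPER] else None
    title = int(fields[TITLE], 16) if fields[TITLE] else None
    return code, upper, title


def naive_title_delta(record):
    """Assumed lossy encoding: a missing title field is stored as delta 0."""
    code, _upper, title = parse(record)
    return 0 if title is None else title - code


def resolved_title_delta(record):
    """Apply the 'omitted title means uppercase' note before encoding."""
    code, upper, title = parse(record)
    if title is None:
        title = upper if upper is not None else code
    return title - code


# Both records collapse to the same value under the naive encoding...
print(naive_title_delta(RECORD_01C5), naive_title_delta(RECORD_NO_TITLE))
# ...but stay distinct once the default is resolved up front.
print(resolved_title_delta(RECORD_01C5), resolved_title_delta(RECORD_NO_TITLE))
```

With the default resolved at generation time, a consumer can treat a stored delta of 0 as "the character is its own titlecase" without having to guess.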