John Machin <sjmac...@users.sourceforge.net> added the comment:

Martin:"""Considering this note, the simple titlecase of U+01C5 *is*
U+01C4: the titlecase value is omitted, hence it is the same as
uppercase, hence it is U+01C4."""

Perhaps we are looking at different files: in the Unicode 5.1
UnicodeData.txt that I downloaded
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), the title field
for U+01C5 is *NOT* omitted; it is set to 01C5. AFAICT the intention is
that the four characters in question are their own titlecase, which is
not altogether unexpected given their visual representation.
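A quick way to see what the data implies. I am assuming here that "the
four characters in question" are the Latin titlecase digraphs U+01C5,
U+01C8, U+01CB and U+01F2. On a Python whose tables respect the explicit
title field, each of them should be its own titlecase, while its
uppercase is the preceding code point.

```python
import unicodedata

# Assumed set: the four Latin titlecase digraphs (general category Lt).
DIGRAPHS = "\u01c5\u01c8\u01cb\u01f2"

for ch in DIGRAPHS:
    # str.title() on a single cased character applies its titlecase mapping.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)} "
          f"upper=U+{ord(ch.upper()):04X} "
          f"title=U+{ord(ch.title()):04X}")
```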

Here's the record for U+01C5:

    01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5
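For anyone wanting to check the fields mechanically, here is a minimal
sketch that splits the record into its 15 semicolon-separated fields and
picks out the simple case mappings (fields 12, 13 and 14, counting from
0). The helper name case_fields is hypothetical.

```python
# Column indices (0-based) of the simple case-mapping fields in UnicodeData.txt.
UPPER, LOWER, TITLE = 12, 13, 14

RECORD = ("01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
          "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5")


def case_fields(record):
    """Return the code point and simple case-mapping fields of one record."""
    fields = record.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return {"code": fields[0], "upper": fields[UPPER],
            "lower": fields[LOWER], "title": fields[TITLE]}


print(case_fields(RECORD))
# {'code': '01C5', 'upper': '01C4', 'lower': '01C6', 'title': '01C5'}
```

The title field equals the code point itself, i.e. U+01C5 is explicitly
mapped to itself.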

The note (which I hadn't noticed, and which explains the mention of
ctype->upper in the _PyUnicode_ToTitlecase function) says that the
titlecase value may be omitted if it is the same as the uppercase value.
FWIW, there are *no* records in the current (5.1) file where the title
field is empty and the upper field is not.
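That observation can be re-checked with a short scan. This is a sketch:
empty_title_with_upper is a hypothetical helper, and the three-record
excerpt below merely stands in for the full file.

```python
def empty_title_with_upper(lines):
    """Yield code points whose title field is empty but whose upper field is not."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        upper, title = fields[12], fields[14]
        if upper and not title:
            yield fields[0]


# Excerpt for illustration; for the real check, pass an open file instead:
#   with open("UnicodeData.txt", encoding="utf-8") as f:
#       print(list(empty_title_with_upper(f)))
EXCERPT = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5",
]
print(list(empty_title_with_upper(EXCERPT)))  # -> []
```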

ISTM the problem is that the default-to-uppercase rule was not
implemented in Tools/unicode/makeunicodedata.py, where the full
information is available. That left _PyUnicode_ToTitlecase with no way
to resolve the ambiguity of a zero value in ctype->title: does it mean
"no titlecase supplied, so use uppercase", or "titlecase supplied,
delta == 0, so ch.title() -> ch"?
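To make the ambiguity concrete, here is a simplified Python model of
storing titlecase as a delta from the code point, with and without
resolving the default when the tables are built. It is not CPython's C
code, and all function names are hypothetical.

```python
def resolve_title(code, upper_field, title_field):
    """Build-time resolution (assumed approach): an empty title field means
    'same as uppercase'; an empty uppercase field means the char itself."""
    if title_field:
        return int(title_field, 16)
    if upper_field:
        return int(upper_field, 16)
    return code


def title_with_fallback(code, upper_delta, title_delta):
    """Model of the ambiguous runtime rule: title delta 0 means 'use upper'."""
    if title_delta:
        return code + title_delta
    return code + upper_delta


def title_resolved(code, title_delta):
    """Runtime rule once the table already holds the resolved titlecase."""
    return code + title_delta


# U+01C5: upper field 01C4, title field 01C5 (its own titlecase).
code = 0x01C5
upper_delta = 0x01C4 - code   # -1
title_delta = 0x01C5 - code   #  0, an explicit self-mapping
print(hex(title_with_fallback(code, upper_delta, title_delta)))  # 0x1c4 (wrong)
print(hex(title_resolved(code, resolve_title(code, "01C4", "01C5") - code)))  # 0x1c5
```

With the fallback rule, U+01C5's explicit self-mapping (delta 0) cannot
be told apart from "no titlecase supplied", so it comes out as U+01C4.
Resolving the default in the generator removes the need for any runtime
fallback.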

----------
nosy: +sjmachin

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4971>
_______________________________________
_______________________________________________
Python-bugs-list mailing list