Martin v. Löwis <mar...@v.loewis.de> added the comment:

I do think this is a bug in the Unicode database. The current approach (falling back to the uppercase mapping if there is no titlecase in the Unicode database) goes back to r17708. However, even the prior version explicitly contained only the cases where a titlecase was specified and differed from the uppercase.
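To make the fallback concrete, here is a minimal sketch of how such a rule could be applied to one semicolon-separated UnicodeData.txt-style record. It is not the code from r17708; the function name simple_case_mappings and the sample record are hypothetical, and the record only mirrors the situation discussed here (titlecase field empty) rather than being copied from the actual data file. The field positions (12 = simple uppercase, 13 = simple lowercase, 14 = simple titlecase) follow the UnicodeData.txt layout.

```python
def simple_case_mappings(record):
    """Return (upper, lower, title) code points for one data record.

    Empty uppercase/lowercase fields default to the code point itself;
    an empty titlecase field falls back to the uppercase mapping.
    """
    fields = record.split(";")
    code = int(fields[0], 16)
    upper = int(fields[12], 16) if fields[12] else code
    lower = int(fields[13], 16) if fields[13] else code
    # The fallback under discussion: no titlecase given -> use uppercase.
    title = int(fields[14], 16) if fields[14] else upper
    return upper, lower, title


# Hypothetical record (assumption for illustration only): uppercase and
# lowercase are given, the titlecase field is left empty.
record = ";".join([
    "01C5", "HYPOTHETICAL NAME", "Lt", "0", "L",
    "", "", "", "", "N", "", "",
    "01C4", "01C6", "",
])

print([f"U+{cp:04X}" for cp in simple_case_mappings(record)])
# ['U+01C4', 'U+01C6', 'U+01C4']
```

With an empty titlecase field, the sketch reports U+01C4 as the titlecase of U+01C5, which is the behaviour this rule produces.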
I think part of the motivation is this note from http://www.unicode.org/Public/UNIDATA/UCD.html:

  Note: The simple titlecase may be omitted in the data file if the titlecase is the same as the uppercase.

(Notice that for uppercase, it says instead "The simple uppercase is omitted in the data file if the uppercase is the same as the code point itself"; likewise for lowercase.)

Considering this note, the simple titlecase of U+01C5 *is* U+01C4: the titlecase value is omitted, hence it is the same as the uppercase, hence it is U+01C4.

Most likely, the algorithm to produce the database was different from the documented algorithm, and it is a bug in UCD.html. However, if UCD.html is correct, it is likely a bug in UnicodeData.txt.

----------
nosy: +loewis

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue4971>
_______________________________________