[issue12204] str.upper converts to title

Ezio Melotti Sun, 29 May 2011 01:05:23 -0700

Ezio Melotti <ezio.melo...@gmail.com> added the comment:

'\u1ff3'.upper() returns '\u1ffc', so we have:
  U+1FF3 (ῳ - GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI)
  U+1FFC (ῼ - GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI)
The first belongs to the Ll (Letter, lowercase) category, whereas the second 
belongs to the Lt (Letter, titlecase) category.
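The categories can be checked from Python with the unicodedata module. This is a minimal sketch; the loop and the output format are only for illustration:

```python
import unicodedata

# Print code point, General_Category and character name for both letters.
for ch in ('\u1ff3', '\u1ffc'):
    print(f'U+{ord(ch):04X}', unicodedata.category(ch), unicodedata.name(ch))

# U+1FF3 Ll GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI
# U+1FFC Lt GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI
```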


The entries for these two chars in the UnicodeData.txt[0] file are:
  1FF3;GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI;Ll;0;L;03C9 0345;;;;N;;;1FFC;;1FFC
  1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;
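To see which field is which, the two entries can be split on ';' and indexed. This is a sketch: case_fields is a hypothetical helper, and the positions it uses (2 for General_Category, 12 to 14 for the simple case mappings) follow the UnicodeData.txt layout in UAX #44 [1]:

```python
# Hypothetical helper: extract the general category and the three simple
# case mappings from one UnicodeData.txt line (15 fields, per UAX #44).
def case_fields(line):
    fields = line.split(';')
    if len(fields) != 15:
        raise ValueError('expected 15 semicolon-separated fields')
    return {
        'category': fields[2],
        'upper': fields[12],  # Simple_Uppercase_Mapping (third-to-last field)
        'lower': fields[13],  # Simple_Lowercase_Mapping (second-to-last field)
        'title': fields[14],  # Simple_Titlecase_Mapping (last field)
    }

omega_small = '1FF3;GREEK SMALL LETTER OMEGA WITH YPOGEGRAMMENI;Ll;0;L;03C9 0345;;;;N;;;1FFC;;1FFC'
omega_title = '1FFC;GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI;Lt;0;L;03A9 0345;;;;N;;;;1FF3;'

print(case_fields(omega_small))
# {'category': 'Ll', 'upper': '1FFC', 'lower': '', 'title': '1FFC'}
print(case_fields(omega_title))
# {'category': 'Lt', 'upper': '', 'lower': '1FF3', 'title': ''}
```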

U+1FF3 has U+1FFC in both the third-to-last and the last field
(Simple_Uppercase_Mapping and Simple_Titlecase_Mapping respectively; see
[1]), so .upper() is doing the right thing here.
U+1FFC has U+1FF3 in the second-to-last field (Simple_Lowercase_Mapping), but
since its category is Lt rather than Lu, .isupper() returns False.
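Checking U+1FFC directly shows the behavior described above. It lowercases back to U+1FF3 but is not reported as uppercase, while .istitle() accepts it. This is a sketch, and the results come from the running interpreter's own Unicode database:

```python
import unicodedata

ch = '\u1ffc'  # GREEK CAPITAL LETTER OMEGA WITH PROSGEGRAMMENI

print(unicodedata.category(ch))  # Lt
print(ch.isupper())              # False: Lt is not an uppercase category
print(ch.istitle())              # True: Lt counts as titlecase
print(ch.lower() == '\u1ff3')    # True: lowercases back to U+1FF3
```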

The Unicode Standard Annex #44[2] defines the Lt category as:
  Lt  Titlecase_Letter  a digraphic character, with first part uppercase
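The Lt set is small. Besides digraphs such as U+01C5 (LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON), it contains the Greek capitals with prosgegrammeni, including U+1FFC. The following sketch lists every Lt character known to the interpreter's unicodedata module, so the exact list depends on the Unicode version that Python ships with:

```python
import sys
import unicodedata

# Collect every code point whose General_Category is Lt.
titlecase = [chr(cp) for cp in range(sys.maxunicode + 1)
             if unicodedata.category(chr(cp)) == 'Lt']

for ch in titlecase:
    print(f'U+{ord(ch):04X} {unicodedata.name(ch)}')
```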

I'm not sure there's anything to fix here: both functions behave as documented,
and it may indeed be the case that .upper() returns chars with category Lt,
for which .isupper() then returns False.
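To reproduce the report on a given interpreter, you can uppercase U+1FF3 and inspect what comes back. On Python 3.2, as described above, this should give the single Lt character U+1FFC, with .isupper() returning False. Later releases switched str.upper() to full case mappings (SpecialCasing.txt), so the output there may differ. This is a sketch, and the printed values depend on the version:

```python
import sys
import unicodedata

# Uppercase U+1FF3 and show code points, categories and .isupper()
# for whatever this interpreter returns.
result = '\u1ff3'.upper()
print(sys.version_info[:2])
print([f'U+{ord(c):04X}' for c in result])
print([unicodedata.category(c) for c in result])
print(result.isupper())
```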

[0]: http://unicode.org/Public/UNIDATA/UnicodeData.txt
[1]: http://www.unicode.org/reports/tr44/#UnicodeData.txt
[2]: http://www.unicode.org/reports/tr44/#GC_Values_Table

----------
components: +Interpreter Core, Unicode -None
nosy: +belopolsky, ezio.melotti
versions: +Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12204>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
