Martin v. Löwis <mar...@v.loewis.de> added the comment:

> Should not the Danish letter "Ø" be normalized as "O"? I get "Ø" for all
> NFC/NFD/NFKC/NFKD normalizations?

I think you have a fundamental misunderstanding of what a "decomposition" is.
"Ø" should *not* be decomposed as "O", because clearly, "Ø" and "O" are
different letters. If anything, it would be decomposed as
"O" + SOME COMBINING MARK.

Now, in the specific case of

00D8;LATIN CAPITAL LETTER O WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER O SLASH;;;00F8;

no canonical decomposition is specified. Compare this to

00D5;LATIN CAPITAL LETTER O WITH TILDE;Lu;0;L;004F 0303;;;;N;LATIN CAPITAL LETTER O TILDE;;;00F5;

which decomposes to U+004F followed by U+0303, i.e. LATIN CAPITAL LETTER O
followed by COMBINING TILDE.

If "Ø" were to be decomposed, it should use a mark COMBINING STROKE, but no
such combining mark exists in Unicode. I don't know why that is; you would
have to ask the Unicode Consortium.

In any case, Unicode guarantees stability with respect to decompositions, so
even if some combining mark gets added later on, the existing decompositions
remain stable.

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5200>
_______________________________________
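
For anyone who wants to check the two UnicodeData.txt entries quoted above from Python itself, here is a minimal sketch using the standard `unicodedata` module. It assumes Python 3 string literals; the variable names are made up for illustration and are not part of the original report.

```python
import unicodedata

# The two characters compared in the comment above.
o_stroke = "\u00d8"  # LATIN CAPITAL LETTER O WITH STROKE
o_tilde = "\u00d5"   # LATIN CAPITAL LETTER O WITH TILDE

# decomposition() returns the decomposition mapping field from the
# Unicode character database as a string, or "" if none is defined.
stroke_decomp = unicodedata.decomposition(o_stroke)
tilde_decomp = unicodedata.decomposition(o_tilde)
print(repr(stroke_decomp))
print(repr(tilde_decomp))

# With no decomposition defined, every normalization form should
# return the O WITH STROKE character unchanged.
forms = ("NFC", "NFD", "NFKC", "NFKD")
stroke_forms = {form: unicodedata.normalize(form, o_stroke) for form in forms}

# O WITH TILDE, by contrast, splits into a base letter plus a combining
# mark under NFD, and NFC composes the pair back into one code point.
tilde_nfd = unicodedata.normalize("NFD", o_tilde)
tilde_roundtrip = unicodedata.normalize("NFC", tilde_nfd)
print([unicodedata.name(ch) for ch in tilde_nfd])
```

If the goal is really to fold "Ø" to "O" for searching or sorting, that is a separate, application-specific mapping (for example a hand-written table passed to `str.translate`), not something normalization is meant to provide.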