Martin v. Löwis <mar...@v.loewis.de> added the comment:

> Should not the Danish letter "Ø" be normalized as "O"? I get "Ø" for all
> NFC/NFD/NFKC/NFKD normalizations?

I think you have a fundamental misunderstanding of what a "decomposition"
is. "Ø" should *not* be decomposed as "O", because clearly, "Ø" and "O"
are different letters. If anything, it would be decomposed as
"O" plus some combining mark.

Now, in the specific case of

00D8;LATIN CAPITAL LETTER O WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER O SLASH;;;00F8;

no canonical decomposition is specified. Compare this to

00D5;LATIN CAPITAL LETTER O WITH TILDE;Lu;0;L;004F 0303;;;;N;LATIN CAPITAL LETTER O TILDE;;;00F5;

which decomposes to U+004F followed by U+0303, i.e.
LATIN CAPITAL LETTER O followed by COMBINING TILDE.
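
The difference between the two records can be checked from Python with
the standard unicodedata module. This is only a small sketch; the
variable names are mine, and the results depend on the Unicode database
shipped with your Python version.

```python
import unicodedata

O_STROKE = "\u00d8"  # LATIN CAPITAL LETTER O WITH STROKE
O_TILDE = "\u00d5"   # LATIN CAPITAL LETTER O WITH TILDE

# decomposition() returns the decomposition field of the database record,
# or an empty string when no decomposition is defined.
stroke_decomp = unicodedata.decomposition(O_STROKE)
tilde_decomp = unicodedata.decomposition(O_TILDE)

# NFD applies canonical decompositions, so only the tilde letter changes.
tilde_nfd = unicodedata.normalize("NFD", O_TILDE)
tilde_nfd_names = [unicodedata.name(c) for c in tilde_nfd]

print(repr(stroke_decomp))  # empty decomposition field
print(repr(tilde_decomp))   # code points of the canonical decomposition
print(tilde_nfd_names)      # base letter followed by the combining mark
```

The empty string for U+00D8 corresponds to the empty sixth field in the
first record above.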

If "Ø" were to be decomposed, it would have to use a mark COMBINING STROKE,
but no such combining mark exists in Unicode. I don't know why that
is; you would have to ask the Unicode consortium. In any case, Unicode
guarantees stability with respect to decompositions, so even if some
combining mark gets added later on, the existing decompositions remain
stable.
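
As a consequence, all four normalization forms leave "Ø" untouched,
which is the behaviour reported above. The sketch below also includes
strip_marks, a hypothetical helper of my own (it is not part of
unicodedata) that drops combining marks after NFD; it is only meant to
show that such an approach maps "Õ" to "O" but cannot affect "Ø".

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")
O_STROKE = "\u00d8"  # LATIN CAPITAL LETTER O WITH STROKE
O_TILDE = "\u00d5"   # LATIN CAPITAL LETTER O WITH TILDE


def strip_marks(text):
    """Hypothetical helper: decompose, then drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for characters that are not combining marks.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# True when every normalization form returns the letter unchanged.
unchanged = all(unicodedata.normalize(f, O_STROKE) == O_STROKE for f in FORMS)

print(unchanged)
print(strip_marks(O_TILDE))   # the tilde is a separate mark and is removed
print(strip_marks(O_STROKE))  # the stroke is part of the letter and stays
```

If an application really wants "Ø" to become "O", it needs an explicit
mapping of its own; normalization will not provide one.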

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5200>
_______________________________________