Re: u32_normalize UNINORM_NFKC on 0xD800

2011-05-27 Thread Simon Josefsson
Bruno Haible writes:
> Simon Josefsson wrote:
>> I'm calculating this IDNA2008 property
>>
>>   toNFKC(toCaseFold(toNFKC(cp))) != cp
>>
>> for all code points.
>
> It makes no sense to consider non-character code points here. Citing again
> the Unicode standard, chapter 3 [1], section 3.8:
>
>

Re: u32_normalize UNINORM_NFKC on 0xD800

2011-05-27 Thread Bruno Haible
Simon Josefsson wrote:
> I'm calculating this IDNA2008 property
>
>   toNFKC(toCaseFold(toNFKC(cp))) != cp
>
> for all code points.

It makes no sense to consider non-character code points here. Citing again the Unicode standard, chapter 3 [1], section 3.8: "High-surrogate and low-surrogate c

Re: u32_normalize UNINORM_NFKC on 0xD800

2011-05-27 Thread Simon Josefsson
FWIW, I came up with a better approach to handle this, and have asked for confirmation of the interpretation on the IDNABIS list. So I think u32_normalize is fine, as you explained.

http://www.alvestrand.no/pipermail/idna-update/2011-May/007099.html

/Simon
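For readers who want to experiment with the property discussed in this thread, here is a minimal Python sketch that evaluates toNFKC(toCaseFold(toNFKC(cp))) != cp while refusing surrogate code points outright. It is not the approach referred to in the message above, whose details are not given here, and it does not use libunistring. The names `nfkc`, `is_unstable` and `scalar_values` are hypothetical, and using Python's `str.casefold()` as a stand-in for toCaseFold is an assumption.

```python
import unicodedata

# Surrogate code points U+D800..U+DFFF are not Unicode scalar values.
SURROGATES = range(0xD800, 0xE000)


def nfkc(text: str) -> str:
    """Return the NFKC normalization of text."""
    return unicodedata.normalize("NFKC", text)


def is_unstable(cp: int) -> bool:
    """Evaluate toNFKC(toCaseFold(toNFKC(cp))) != cp for one scalar value.

    Assumption: str.casefold() is used as a stand-in for toCaseFold.
    Surrogates and out-of-range values are rejected instead of normalized.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} is outside the Unicode code space")
    if cp in SURROGATES:
        raise ValueError(f"U+{cp:04X} is a surrogate, not a scalar value")
    s = chr(cp)
    return nfkc(nfkc(s).casefold()) != s


def scalar_values():
    """Yield every Unicode scalar value, skipping the surrogate range."""
    for cp in range(0x110000):
        if cp not in SURROGATES:
            yield cp
```

A caller could then collect the affected code points with `[cp for cp in scalar_values() if is_unstable(cp)]`. The result depends on the Unicode version shipped with the Python interpreter.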

Re: u32_normalize UNINORM_NFKC on 0xD800

2011-05-27 Thread Simon Josefsson
Bruno Haible writes:
> Simon Josefsson wrote:
>> I'm doing some Unicode NFKC operations and noticing that u32_normalize
>> fails for U+D800.
>
> This is a valid behaviour, because U+D800 is a "surrogate" code point
> and therefore not a valid character code point.
>
> See the Unicode standard, ch

Re: u32_normalize UNINORM_NFKC on 0xD800

2011-05-26 Thread Bruno Haible
Simon Josefsson wrote:
> I'm doing some Unicode NFKC operations and noticing that u32_normalize
> fails for U+D800.

This is a valid behaviour, because U+D800 is a "surrogate" code point and therefore not a valid character code point. See the Unicode standard, chapter 2 [1], pages 23..24: Surrogat

u32_normalize UNINORM_NFKC on 0xD800

2011-05-26 Thread Simon Josefsson
I'm doing some Unicode NFKC operations and noticing that u32_normalize fails for U+D800. Is this behaviour permitted by TR15? I thought toNFKC should succeed for all code points. /Simon
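
The replies in this thread point out that U+D800 is a surrogate code point and not a character. As an illustration only, the Python sketch below shows one way a caller might check a sequence of 32-bit values before handing it to a normalizer. The helper names `is_scalar_value` and `first_non_scalar` are hypothetical and are not part of libunistring.

```python
from typing import Iterable, Optional


def is_scalar_value(cp: int) -> bool:
    """True if cp is a Unicode scalar value.

    That is, cp lies in U+0000..U+10FFFF and is not a surrogate
    (U+D800..U+DFFF).
    """
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def first_non_scalar(code_points: Iterable[int]) -> Optional[int]:
    """Return the index of the first value that is not a scalar value.

    Returns None when every element is a scalar value.
    """
    for index, cp in enumerate(code_points):
        if not is_scalar_value(cp):
            return index
    return None
```

With such a check in place, `first_non_scalar([0x41, 0xD800, 0x42])` identifies the position of the offending value, so the caller can decide how to handle it instead of treating a normalization failure as a library bug.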