ael Glavassevich [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 05, 2008 6:00 PM
> To: j-users@xerces.apache.org
> Subject: Re: single non-BMP character counted as two characters
>
>
>
> Hi Taki,
>
> It's a long standing bug/limitation. Xerces uses String.length(
ch is probably partly because the use of length is not
very common in practice.
Thanks!
-taki
From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 05, 2008 6:00 PM
To: j-users@xerces.apache.org
Subject: Re: single non-BMP character counted a
That's essentially what's happening in the Harmony code base. The code
delegates to Character.codePointCount(CharSequence,int,int), which loops
over the chars looking for high surrogates. This could certainly be
optimized, though.
-Nathan
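
For readers following along, the loop described above might look roughly like the sketch below. It is not the Harmony source; the class and method names are made up for illustration, and U+2000B is just a sample non-BMP character written as its surrogate pair.

```java
class CodePointCounter {
    // Hypothetical helper: counts Unicode code points in seq[begin, end).
    // Starts from the char count and subtracts one for every well-formed
    // surrogate pair (high surrogate immediately followed by a low one).
    static int countCodePoints(CharSequence seq, int begin, int end) {
        if (begin < 0 || end > seq.length() || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        int count = end - begin;
        for (int i = begin; i < end - 1; i++) {
            if (Character.isHighSurrogate(seq.charAt(i))
                    && Character.isLowSurrogate(seq.charAt(i + 1))) {
                count--; // the pair encodes a single code point
                i++;     // skip the low surrogate
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD840\uDC0Bb"; // 'a', U+2000B, 'b'
        System.out.println(s.length());                        // 4
        System.out.println(countCodePoints(s, 0, s.length())); // 3
    }
}
```

The cost is one pass over the chars, which is the overhead being discussed for input that contains no surrogates at all.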
On Tue, Aug 5, 2008 at 8:44 PM, Michael Glavass
Hi Nathan,
Is the implementation of that method any better than iterating over the
string and counting the number of code points? I think the last time I
noticed this bug in the code I resisted fixing it because of the negative
performance impact on the majority of input which only contains charac
This might be an additional impetus to move the code base for future
development to the Java 5 libraries, so that methods like
String.codePointCount can be used.
-Nathan
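
As a rough illustration of what that would allow, a length check based on String.codePointCount(int,int) (available since Java 5) could be sketched as follows. The class and method names are hypothetical and are not taken from the Xerces code base.

```java
class LengthFacetCheck {
    // Hypothetical helper: true if value contains exactly `expected`
    // Unicode code points (not UTF-16 chars).
    static boolean hasLength(String value, int expected) {
        return value.codePointCount(0, value.length()) == expected;
    }

    public static void main(String[] args) {
        String nonBmp = "\uD840\uDC0B"; // U+2000B as a surrogate pair
        System.out.println(hasLength(nonBmp, 1)); // true
        System.out.println(hasLength("ab", 1));   // false
    }
}
```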
On Tue, Aug 5, 2008 at 7:59 PM, Michael Glavassevich <[EMAIL PROTECTED]> wrote:
> Hi Taki,
>
> It's a long standing bug/limitation. Xerce
Hi Taki,
It's a long-standing bug/limitation. Xerces uses String.length() (which
returns the length of the string in chars rather than Unicode code points)
for checking the length facet.
Thanks.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
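
The difference between chars and code points described above can be seen with a few lines of plain Java. This is only a demonstration of the JDK behaviour, not Xerces code; the class name is made up, and U+2000B is used as a sample supplementary character.

```java
class CharsVersusCodePoints {
    // Builds a one-code-point string from a supplementary code point.
    static String sample() {
        return new String(Character.toChars(0x2000B));
    }

    public static void main(String[] args) {
        String value = sample();
        // length() counts UTF-16 chars, so the surrogate pair counts twice.
        System.out.println(value.length());                          // 2
        // codePointCount() counts the pair as one code point.
        System.out.println(value.codePointCount(0, value.length())); // 1
        System.out.println(Character.isSupplementaryCodePoint(
                value.codePointAt(0)));                              // true
    }
}
```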
Hi,
The following schema, which is supposedly valid, results in this error:
cvc-length-valid: Value '𠀋' with length = '2' is not facet-valid with respect
to length '1'
for type '#AnonType_act'.
The default value "𠀋" for attribute "a" is a single non-BMP character.
It is as though a surroga