Re: doubt about utf8 and characters method in DefaultHandler (SaxParser)

Michael Glavassevich Thu, 23 Apr 2009 16:23:04 -0700

Hi Raimon,

Raimon Bosch <raimon.bo...@gmail.com> wrote on 04/23/2009 06:59:42 PM:


> I see that the characters method always interprets the characters as
> 16-bit characters, because it receives an array of type char. How does
> Xerces manage non-16-bit characters? For example, in UTF-8 there are a
> lot of characters between 16 and 32 bits.
>
> If I find a char outside the 16-bit UTF-8 range, can I assume that it is
> not a UTF-8 character?

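UTF-8 and UTF-16 are character encodings [1] that represent the characters
defined by Unicode as sequences of bytes. Both encodings have a
representation for every character in Unicode. Like any other encoding,
they're decoded into Java chars on input, so it's all the same to the
consumer of the SAX API regardless of what the document's encoding was.
A character above U+FFFF is delivered to characters() as a surrogate pair
(two consecutive chars in the array), exactly as a Java String holds it, so
a char can never be "outside the 16-bit range".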
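To make this concrete, here is a minimal sketch (not part of Xerces) that parses the same text once encoded as UTF-8 and once as UTF-16, using the JDK's built-in SAX parser obtained through SAXParserFactory. The class and helper names (CodePointDemo, parseText, encodeDocument) are hypothetical, and U+1F600 is just an arbitrary example of a character above U+FFFF. The handler buffers everything before looking at code points, since character data may arrive in several characters() calls.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class CodePointDemo {

    // Hypothetical handler: collects all character data into one buffer.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            // Chunks may be split anywhere (in principle even inside a
            // surrogate pair), so only decode code points after parsing.
            text.append(ch, start, length);
        }
    }

    // Wraps content in a tiny document and encodes it with the given charset.
    static byte[] encodeDocument(String content, Charset charset, String declaredName) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredName + "\"?><r>"
                + content + "</r>";
        return xml.getBytes(charset);
    }

    // Parses the bytes; the parser detects the encoding from the BOM/declaration.
    static String parseText(byte[] xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextCollector handler = new TextCollector();
            parser.parse(new ByteArrayInputStream(xml), handler);
            return handler.text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (held in Java as a surrogate pair), 'b'
        String content = "a\uD83D\uDE00b";
        String fromUtf8 = parseText(encodeDocument(content, StandardCharsets.UTF_8, "UTF-8"));
        // Java's UTF_16 encoder writes a big-endian byte order mark first.
        String fromUtf16 = parseText(encodeDocument(content, StandardCharsets.UTF_16, "UTF-16"));

        System.out.println("same text from both encodings: " + fromUtf8.equals(fromUtf16));
        System.out.println("chars: " + fromUtf8.length()
                + ", code points: " + fromUtf8.codePointCount(0, fromUtf8.length()));
        fromUtf8.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

Running it should report the same text for both encodings, with 4 chars but only 3 code points: the emoji occupies two chars in the array. Code that needs whole characters can walk the buffered text with String.codePoints() or Character.codePointAt() instead of indexing chars directly.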

Thanks.

[1] http://en.wikipedia.org/wiki/Character_encoding

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrgla...@ca.ibm.com
E-mail: mrgla...@apache.org

