Re: encoded character references

Daniel Yokomizo Tue, 22 Apr 2008 08:46:10 -0700

On Tue, Apr 22, 2008 at 12:22 PM, John Byrne <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  I'm using the AbstractXMLDocumentParser class from the XNI.
>
>  Is there an event raised in this class for specially encoded characters,
> for example &#x201d; ?
>
>  If I parse a document containing references such as this, and print the
> output, I get the interpretation of the character (double quotes in this
> case), rather than the original character sequence - so I'm wondering, at
> what level does this interpretation take place?
>
>  Is there a way I can get the parser, or one of its underlying objects, to
> notify me that it found &#x201d; as a raw sequence of characters?


I studied it for a while and found that if it appears in a text node
then you can use the LexicalHandler to be notified (IIRC). On attribute
nodes there's no event, but you can access the non-normalized value.
See the last thread on
http://mail-archives.apache.org/mod_mbox/xerces-j-users/200803.mbox/thread,
subject: "How to disable attribute normalization" for more details.

In the end I decided to "trick" Xerces, because my goal was to keep
the entities unevaluated. All I did was replace every & with &amp; in
the source document (I decorated the source InputStream and
transformed it on the fly). If you want to follow this approach, I can
give you this class; it's licensed under the ASL.
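
For illustration, here is a minimal sketch of what such a stream
decorator could look like. It is not the class offered above: the name
AmpersandEscapingInputStream is hypothetical, and the sketch assumes an
ASCII-compatible encoding (UTF-8, ISO-8859-1 and similar), where the byte
0x26 always means '&'. It would not be correct for UTF-16 input, and it
escapes every '&', including those inside CDATA sections.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical decorator: rewrites every '&' byte as "&amp;" on the fly. */
class AmpersandEscapingInputStream extends FilterInputStream {
    // Bytes still to be emitted after an '&' has been passed through.
    private static final byte[] TAIL = {'a', 'm', 'p', ';'};
    // Index of the next pending TAIL byte; TAIL.length means "nothing pending".
    private int tailPos = TAIL.length;

    AmpersandEscapingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (tailPos < TAIL.length) {
            return TAIL[tailPos++];   // finish the "&amp;" we started
        }
        int b = super.read();
        if (b == '&') {
            tailPos = 0;              // the next four reads return "amp;"
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            // Once we have at least one byte, do not block waiting for more.
            if (count > 0 && tailPos >= TAIL.length && in.available() <= 0) {
                break;
            }
            int b = read();
            if (b < 0) {
                break;
            }
            buf[off + count++] = (byte) b;
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public boolean markSupported() {
        return false;                 // mark/reset would lose the pending state
    }
}
```

The wrapped stream is then handed to the parser in place of the original
one, so the parser sees &amp;#x201d; and reports the literal text
&#x201d; as character data. The text has to be unescaped again (or
written out verbatim) on the way out.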

>  Thanks in advance!
>
>  -John

Best regards,
Daniel Yokomizo.
