On Tue, Apr 22, 2008 at 12:22 PM, John Byrne <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm using the AbstractXMLDocumentParser class from the XNI.
>
> Is there an event raised in this class for specially encoded characters,
> for example ” ?
>
> If I parse a document containing references such as this, and print the
> output, I get the interpretation of the character (double quotes in this
> case), rather than the original character sequence - so I'm wondering, at
> what level does this interpretation take place?
>
> Is there a way I can get the parser, or one of its underlying objects, to
> notify me that it found ” as a raw sequence of characters?

I studied it for a while and found that if it appears in a text node then
you can use the LexicalHandler to be notified (IIRC). On attribute nodes
there's no event, but you can access the non-normalized value. See the last
thread on
http://mail-archives.apache.org/mod_mbox/xerces-j-users/200803.mbox/thread,
subject: "How to disable attribute normalization", for more details.

In the end I decided to "trick" Xerces, because my goal was to keep the
entities unevaluated. All I did was replace & with &amp; in the source
document (I decorated the source InputStream and transformed it on the
fly). If you want to follow this approach, I can give you this class; it's
licensed under the ASL.

> Thanks in advance!
>
> -John

Best regards,
Daniel Yokomizo.
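
A minimal sketch of the stream-decorating approach described above, for
illustration only. This is not the class offered in the message: the name
AmpersandEscapingInputStream and all details are assumptions. It also assumes
an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, where '&' is the
single byte 0x26; it would not work on UTF-16 input.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical decorator: rewrites every '&' byte as "&amp;" on the fly,
 * so the parser reports references such as "&#8221;" as literal text
 * instead of resolving them.
 */
class AmpersandEscapingInputStream extends FilterInputStream {
    // Bytes still to be emitted after an '&' has been passed through.
    private static final byte[] TAIL = {'a', 'm', 'p', ';'};
    // Index of the next TAIL byte to emit; TAIL.length means "nothing pending".
    private int pending = TAIL.length;

    AmpersandEscapingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        if (pending < TAIL.length) {
            return TAIL[pending++];
        }
        int b = super.read();
        if (b == '&') {
            pending = 0; // emit "amp;" on the following reads
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            // Once at least one byte has been delivered, return early
            // rather than block on the underlying stream.
            if (n > 0 && pending == TAIL.length && in.available() <= 0) {
                break;
            }
            int b = read();
            if (b < 0) {
                break;
            }
            buf[off + n++] = (byte) b;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would lose track of the pending bytes
    }
}
```

Wrapping the source stream, e.g. new AmpersandEscapingInputStream(source),
before handing it to the parser is enough to use it. Note that this sketch
escapes every ampersand, including those in existing references such as
&lt; and in CDATA sections, so the consumer has to un-escape &amp; once when
writing the output.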