Hi,

Fuzzo <[EMAIL PROTECTED]> wrote on 10/22/2008 03:54:18 AM:

> Hi all!
>
> Let me explain the problem with an example.
> I have to parse an XML in this form:
>
> <anomaly id="0012" severity="4">some_text_with_%A7_symbol</anomaly>
>
> With the Xerces1 SAX parser, the element text (some_text_with_%A7_symbol) is
> delivered at full length in a single call to the
> characters(char[] ch, int start, int length) method.
>
> With Xerces2, the element text is delivered in 30-byte chunks, and the
> method is invoked several times until the element text is fully parsed.
>
> Now, in my application the text element is sometimes encoded with
> java.net.URLEncoder class and then decoded with java.net.URLDecoder.
>
> With Xerces2, a chunk can end in the form first_part_of_text_%, and
> URLDecoder cannot handle the trailing % character. It fails with
> "URLDecoder: Incomplete trailing escape (%) pattern" because it does not
> find the two following characters (e.g. %A7 means the § symbol in the
> Cp1252 encoding).
>
> Is there a way to configure Xerces2 to deliver the element text in a
> single call?

No. characters() may be called multiple times [1][2] for contiguous text,
so you cannot assume it will be called only once. Your ContentHandler needs
to accumulate the text passed to each characters() call until it receives a
callback that isn't characters(), such as endElement().
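
A minimal sketch of that accumulation, in case it helps. The class name
AccumulatingHandler, the parse() helper and the choice to decode in
endElement() are my assumptions for illustration, not anything Xerces
provides; "windows-1252" is the standard charset name for the Cp1252
encoding you mention. It assumes <anomaly> contains only text, with no
child elements.

```java
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that buffers text across characters() calls. */
class AccumulatingHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> anomalies = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Start a fresh buffer for each element (assumes no mixed content).
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one run of text: only append here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if ("anomaly".equals(qName)) {
            try {
                // Decode only once the complete text has been collected,
                // so an escape such as %A7 is never split across chunks.
                anomalies.add(URLDecoder.decode(text.toString(), "windows-1252"));
            } catch (UnsupportedEncodingException e) {
                throw new SAXException(e);
            }
        }
        text.setLength(0);
    }

    List<String> getAnomalies() {
        return anomalies;
    }

    /** Convenience helper (assumption): parse a string with the JAXP default parser. */
    static List<String> parse(String xml) throws Exception {
        AccumulatingHandler handler = new AccumulatingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getAnomalies();
    }
}
```

Because the decode happens in endElement(), the result does not depend on
how the parser happens to split the text between characters() calls.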

> Many thanks!

Thanks.

[1] http://xerces.apache.org/xerces2-j/javadocs/api/org/xml/sax/ContentHandler.html#characters(char[],%20int,%20int)
[2] http://xerces.apache.org/xerces2-j/faq-sax.html#faq-2

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]