[jira] [Commented] (CAMEL-7468) Make xmlTokenizer more xml-aware so that it can handle more flexible structures

Aki Yoshida (JIRA) Tue, 27 May 2014 09:52:45 -0700

    [ 
https://issues.apache.org/jira/browse/CAMEL-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009880#comment-14009880
 ]


Aki Yoshida commented on CAMEL-7468:
------------------------------------

I added a new version that uses the StAX parser to search for the target token 
and extract the token directly from its underlying buffer.

Because XML tokenizing is inherently different from non-XML tokenizing, I 
created a separate language and expression for this new XML tokenizer.

I noticed a difference in the behavior of XMLStreamReader.getLocation() 
between Woodstox (com.ctc.wstx.sr.ValidatingStreamReader) and the JDK 
(com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl): Woodstox 
returns the location at the beginning of the token, whereas the JDK returns 
the location at the end of the token. For example, at START_ELEMENT, Woodstox 
returns the position of the "<" of the start tag, whereas the JDK returns the 
position of the ">" of that tag.

I need to get this behavior clarified and I'll probably need to add an 
auto-detect mechanism.
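
A minimal sketch of how such a probe might look, using only the standard 
javax.xml.stream API. The class name LocationProbe, the LocationStyle enum, and 
the probe document are hypothetical and not taken from the patch. The check 
assumes that an offset of 0 at the first START_ELEMENT means the parser reports 
the start of the token, and that any larger offset means it reports a position 
past the "<".

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class LocationProbe {

    /** Hypothetical classification of what getLocation() reports at START_ELEMENT. */
    enum LocationStyle { TOKEN_START, PAST_TOKEN_START, UNKNOWN }

    /**
     * Returns the character offset reported at the first START_ELEMENT,
     * or -1 if the parser reports no offset or no element is found.
     */
    static int firstStartElementOffset(XMLInputFactory factory, String xml) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocation().getCharacterOffset();
                    }
                }
                return -1;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse probe document", e);
        }
    }

    /** Classifies the parser by parsing a tiny document whose root tag starts at offset 0. */
    static LocationStyle detect(XMLInputFactory factory) {
        int offset = firstStartElementOffset(factory, "<probe>text</probe>");
        if (offset < 0) {
            return LocationStyle.UNKNOWN; // Location may report -1 when the offset is unavailable
        }
        // Assumption: offset 0 is the "<" of the root tag; anything else is past it.
        return offset == 0 ? LocationStyle.TOKEN_START : LocationStyle.PAST_TOKEN_START;
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The result depends on which StAX implementation is on the classpath.
        System.out.println(factory.getClass().getName() + " -> " + detect(factory));
    }
}
```

Running the probe once per XMLInputFactory and caching the result would keep 
the check off the tokenizing path. The exact offsets each parser reports are 
not asserted here and would need to be confirmed against both implementations.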


> Make xmlTokenizer more xml-aware so that it can handle more flexible 
> structures
> -------------------------------------------------------------------------------
>
>                 Key: CAMEL-7468
>                 URL: https://issues.apache.org/jira/browse/CAMEL-7468
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-core
>            Reporter: Aki Yoshida
>            Assignee: Aki Yoshida
>             Fix For: 2.14.0
>
>
> The existing xmlTokenizer can tokenize an XML document using the specified 
> element tag name and produce a series of tokens that are either the child 
> tokens with the injected namespace declarations from its parent node or the 
> tokens wrapped in their ancestor elements.
> That implementation has several limitations:
> - a specific namespace cannot be specified.
> - a specific hierarchy cannot be specified.
> - the wrap mode assumes each token to have the same ancestor path.
> This patch will remove these limitations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
