[ 
https://issues.apache.org/jira/browse/NIFI-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221458#comment-17221458
 ] 

Pierre Gramme commented on NIFI-7790:
-------------------------------------

Can anybody reproduce the bug with the provided template?

> XML record reader - failure on well-formed XML
> ----------------------------------------------
>
>                 Key: NIFI-7790
>                 URL: https://issues.apache.org/jira/browse/NIFI-7790
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: 1.11.4
>            Reporter: Pierre Gramme
>            Priority: Major
>              Labels: records, xml
>         Attachments: bug-parse-xml.xml
>
>
> I am using ConvertRecord to convert XML flowfiles to Avro, with the 
> Infer Schema strategy. Some input flowfiles are sent to the failure output 
> queue even though they are well-formed: 
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>       <authors>
>               <item>
>                       <name>Neil Gaiman</name>
>               </item>
>       </authors>
>       <editors>
>               <item>
>                       <commercialName>Hachette</commercialName>
>               </item>
>       </editors>
> </root>
> {code}
> Note the use of authors/item/name in one branch and 
> editors/item/commercialName in the other.
> By contrast, the following document is parsed correctly: 
> {code:xml}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <root>
>       <authors>
>               <item>
>                       <name>Neil Gaiman</name>
>               </item>
>       </authors>
>       <editors>
>               <item>
>                       <name>Hachette</name>
>               </item>
>       </editors>
> </root>
> {code}
> See the attached template for a minimal reproducible example.
>  
> My interpretation is that the first case fails because two independent XML 
> node types share the same name (<item> in this case) but have different 
> types and occur in different parents, which themselves have different 
> types. In the second case, both <item> elements have the same node type. I 
> didn't use any Schema Inference Cache, so both item types should be inferred 
> independently. 
> Since the first document is legal XML (an XSD could be written for it) and 
> can also be represented in Avro, its conversion shouldn't fail.
> I'll be happy to provide more details if needed.
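
The reporter's interpretation can be illustrated with a small standalone sketch. It is not NiFi's schema inference code, and nothing here is confirmed as the actual cause of the failure. It only shows, under the assumption that element shapes are grouped by bare tag name, how <item> ends up with two incompatible shapes, whereas grouping by full path keeps them apart. The function names and the FAILING_DOC constant are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The failing document from the report (XML declaration omitted for brevity).
FAILING_DOC = """<root>
  <authors><item><name>Neil Gaiman</name></item></authors>
  <editors><item><commercialName>Hachette</commercialName></item></editors>
</root>"""


def shapes_by_tag(root):
    """Collect the distinct sets of child names seen for each bare tag name."""
    shapes = defaultdict(set)
    for elem in root.iter():
        children = frozenset(child.tag for child in elem)
        if children:
            shapes[elem.tag].add(children)
    return shapes


def shapes_by_path(root):
    """Collect the distinct sets of child names seen for each full path."""
    shapes = defaultdict(set)

    def walk(elem, path):
        children = frozenset(child.tag for child in elem)
        if children:
            shapes[path].add(children)
        for child in elem:
            walk(child, path + "/" + child.tag)

    walk(root, root.tag)
    return shapes


root = ET.fromstring(FAILING_DOC)

# A tag or path "conflicts" when more than one child-name set was seen for it.
conflicting_tags = sorted(
    tag for tag, seen in shapes_by_tag(root).items() if len(seen) > 1
)
conflicting_paths = sorted(
    path for path, seen in shapes_by_path(root).items() if len(seen) > 1
)

print(conflicting_tags)   # ['item']
print(conflicting_paths)  # []
```

Keyed by name alone, <item> has two different shapes. Keyed by path, root/authors/item and root/editors/item each have exactly one, which matches the expectation that the two item types can be inferred independently.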



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
