[ 
https://issues.apache.org/jira/browse/NIFI-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886360#comment-17886360
 ] 

Sandro Berger commented on NIFI-13334:
--------------------------------------

Hi [~Steve Hindmarch] 

I think your issue is related to the 'Infer Schema' setting in the XMLReader.

I tried both of your XML examples with our NiFi 1.23.2 and, as I suspected, they 
produce different schemas (checked by using ConvertRecord with an 
AvroRecordSetWriter to easily get the schemas):

The first example's schema defines {{Data}} as an array of 
{{UserData_DataType}} records:
{code:json}
{
 "type": "record",
 "name": "nifiRecord",
 "namespace": "org.apache.nifi",
 "fields": [
  {
   "name": "Type",
   "type": [
    "string",
    "null"
   ]
  },
  {
   "name": "UserData",
   "type": [
    {
     "type": "record",
     "name": "UserDataType",
     "fields": [
      {
       "name": "Data",
       "type": [
        {
         "type": "array",
         "items": {
          "type": "record",
          "name": "UserData_DataType",
          "fields": [
           {
            "name": "Name",
            "type": [
             "string",
             "null"
            ]
           },
           {
            "name": "Value",
            "type": [
             "string",
             "null"
            ]
           }
          ]
         }
        },
        "null"
       ]
      }
     ]
    },
    "null"
   ]
  }
 ]
}
{code}
 
The second example's schema defines {{Data}} as a choice between a single 
{{UserData_DataType}} record and an array of {{UserData_DataType}}. I don't 
understand why, but this choice seems to be the problem:
{code:json}
{
 "type": "record",
 "name": "nifiRecord",
 "namespace": "org.apache.nifi",
 "fields": [
  {
   "name": "Type",
   "type": [
    "string",
    "null"
   ]
  },
  {
   "name": "UserData",
   "type": [
    {
     "type": "record",
     "name": "UserDataType",
     "fields": [
      {
       "name": "Data",
       "type": [
        {
         "type": "record",
         "name": "UserData_DataType",
         "fields": [
          {
           "name": "Name",
           "type": [
            "string",
            "null"
           ]
          },
          {
           "name": "Value",
           "type": [
            "string",
            "null"
           ]
          }
         ]
        },
        {
         "type": "array",
         "items": "UserData_DataType"
        },
        "null"
       ]
      }
     ]
    },
    "null"
   ]
  }
 ]
}
{code}
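A guess at where the choice could come from, not checked against the NiFi source: if inference looks at each {{Event}} separately, a single {{Data}} child looks like a record and repeated {{Data}} children look like an array, and merging both observations would give a choice. The Python sketch below is only a toy model of that assumption; {{infer_data_shapes}} is a hypothetical helper, not a NiFi API.

```python
import xml.etree.ElementTree as ET

# The two XML examples from the issue description.
FIRST = """
<Events>
  <Event Type="foo">
    <UserData>
      <Data Name="Param1">String1</Data>
      <Data Name="Param2">String2</Data>
    </UserData>
  </Event>
  <Event Type="bar">
    <UserData>
      <Data Name="Param1">String</Data>
      <Data Name="Param2">String2</Data>
      <Data Name="Param3">String3</Data>
    </UserData>
  </Event>
</Events>
"""

SECOND = """
<Events>
  <Event Type="foo">
    <UserData>
      <Data Name="Param1">String1</Data>
    </UserData>
  </Event>
  <Event Type="bar">
    <UserData>
      <Data Name="Param1">String</Data>
      <Data Name="Param2">String2</Data>
      <Data Name="Param3">String3</Data>
    </UserData>
  </Event>
</Events>
"""


def infer_data_shapes(xml_text):
    """Toy model (assumption): collect the shape seen for Data in each Event.

    One Data child is treated as a 'record', several as an 'array'.
    More than one shape in the result corresponds to a choice type.
    """
    shapes = set()
    for event in ET.fromstring(xml_text).findall("Event"):
        count = len(event.findall("./UserData/Data"))
        if count == 1:
            shapes.add("record")
        elif count > 1:
            shapes.add("array")
    return shapes


print(sorted(infer_data_shapes(FIRST)))   # ['array']
print(sorted(infer_data_shapes(SECOND)))  # ['array', 'record']
```

Under this model the first example only ever sees arrays, while the second sees both shapes, which matches the two schemas above.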
When I set the first example's Avro schema in the XMLReader (to ensure we 
always get an array of {{UserData_DataType}}) and convert your second example 
XML to JSON, I get this:
{code:json}
[ {
  "Type" : "foo",
  "UserData" : {
    "Data" : [ {
      "Name" : "Param1",
      "Value" : "String1"
    } ]
  }
}, {
  "Type" : "bar",
  "UserData" : {
    "Data" : [ {
      "Name" : "Param1",
      "Value" : "String"
    }, {
      "Name" : "Param2",
      "Value" : "String2"
    }, {
      "Name" : "Param3",
      "Value" : "String3"
    } ]
  }
} ]
{code}
I hope this is what you were expecting in the first place, and that setting 
the schema manually can serve as a workaround for you.
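To check the target shape outside NiFi, here is a minimal Python sketch that always emits {{Data}} as a list, which is what the explicit array schema enforces. It uses only the standard library and does not reproduce the XMLReader's behaviour; {{events_to_records}} is a hypothetical helper name.

```python
import json
import xml.etree.ElementTree as ET

# Second example from the issue: one Event has a single Data tag.
XML = """
<Events>
  <Event Type="foo">
    <UserData>
      <Data Name="Param1">String1</Data>
    </UserData>
  </Event>
  <Event Type="bar">
    <UserData>
      <Data Name="Param1">String</Data>
      <Data Name="Param2">String2</Data>
      <Data Name="Param3">String3</Data>
    </UserData>
  </Event>
</Events>
"""


def events_to_records(xml_text):
    """Convert each Event to a dict, always keeping Data as a list."""
    records = []
    for event in ET.fromstring(xml_text).findall("Event"):
        # The Name attribute and the element text become Name/Value,
        # mirroring 'Field Name for Content = Value' from the issue.
        data = [
            {"Name": d.get("Name"), "Value": d.text}
            for d in event.findall("./UserData/Data")
        ]
        records.append({"Type": event.get("Type"), "UserData": {"Data": data}})
    return records


records = events_to_records(XML)
print(json.dumps(records, indent=2))
```

The single {{Data}} tag of the first event ends up as a one-element list, so both events keep all of their tags.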

> XMLReader drops name-value content tags from record arrays if one record has 
> only one tag
> -----------------------------------------------------------------------------------------
>
>                 Key: NIFI-13334
>                 URL: https://issues.apache.org/jira/browse/NIFI-13334
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core UI
>    Affects Versions: 1.24.0
>         Environment: Docker
>            Reporter: Stephen Jeffrey Hindmarch
>            Priority: Major
>
> If you create an XMLReader service and set the following:
>  * Parse XML Attributes = true
>  * Expect Records as Arrays = true
>  * Field Name for Content = Value
> Then use the reader in a ConvertRecord processor with a JSONRecordSetWriter
> When parsing a flow file such as
> {noformat}
> <Events>
>   <Event Type="foo">
>     <UserData>
>       <Data Name="Param1">String1</Data>
>       <Data Name="Param2">String2</Data>
>     </UserData>
>   </Event>
>   <Event Type="bar">
>     <UserData>
>       <Data Name="Param1">String</Data>
>       <Data Name="Param2">String2</Data>
>       <Data Name="Param3">String3</Data>
>     </UserData>
>   </Event>
> </Events>{noformat}
> Then as expected the content tags are parsed into arrays
> {noformat}
> [ {
>   "Type" : "foo",
>   "UserData" : {
>     "Data" : [ {
>       "Name" : "Param1",
>       "Value" : "String1"
>     }, {
>       "Name" : "Param2",
>       "Value" : "String2"
>     } ]
>   }
> }, {
>   "Type" : "bar",
>   "UserData" : {
>     "Data" : [ {
>       "Name" : "Param1",
>       "Value" : "String"
>     }, {
>       "Name" : "Param2",
>       "Value" : "String2"
>     }, {
>       "Name" : "Param3",
>       "Value" : "String3"
>     } ]
>   }
> } ]{noformat}
> But if one of the records has only one data tag, then it will not be 
> presented in an array, and more importantly, nor will the tags for the other 
> record. Instead, all but the last tags are dropped.
> For example
> {noformat}
> <Events>
>   <Event Type="foo">
>     <UserData>
>       <Data Name="Param1">String1</Data>
>     </UserData>
>   </Event>
>   <Event Type="bar">
>     <UserData>
>       <Data Name="Param1">String</Data>
>       <Data Name="Param2">String2</Data>
>       <Data Name="Param3">String3</Data>
>     </UserData>
>   </Event>
> </Events>{noformat}
> parses to
> {noformat}
> [ {
>   "Type" : "foo",
>   "UserData" : {
>     "Data" : {
>       "Name" : "Param1",
>       "Value" : "String1"
>     }
>   }
> }, {
>   "Type" : "bar",
>   "UserData" : {
>     "Data" : {
>       "Name" : "Param3",
>       "Value" : "String3"
>     }
>   }
> } ]{noformat}
> Note that the second event has lost all but the last of its data content tags.
> It does not matter which event (first or second) has 1 tag, the other event 
> loses content.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
