[
https://issues.apache.org/jira/browse/NIFI-13334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886360#comment-17886360
]
Sandro Berger commented on NIFI-13334:
--------------------------------------
Hi [~Steve Hindmarch]
I think your issue is related to the 'Infer Schema' setting in the XMLReader.
I tried both of your XML examples with our NiFi 1.23.2 and, as I suspected, the
reader infers a different schema for each (checked by using ConvertRecord with an
AvroRecordSetWriter to easily get the schemas):
The first example's schema defines {{Data}} as an array of {{UserData_DataType}}
records:
{code:json}
{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [
{
"name": "Type",
"type": [
"string",
"null"
]
},
{
"name": "UserData",
"type": [
{
"type": "record",
"name": "UserDataType",
"fields": [
{
"name": "Data",
"type": [
{
"type": "array",
"items": {
"type": "record",
"name": "UserData_DataType",
"fields": [
{
"name": "Name",
"type": [
"string",
"null"
]
},
{
"name": "Value",
"type": [
"string",
"null"
]
}
]
}
},
"null"
]
}
]
},
"null"
]
}
]
}
{code}
The second example's schema defines {{Data}} as a choice between a single
{{UserData_DataType}} record and an array of {{UserData_DataType}}. I don't
understand why, but this choice seems to be the problem:
{code:json}
{
"type": "record",
"name": "nifiRecord",
"namespace": "org.apache.nifi",
"fields": [
{
"name": "Type",
"type": [
"string",
"null"
]
},
{
"name": "UserData",
"type": [
{
"type": "record",
"name": "UserDataType",
"fields": [
{
"name": "Data",
"type": [
{
"type": "record",
"name": "UserData_DataType",
"fields": [
{
"name": "Name",
"type": [
"string",
"null"
]
},
{
"name": "Value",
"type": [
"string",
"null"
]
}
]
},
{
"type": "array",
"items": "UserData_DataType"
},
"null"
]
}
]
},
"null"
]
}
]
}
{code}
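To compare the two inferred schemas without reading them line by line, a small script can list the branches of the {{Data}} union in each. This is only a sketch: the helper names ({{find_field}}, {{union_branches}}, {{wrap}}) are made up for illustration, and the schemas are compacted copies of the two shown above.

```python
# Compacted copies of the two inferred schemas shown above.
ITEM = {
    "type": "record",
    "name": "UserData_DataType",
    "fields": [
        {"name": "Name", "type": ["string", "null"]},
        {"name": "Value", "type": ["string", "null"]},
    ],
}


def wrap(data_type):
    """Build the outer nifiRecord schema around a given type for Data."""
    return {
        "type": "record",
        "name": "nifiRecord",
        "namespace": "org.apache.nifi",
        "fields": [
            {"name": "Type", "type": ["string", "null"]},
            {
                "name": "UserData",
                "type": [
                    {
                        "type": "record",
                        "name": "UserDataType",
                        "fields": [{"name": "Data", "type": data_type}],
                    },
                    "null",
                ],
            },
        ],
    }


FIRST = wrap([{"type": "array", "items": ITEM}, "null"])
SECOND = wrap([ITEM, {"type": "array", "items": "UserData_DataType"}, "null"])


def find_field(schema, path):
    """Follow field names through nested record schemas; return the last field's type."""
    current = schema
    for name in path:
        # A field type may be a union (list); pick its record branch to descend into.
        candidates = current if isinstance(current, list) else [current]
        record = next(
            c for c in candidates if isinstance(c, dict) and c.get("type") == "record"
        )
        current = next(f for f in record["fields"] if f["name"] == name)["type"]
    return current


def union_branches(avro_type):
    """Return a short label for each branch of an Avro type."""
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    return [b["type"] if isinstance(b, dict) else b for b in branches]


for label, schema in (("first", FIRST), ("second", SECOND)):
    print(label, union_branches(find_field(schema, ["UserData", "Data"])))
# first ['array', 'null']
# second ['record', 'array', 'null']
```

The extra {{record}} branch in the second result is the choice described above.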
When I set the first example's Avro schema in the XMLReader (to ensure we
always get the array of {{UserData_DataType}}) and convert your second example
XML to JSON, I get this:
{code:json}
[ {
"Type" : "foo",
"UserData" : {
"Data" : [ {
"Name" : "Param1",
"Value" : "String1"
} ]
}
}, {
"Type" : "bar",
"UserData" : {
"Data" : [ {
"Name" : "Param1",
"Value" : "String"
}, {
"Name" : "Param2",
"Value" : "String2"
}, {
"Name" : "Param3",
"Value" : "String3"
} ]
}
} ]
{code}
I hope this is what you were expecting in the first place, and that setting the
schema manually can serve as a workaround for you.
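
For reference, the target shape (every {{Data}} element collected into an array, even when there is only one) can be reproduced outside NiFi with a few lines of Python. This is not how the XMLReader works internally; it is just a standalone sketch using the standard library, with a hypothetical helper name {{events_to_records}}, that gives the same output as above for your second example.

```python
import json
import xml.etree.ElementTree as ET

# The second example from the issue description: one event with a single Data tag.
XML = """<Events>
  <Event Type="foo">
    <UserData>
      <Data Name="Param1">String1</Data>
    </UserData>
  </Event>
  <Event Type="bar">
    <UserData>
      <Data Name="Param1">String</Data>
      <Data Name="Param2">String2</Data>
      <Data Name="Param3">String3</Data>
    </UserData>
  </Event>
</Events>"""


def events_to_records(xml_text):
    """Convert each Event into a record whose Data field is always a list."""
    records = []
    for event in ET.fromstring(xml_text).findall("Event"):
        # findall returns a list even for a single match, so Data is never a bare record.
        data = [
            {"Name": d.get("Name"), "Value": d.text}
            for d in event.findall("./UserData/Data")
        ]
        records.append({"Type": event.get("Type"), "UserData": {"Data": data}})
    return records


records = events_to_records(XML)
print(json.dumps(records, indent=2))
```

Comparing this output with what ConvertRecord produces is a quick way to confirm that the explicit schema gives the structure you want.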
> XMLReader drops name-value content tags from record arrays if one record has
> only one tag
> -----------------------------------------------------------------------------------------
>
> Key: NIFI-13334
> URL: https://issues.apache.org/jira/browse/NIFI-13334
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core UI
> Affects Versions: 1.24.0
> Environment: Docker
> Reporter: Stephen Jeffrey Hindmarch
> Priority: Major
>
> If you create an XMLReader service and set the following:
> * Parse XML Attributes = true
> * Expect Records as Arrays = true
> * Field Name for Content = Value
> Then use the reader in a ConvertRecord processor with a JSONRecordSetWriter
> When parsing a flow file such as
> {noformat}
> <Events>
> <Event Type="foo">
> <UserData>
> <Data Name="Param1">String1</Data>
> <Data Name="Param2">String2</Data>
> </UserData>
> </Event>
> <Event Type="bar">
> <UserData>
> <Data Name="Param1">String</Data>
> <Data Name="Param2">String2</Data>
> <Data Name="Param3">String3</Data>
> </UserData>
> </Event>
> </Events>{noformat}
> Then as expected the content tags are parsed into arrays
> {noformat}
> [ {
> "Type" : "foo",
> "UserData" : {
> "Data" : [ {
> "Name" : "Param1",
> "Value" : "String1"
> }, {
> "Name" : "Param2",
> "Value" : "String2"
> } ]
> }
> }, {
> "Type" : "bar",
> "UserData" : {
> "Data" : [ {
> "Name" : "Param1",
> "Value" : "String"
> }, {
> "Name" : "Param2",
> "Value" : "String2"
> }, {
> "Name" : "Param3",
> "Value" : "String3"
> } ]
> }
> } ]{noformat}
> But if one of the records has only one data tag, then it will not be
> presented in an array, and more importantly, nor will the tags for the other
> record. Instead, all but the last tags are dropped.
> For example
> {noformat}
> <Events>
> <Event Type="foo">
> <UserData>
> <Data Name="Param1">String1</Data>
> </UserData>
> </Event>
> <Event Type="bar">
> <UserData>
> <Data Name="Param1">String</Data>
> <Data Name="Param2">String2</Data>
> <Data Name="Param3">String3</Data>
> </UserData>
> </Event>
> </Events>{noformat}
> parses to
> {noformat}
> [ {
> "Type" : "foo",
> "UserData" : {
> "Data" : {
> "Name" : "Param1",
> "Value" : "String1"
> }
> }
> }, {
> "Type" : "bar",
> "UserData" : {
> "Data" : {
> "Name" : "Param3",
> "Value" : "String3"
> }
> }
> } ]{noformat}
> Note that the second event has lost all but the last of its data content tags.
> It does not matter which event (first or second) has 1 tag, the other event
> loses content.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)