Pierre Villard created NIFI-13843:
-------------------------------------
Summary: Unknown fields not dropped by JSON Writer as expected by
specified schema
Key: NIFI-13843
URL: https://issues.apache.org/jira/browse/NIFI-13843
Project: Apache NiFi
Issue Type: Bug
Components: Extensions
Affects Versions: 2.0.0-M4, 1.27.0
Reporter: Pierre Villard
Assignee: Pierre Villard
Consider the following use case:
* GenerateFlowFile (GFF) processor, generating a JSON document with 3 fields: a, b, and c
* ConvertRecord with JSON Reader / JSON Writer
** Both reader and writer are configured with a schema only specifying fields
a and b
The expected result is a JSON document that contains only fields a and b.
We're following the below path in the code:
* AbstractRecordProcessor (L131)
{code:java}
Record firstRecord = reader.nextRecord(); {code}
In this case, the default method for nextRecord() is defined in RecordReader
(L50)
{code:java}
default Record nextRecord() throws IOException, MalformedRecordException {
    return nextRecord(true, false);
} {code}
Here dropUnknownFields is passed as false, so the unknown fields are NOT
dropped (the Javadoc needs fixing here, as it says the opposite).
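To make the effect of that default concrete, here is a minimal, self-contained sketch. It does not use the NiFi API: SimpleReader and readerFor are hypothetical stand-ins, and it assumes the two boolean flags are coerceTypes and dropUnknownFields, in that order (coerceTypes is ignored in this sketch).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class NextRecordSketch {

    /** Hypothetical minimal reader; it mirrors only the two-flag shape quoted above. */
    interface SimpleReader {
        Map<String, Object> nextRecord(boolean coerceTypes, boolean dropUnknownFields);

        // Same defaults as the quoted method: coerce types, keep unknown fields.
        default Map<String, Object> nextRecord() {
            return nextRecord(true, false);
        }
    }

    /** Builds a reader over one raw record; coerceTypes is ignored in this sketch. */
    static SimpleReader readerFor(Map<String, Object> raw, Set<String> schemaFields) {
        return (coerceTypes, dropUnknownFields) -> {
            Map<String, Object> out = new LinkedHashMap<>(raw);
            if (dropUnknownFields) {
                // Keep only the fields declared in the schema.
                out.keySet().retainAll(schemaFields);
            }
            return out;
        };
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("a", 1);
        raw.put("b", 2);
        raw.put("c", 3);
        SimpleReader reader = readerFor(raw, Set.of("a", "b"));

        System.out.println(reader.nextRecord().keySet());           // [a, b, c]
        System.out.println(reader.nextRecord(true, true).keySet()); // [a, b]
    }
}
```

With the no-argument call, field c survives the read, which is the situation the writer has to handle later.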
We get to
{code:java}
writer.write(firstRecord); {code}
which gets us to
* WriteJsonResult (L206)
Here, we do a check
{code:java}
isUseSerializeForm(record, writeSchema) {code}
which currently returns true when it should not. As a result, we write the
serialised form, which ignores the writer schema.
Inside isUseSerializeForm(), we check
{code:java}
record.getSchema().equals(writeSchema) {code}
But at this point record.getSchema() returns the schema defined in the reader,
which is equal to the one defined in the writer, even though the record has
additional fields compared to the defined schema.
The suggested fix is to also add a check on
{code:java}
record.isDropUnknownFields() {code}
If dropUnknownFields is false, then we do not use the serialised form.
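The following is a simplified sketch of the problem and of the suggested fix. It is not the actual WriteJsonResult code: SimpleRecord, useSerializedFormCurrent and useSerializedFormProposed are hypothetical names, schemas are reduced to lists of field names, and any other conditions the real method checks are left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SerializedFormSketch {

    /** Hypothetical stand-in for a record; not the NiFi Record API. */
    static final class SimpleRecord {
        final List<String> schemaFields;   // schema attached by the reader
        final Map<String, Object> values;  // may hold fields outside the schema
        final boolean dropUnknownFields;   // flag the record was read with
        final String serializedForm;       // original text of the record, or null

        SimpleRecord(List<String> schemaFields, Map<String, Object> values,
                     boolean dropUnknownFields, String serializedForm) {
            this.schemaFields = schemaFields;
            this.values = values;
            this.dropUnknownFields = dropUnknownFields;
            this.serializedForm = serializedForm;
        }
    }

    /** Current behaviour as described: schema equality alone decides. */
    static boolean useSerializedFormCurrent(SimpleRecord r, List<String> writeSchema) {
        return r.serializedForm != null && r.schemaFields.equals(writeSchema);
    }

    /** Suggested fix: additionally require that unknown fields were dropped on read. */
    static boolean useSerializedFormProposed(SimpleRecord r, List<String> writeSchema) {
        return r.dropUnknownFields && useSerializedFormCurrent(r, writeSchema);
    }

    public static void main(String[] args) {
        Map<String, Object> values = new LinkedHashMap<>();
        values.put("a", 1);
        values.put("b", 2);
        values.put("c", 3);
        List<String> schema = List.of("a", "b");

        // Read with dropUnknownFields=false: schema says a and b, values also hold c.
        SimpleRecord record =
                new SimpleRecord(schema, values, false, "{\"a\":1,\"b\":2,\"c\":3}");

        System.out.println(useSerializedFormCurrent(record, schema));  // true
        System.out.println(useSerializedFormProposed(record, schema)); // false
    }
}
```

In this sketch the schemas compare equal even though the record carries the extra field c, so only the additional flag check prevents the unfiltered serialised form from being written.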
--
This message was sent by Atlassian Jira
(v8.20.10#820010)