[
https://issues.apache.org/jira/browse/NIFI-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Handermann resolved NIFI-8292.
------------------------------------
Fix Version/s: 1.14.0
Assignee: David Handermann
Resolution: Fixed
NIFI-8439 incorporated an update of the parquet-hadoop library from 1.10.0 to
1.12.0, which resolves the issue with JSON schema serialization.
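For context on the root cause: Avro keeps a registry of named types while parsing a schema, and older parquet-avro versions named every converted LIST group "list", so a schema containing a list nested inside another list hit the same name twice. The sketch below is a hypothetical Python stand-in for the check in org.apache.avro.Schema$Names.put seen in the stack trace; it is illustrative only and is not NiFi, Avro, or Parquet code.

```python
class SchemaParseError(Exception):
    """Stand-in for org.apache.avro.SchemaParseException."""
    pass


class Names:
    """Minimal illustration of Avro's named-type registry (Schema$Names)."""

    def __init__(self):
        self._defined = {}

    def put(self, name, schema):
        # Avro rejects a second definition of the same full name.
        if name in self._defined:
            raise SchemaParseError("Can't redefine: " + name)
        self._defined[name] = schema


names = Names()
# Outer array1 converted to a group named "list" -- registers fine.
names.put("list", {"type": "array"})
try:
    # Inner array2 also converted to a group named "list" -- rejected.
    names.put("list", {"type": "array"})
except SchemaParseError as e:
    print(e)  # Can't redefine: list
```

parquet-hadoop 1.12.0 avoids the collision by giving the generated list groups unique names, which is why the upgrade in NIFI-8439 resolves this issue.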
> ParquetReader can't read FlowFile, which was written by ParquetRecordSetWriter
> ------------------------------------------------------------------------------
>
> Key: NIFI-8292
> URL: https://issues.apache.org/jira/browse/NIFI-8292
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.11.4, 1.13.0
> Environment: docker
> Reporter: Nikolay Nikolaev
> Assignee: David Handermann
> Priority: Major
> Fix For: 1.14.0
>
> Attachments: Test_Parquet_Reader_Writer.xml, cut_from_nifi-app.log
>
>
> h1. Steps to reproduce the bug
> # Start NiFi in Docker:
> {code}docker pull apache/nifi:latest
> docker run -p 8083:8080 --name nifi_container_latest -v <your path to
> logs-folder>:/opt/nifi/nifi-current/logs -v <your path to
> file-folder>:/file_folder apache/nifi:latest{code}
> # upload the template [^Test_Parquet_Reader_Writer.xml] (see attachments)
> # create a flow from the uploaded template *Test_Parquet_Reader_Writer.xml*
> # enable all 4 controller services in NiFi Flow Configuration
> # start the flow
> # get an error in "ConvertRecord(JSON_to_Parquet)" processor
> # stop the flow
> # check the *logs-folder* (see nifi-app.log) and *file_folder* (which contains
> the Parquet and JSON files). nifi-app.log will contain an error like this
> (see [^cut_from_nifi-app.log] for the full message):
> {quote}2021-03-04 07:26:39,448 ERROR [Timer-Driven Process Thread-8]
> o.a.n.processors.standard.ConvertRecord
> ConvertRecord[id=35a86417-bd7c-31c2-ae9e-bf808e428b03] Failed to process
> StandardFlowFileRecord[uuid=eef69d98-1b2a-4b89-8267-0b4598e53d05,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1614842777315-1, container=default,
> section=1], offset=128,
> length=1007],offset=0,name=eef69d98-1b2a-4b89-8267-0b4598e53d05,size=1007];
> will route to failure: org.apache.avro.SchemaParseException: Can't redefine:
> list
> org.apache.avro.SchemaParseException: Can't redefine: list
> at org.apache.avro.Schema$Names.put(Schema.java:1128)
> at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
> at ...{quote}
> h1. Description
> This test flow generates three JSON documents via the GenerateFlowFile processor:
> Simple JSON:
> {code}
> {
>   "field1": "value_field",
>   "feild2": "value_field2"
> }
> {code}
> 1st JSON:
> {code}
> {
>   "field1": "value_field",
>   "array1": [
>     {
>       "feild2": "value_field2"
>     }
>   ]
> }
> {code}
> 2nd JSON:
> {code}
> {
>   "field": "value_field",
>   "array1": [
>     {
>       "array2": ["a_value_array2", "b_value_array2"]
>     }
>   ]
> }
> {code}
> Each JSON is then converted into Parquet (via ConvertRecord(JSON_to_Parquet)) and back
> to JSON (via ConvertRecord(Parquet_to_JSON)). To facilitate analysis, the JSON
> and Parquet files are saved to the *file_folder*.
> In the *file_folder* we can see that all three JSONs were successfully converted
> into Parquet files. But only "Simple JSON" and "1st JSON" were converted back
> to JSON; the "2nd JSON" causes an error in ConvertRecord.
> So, in certain cases ParquetReader can't read a file that was created by
> ParquetRecordSetWriter, for example the "2nd JSON" (which has a more
> deeply nested structure).
> This bug is reproduced in versions 1.11.4 and 1.13.0.
> In version 1.12.1 I couldn't reproduce it because of NIFI-7817
--
This message was sent by Atlassian Jira
(v8.20.1#820001)