[
https://issues.apache.org/jira/browse/NIFI-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
David Handermann resolved NIFI-8292.
------------------------------------
Fix Version/s: 1.14.0
Assignee: David Handermann
Resolution: Fixed
NIFI-8439 incorporated an update of the parquet-hadoop library from 1.10.0 to
1.12.0, which resolves the issue with JSON schema serialization.
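For context on the root cause: Avro keeps a registry of named types while parsing a schema, and older parquet-avro versions named every converted LIST group "list", so a schema containing a list nested inside another list hit the same name twice. The sketch below is a hypothetical Python stand-in for the check in org.apache.avro.Schema$Names.put seen in the stack trace; it is illustrative only and is not NiFi, Avro, or Parquet code.

```python
class SchemaParseError(Exception):
    """Stand-in for org.apache.avro.SchemaParseException."""
    pass


class Names:
    """Minimal illustration of Avro's named-type registry (Schema$Names)."""

    def __init__(self):
        self._defined = {}

    def put(self, name, schema):
        # Avro rejects a second definition of the same full name.
        if name in self._defined:
            raise SchemaParseError("Can't redefine: " + name)
        self._defined[name] = schema


names = Names()
# Outer array1 converted to a group named "list" -- registers fine.
names.put("list", {"type": "array"})
try:
    # Inner array2 also converted to a group named "list" -- rejected.
    names.put("list", {"type": "array"})
except SchemaParseError as e:
    print(e)  # Can't redefine: list
```

parquet-hadoop 1.12.0 avoids the collision by giving the generated list groups unique names, which is why the upgrade in NIFI-8439 resolves this issue.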
> ParquetReader can't read FlowFile, which was written by ParquetRecordSetWriter
> ------------------------------------------------------------------------------
>
> Key: NIFI-8292
> URL: https://issues.apache.org/jira/browse/NIFI-8292
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.11.4, 1.13.0
> Environment: docker
> Reporter: Nikolay Nikolaev
> Assignee: David Handermann
> Priority: Major
> Fix For: 1.14.0
>
> Attachments: Test_Parquet_Reader_Writer.xml, cut_from_nifi-app.log
>
>
> h1. Steps to reproduce the bug
> # Start NiFi in Docker:
> {code}docker pull apache/nifi:latest
> docker run -p 8083:8080 --name nifi_container_latest -v <your path to
> logs-folder>:/opt/nifi/nifi-current/logs -v <your path to
> file-folder>:/file_folder apache/nifi:latest{code}
> # upload the template [^Test_Parquet_Reader_Writer.xml] (see attachments)
> # create a flow from the uploaded template *Test_Parquet_Reader_Writer.xml*
> # enable all 4 controller services in NiFi Flow Configuration
> # start the flow
> # get an error in "ConvertRecord(JSON_to_Parquet)" processor
> # stop the flow
> # check the *logs-folder* (see nifi-app.log) and *file_folder* (which contains
> the Parquet and JSON files). nifi-app.log will contain an error like this
> (see [^cut_from_nifi-app.log] for the full message):
> {quote}2021-03-04 07:26:39,448 ERROR [Timer-Driven Process Thread-8]
> o.a.n.processors.standard.ConvertRecord
> ConvertRecord[id=35a86417-bd7c-31c2-ae9e-bf808e428b03] Failed to process
> StandardFlowFileRecord[uuid=eef69d98-1b2a-4b89-8267-0b4598e53d05,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1614842777315-1, container=default,
> section=1], offset=128,
> length=1007],offset=0,name=eef69d98-1b2a-4b89-8267-0b4598e53d05,size=1007];
> will route to failure: org.apache.avro.SchemaParseException: Can't redefine:
> list
> org.apache.avro.SchemaParseException: Can't redefine: list
> at org.apache.avro.Schema$Names.put(Schema.java:1128)
> at org.apache.avro.Schema$NamedSchema.writeNameRef(Schema.java:562)
> at ...{quote}
> h1. Description
> This test flow generates three JSON documents via the GenerateFlowFile processor:
> Simple JSON:
> {code}
> {
>   "field1": "value_field",
>   "feild2": "value_field2"
> }
> {code}
> 1st JSON:
> {code}
> {
>   "field1": "value_field",
>   "array1": [
>     {
>       "feild2": "value_field2"
>     }
>   ]
> }
> {code}
> 2nd JSON:
> {code}
> {
>   "field": "value_field",
>   "array1": [
>     {
>       "array2": ["a_value_array2", "b_value_array2"]
>     }
>   ]
> }
> {code}
> Each JSON is then converted into Parquet (via ConvertRecord(JSON_to_Parquet)) and back
> to JSON (via ConvertRecord(Parquet_to_JSON)). To facilitate analysis, the JSON
> and Parquet files are saved to the *file_folder*.
> In the *file_folder* we can see that all three JSONs were successfully converted
> into Parquet files. But only "Simple JSON" and "1st JSON" were converted back
> to JSON; the "2nd JSON" causes an error in ConvertRecord.
> So, in certain cases ParquetReader can't read a file that was created by
> ParquetRecordSetWriter, for example the "2nd JSON" (which has a more
> deeply nested structure).
> This bug is reproduced in versions 1.11.4 and 1.13.0.
> In version 1.12.1 I couldn't reproduce it because of NIFI-7817
--
This message was sent by Atlassian Jira
(v8.20.1#820001)