[ https://issues.apache.org/jira/browse/FLINK-35641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee closed FLINK-35641.
--------------------------------
    Fix Version/s: 2.0.0
       Resolution: Fixed

fixed in: a54311e89406c88e93b8e93d9ab484dc841bce0a

[~asorokoumov] I just merged this in master, feel free to re-open this Jira if you want to cherry-pick to 1.x.

> ParquetSchemaConverter should correctly handle field optionality
> ----------------------------------------------------------------
>
>                 Key: FLINK-35641
>                 URL: https://issues.apache.org/jira/browse/FLINK-35641
>             Project: Flink
>          Issue Type: Bug
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
>            Reporter: Alex Sorokoumov
>            Assignee: Alex Sorokoumov
>            Priority: Major
>              Labels: patch-available, pull-request-available
>             Fix For: 2.0.0
>
>
> At the moment,
> [ParquetSchemaConverter|https://github.com/apache/flink/blob/99d6fd3c68f46daf0397a35566414e1d19774c3d/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/utils/ParquetSchemaConverter.java#L64]
> marks all fields as optional. This is not correct in general and especially
> when it comes to handling maps.
> For example, [parquet-tools|https://pypi.org/project/parquet-tools/] breaks on the Parquet
> file produced by
> [ParquetRowDataWriterTest#complexTypeTest|https://github.com/apache/flink/blob/99d6fd3c68f46daf0397a35566414e1d19774c3d/flink-formats/flink-parquet/src/test/java/org/apache/flink/formats/parquet/row/ParquetRowDataWriterTest.java#L140-L151]:
> {noformat}
> parquet-tools inspect /var/folders/sc/k3hr87fj4x169rdq9n107whw0000gp/T/junit14646865447948471989/3b328592-7315-48c6-8fa9-38da4048fb4e
> Traceback (most recent call last):
>   File "/Users/asorokoumov/.pyenv/versions/3.12.3/bin/parquet-tools", line 8, in <module>
>     sys.exit(main())
>              ^^^^^^
>   File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/cli.py", line 26, in main
>     args.handler(args)
>   File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 55, in _cli
>     _execute_simple(
>   File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/parquet_tools/commands/inspect.py", line 63, in _execute_simple
>     pq_file: pq.ParquetFile = pq.ParquetFile(filename)
>                               ^^^^^^^^^^^^^^^^^^^^^^^^
>   File "/Users/asorokoumov/.pyenv/versions/3.12.3/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 317, in __init__
>     self.reader.open(
>   File "pyarrow/_parquet.pyx", line 1492, in pyarrow._parquet.ParquetReader.open
>   File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
> pyarrow.lib.ArrowInvalid: Map keys must be annotated as required.
> {noformat}
> [The correct thing to do|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#maps]
> is to mark nullable fields as optional, otherwise required.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
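
The rule stated in the issue description (nullable fields become optional, all others required, and map keys always required) can be illustrated with a small standalone sketch. This is not the code from the commit above and does not use Flink or Parquet classes; the helper names `repetition`, `primitive_field`, and `map_field` are hypothetical, and the schema text follows the map layout described in the Maps section of parquet-format's LogicalTypes.md.

```python
def repetition(nullable: bool) -> str:
    """Rule from the issue: nullable fields are optional, all others required."""
    return "optional" if nullable else "required"


def primitive_field(name: str, parquet_type: str, nullable: bool) -> str:
    """Render one primitive field in Parquet message-type syntax (hypothetical helper)."""
    return f"{repetition(nullable)} {parquet_type} {name};"


def map_field(name: str, nullable: bool, key_type: str,
              value_type: str, value_nullable: bool) -> str:
    """Render a MAP group (hypothetical helper).

    The key is always emitted as required, regardless of any nullability
    flag, which is the condition the pyarrow error message complains about.
    """
    lines = [
        f"{repetition(nullable)} group {name} (MAP) {{",
        "  repeated group key_value {",
        f"    required {key_type} key;",
        f"    {repetition(value_nullable)} {value_type} value;",
        "  }",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # A non-nullable column and a nullable map with nullable values.
    print(primitive_field("id", "int64", nullable=False))
    print(map_field("counts", True, "int32", "int64", True))
```

With a converter that marks every field optional, the `key` line above would read `optional int32 key;`, which is the shape of schema that readers such as pyarrow reject.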