Hello,

The Flink documentation shows how to create a FileSource for reading Parquet files:
<https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/formats/parquet/#:~:text=contain%20event%20timestamps.-,final%20LogicalType%5B%5D%20fieldTypes%20%3D%0A%20%20new%20LogicalType%5B%5D%20%7B%0A%20%20new%20DoubleType()%2C%20new%20IntType()%2C%20new,DataStream%3CRowData%3E%20stream%20%3D%0A%20%20env.fromSource(source%2C%20WatermarkStrategy.noWatermarks()%2C%20%22file%2Dsource%22)%3B,-Continuous%20read%20example>
For primitive Parquet types like BINARY and BOOLEAN, I am able to create a
RowType and read the fields.
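For reference, here is a sketch of the primitive-type setup that works for me, following the example in the linked docs. The field names and the input path are hypothetical placeholders; the batch size and the two boolean flags (UTC timestamps, case sensitivity) mirror the docs example.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.DoubleType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.hadoop.conf.Configuration;

public class PrimitiveParquetRead {
    public static void main(String[] args) throws Exception {
        // Flat (primitive-only) row schema: this is the case that works.
        // "price" and "id" are placeholder field names.
        final LogicalType[] fieldTypes =
                new LogicalType[] {new DoubleType(), new IntType()};
        final RowType rowType =
                RowType.of(fieldTypes, new String[] {"price", "id"});

        final ParquetColumnarRowInputFormat<FileSourceSplit> format =
                new ParquetColumnarRowInputFormat<>(
                        new Configuration(), // Hadoop configuration
                        rowType,
                        500,    // batch size
                        false,  // do not interpret timestamps as UTC
                        true);  // case-sensitive column matching

        // "/path/to/data.parquet" is a placeholder input path.
        final FileSource<RowData> source =
                FileSource.forBulkFileFormat(format, new Path("/path/to/data.parquet"))
                        .build();

        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        final DataStream<RowData> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "file-source");
    }
}
```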

However, my Parquet schema also contains some nested fields, like this one,
which I want to read:

  optional group location = 11 {
    optional double latitude = 1;
    optional double longitude = 2;
  }

How can I create a RowType for this? I tried the code below, but I got an
exception: `Caused by: java.lang.UnsupportedOperationException:
Complex types not supported`

            RowType nestedRowType =
                    RowType.of(
                            new LogicalType[] {new DoubleType(), new DoubleType()},
                            new String[] {"latitude", "longitude"});
            final LogicalType[] fieldTypes = new LogicalType[] {nestedRowType};
            final ParquetColumnarRowInputFormat<FileSourceSplit> format =
                    new ParquetColumnarRowInputFormat<>(
                            new Configuration(),
                            RowType.of(fieldTypes, new String[] {"location"}),
                            500,
                            false,
                            true);
