Re: Converting parquet MessageType to flink RowType

2022-01-06 Thread Meghajit Mazumdar
Hi Jing, thanks for explaining this; it helps. As you suggested, I tried specifying some of the field names together with their field types for my Parquet files, and it works: I am able to read those specific fields. However, my Parquet schema also has some nested fields like this, which I want to rea

Re: Converting parquet MessageType to flink RowType

2022-01-06 Thread Jing Ge
Hi Meghajit, good catch! Thanks for correcting me. The question is about how to use a column-oriented storage format like Parquet. What I tried to explain was that the original MessageType is used to build a projected MessageType, since only the required columns should be read. Without the input f
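
As a rough illustration of the projection idea in the message above (only the required columns are kept from the file's full schema), here is a minimal plain-Java sketch. It is a toy model, not Flink or Parquet code: the class ProjectionSketch, the method project, the column names, and the use of strings for types are all hypothetical, and failing on an unknown column is a choice made for this sketch only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of column projection; this is NOT the Flink or Parquet API. */
final class ProjectionSketch {

    /**
     * Builds a projected schema: the requested columns, in request order,
     * each paired with the type recorded in the file's full schema.
     */
    static Map<String, String> project(Map<String, String> fileSchema, List<String> requested) {
        Map<String, String> projected = new LinkedHashMap<>();
        for (String name : requested) {
            String type = fileSchema.get(name);
            if (type == null) {
                // Sketch-only assumption: fail fast when a column is missing.
                throw new IllegalArgumentException("No such column in file schema: " + name);
            }
            projected.put(name, type);
        }
        return projected;
    }

    public static void main(String[] args) {
        // Hypothetical full schema of a file (column name -> type name).
        Map<String, String> fileSchema = new LinkedHashMap<>();
        fileSchema.put("id", "INT64");
        fileSchema.put("name", "BINARY");
        fileSchema.put("payload", "BINARY");

        // Only "name" and "id" are required, so "payload" is left out.
        System.out.println(project(fileSchema, List.of("name", "id")));
    }
}
```

The point of the sketch is only that the reader asks for a subset of column names and the types come from the file's own schema; how the actual input format clips the schema should be checked in the linked source.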

Re: Converting parquet MessageType to flink RowType

2022-01-06 Thread Meghajit Mazumdar
Hi Jing, thanks for the reply. I had two questions about your answer: 1. "There was a conversion from Flink GroupType to Parquet MessageType. It might be possible to build the conversion the other way around." -> Both GroupType and MessageType are Parquet data structures, I believe, present in the or

Re: Converting parquet MessageType to flink RowType

2022-01-06 Thread Jing Ge
Hi Meghajit, thanks for asking. If you take a look at the source code at https://github.com/apache/flink/blob/9bbadb9b105b233b7565af120020ebd8dce69a4f/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L174, you should see Parquet MessageType

Converting parquet MessageType to flink RowType

2022-01-05 Thread Meghajit Mazumdar
Hello, we want to read and process Parquet files using a FileSource and the DataStream API. Currently, as referenced from the documentation