Yes, Parquet files can be read in splits (i.e., in parallel). Which enumerator is used is determined here [1].
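Below is a minimal, self-contained sketch of that selection logic. It assumes the choice comes down to a single splittability flag on the format. To my knowledge, Flink's `BulkFormat.isSplittable()` plays that role. The `Format` interface, the `Split` class and the fixed `blockSize` are hypothetical simplifications. The real BlockSplittingRecursiveEnumerator splits along the block boundaries reported by the file system, and this code does not call any Flink API.

```java
import java.util.ArrayList;
import java.util.List;

public class EnumeratorSelectionSketch {

    // Hypothetical stand-in for a file format; only its splittability flag matters here.
    interface Format {
        boolean isSplittable();
    }

    // A byte range [offset, offset + length) of one file, read by a single reader task.
    static final class Split {
        final long offset;
        final long length;

        Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + offset + ", " + (offset + length) + ")";
        }
    }

    // Non-splitting strategy: the whole file is a single split.
    static List<Split> nonSplitting(long fileLength) {
        List<Split> splits = new ArrayList<>();
        splits.add(new Split(0, fileLength));
        return splits;
    }

    // Block-splitting strategy (simplified): one split per fixed-size block.
    static List<Split> blockSplitting(long fileLength, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            splits.add(new Split(offset, Math.min(blockSize, fileLength - offset)));
        }
        return splits;
    }

    // The selection step: splittable formats get the block-splitting strategy.
    static List<Split> enumerate(Format format, long fileLength, long blockSize) {
        return format.isSplittable()
                ? blockSplitting(fileLength, blockSize)
                : nonSplitting(fileLength);
    }

    public static void main(String[] args) {
        Format splittable = () -> true;
        Format nonSplittable = () -> false;
        System.out.println(enumerate(splittable, 300, 128));
        System.out.println(enumerate(nonSplittable, 300, 128));
    }
}
```

With a 300-byte file and 128-byte blocks, `main` prints `[[0, 128), [128, 256), [256, 300)]` for the splittable format. For the non-splittable one it prints `[[0, 300)]`, so a single reader gets the whole file.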
[1] https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L170-L170

On Fri, Dec 10, 2021 at 11:44 AM Krzysztof Chmielewski <krzysiek.chmielew...@gmail.com> wrote:

> Hi Roman,
> Thank you.
>
> I'm familiar with FLIP-27 and I was analyzing the new File Source.
>
> From there I saw that there are two FileEnumerators: BlockSplittingRecursiveEnumerator,
> which splits files, and NonSplittingRecursiveEnumerator, which does not.
> I was wondering if BlockSplittingRecursiveEnumerator can be used for
> Parquet files.
>
> Does the Parquet format actually support reading a file in blocks by different
> threads? Do those blocks have to be "merged" later, or can I just read them
> row by row?
>
> Regards,
> Krzysztof Chmielewski
>
> On Fri, Dec 10, 2021 at 09:27 Roman Khachatryan <ro...@apache.org> wrote:
>
>> Hi,
>>
>> Yes, the file source does support DoP > 1.
>> And in general, a single file can be read in parallel after FLIP-27.
>> However, parallel reading of a single Parquet file is currently not
>> supported AFAIK.
>>
>> Maybe Arvid or Fabian could shed more light here.
>>
>> Regards,
>> Roman
>>
>> On Thu, Dec 9, 2021 at 12:03 PM Krzysztof Chmielewski
>> <krzysiek.chmielew...@gmail.com> wrote:
>> >
>> > Hi,
>> > can I have a file DataStream source that works with the Parquet format
>> > and has a parallelism higher than one?
>> >
>> > Is it possible to read a Parquet file in chunks by multiple threads?
>> >
>> > Regards,
>> > Krzysztof Chmielewski
>> >
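On the "merge" question in the thread: as far as I know, Parquet readers avoid any merge step by giving every row group to exactly one split. To my knowledge, parquet-mr's range filter (`ParquetMetadataConverter.range(start, end)`) keeps only the row groups whose midpoint falls inside the split's byte range. Each reader then decodes its own row groups independently, row by row. The sketch below models just that midpoint rule. The `RowGroup` class and the byte offsets are hypothetical, and it calls neither Parquet nor Flink. Whether Flink's reader applies exactly this rule at the linked commit is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class RowGroupAssignmentSketch {

    // Byte range of one row group inside a Parquet file. In a real reader these
    // offsets come from the file footer; here they are hypothetical inputs.
    static final class RowGroup {
        final long start;
        final long length;

        RowGroup(long start, long length) {
            this.start = start;
            this.length = length;
        }

        long midpoint() {
            return start + length / 2;
        }
    }

    // Indexes of the row groups whose midpoint lies in [splitStart, splitStart + splitLength).
    // Every row group has exactly one midpoint, so it belongs to exactly one split.
    static List<Integer> rowGroupsForSplit(List<RowGroup> rowGroups, long splitStart, long splitLength) {
        long splitEnd = splitStart + splitLength;
        List<Integer> owned = new ArrayList<>();
        for (int i = 0; i < rowGroups.size(); i++) {
            long mid = rowGroups.get(i).midpoint();
            if (mid >= splitStart && mid < splitEnd) {
                owned.add(i);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Illustrative layout: four row groups in a 420-byte file, read as 128-byte splits.
        List<RowGroup> groups = List.of(
                new RowGroup(4, 100), new RowGroup(104, 100),
                new RowGroup(204, 100), new RowGroup(304, 92));
        long fileLength = 420;
        long splitSize = 128;
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            System.out.println("split at " + offset + " reads row groups "
                    + rowGroupsForSplit(groups, offset, length));
        }
    }
}
```

Here the split at 0 reads row group 0, the split at 128 reads row groups 1 and 2, and the split at 256 reads row group 3. The last split, [384, 420), gets nothing and simply emits no rows. No row is read twice and none is skipped, so the per-split outputs need no merging.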