Yes, a Parquet file can be read in splits, i.e. in parallel. Which
enumerator is used is determined here [1].

[1]
https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L170-L170
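
On the question of whether the blocks have to be "merged": as far as I
understand it, a reader that is handed a byte range of a Parquet file only
reads the row groups whose midpoint falls inside that range, so every row
group is read by exactly one split and nothing needs to be merged afterwards.
Below is a minimal, self-contained sketch of that idea. It does not use
Flink or Parquet classes; RowGroup, Split, enumerateSplits and rowGroupsFor
are hypothetical names made up for illustration, and the midpoint rule is my
assumption about the reader's behaviour, not something taken from [1].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowGroupSplitSketch {

    /** Hypothetical stand-in for a Parquet row group: byte offset and size in the file. */
    static final class RowGroup {
        final long start;
        final long length;

        RowGroup(long start, long length) {
            this.start = start;
            this.length = length;
        }

        long midpoint() {
            return start + length / 2;
        }

        @Override
        public String toString() {
            return "RowGroup[" + start + ", +" + length + "]";
        }
    }

    /** Hypothetical stand-in for a file split: half-open byte range [start, start + length). */
    static final class Split {
        final long start;
        final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        boolean contains(long offset) {
            return offset >= start && offset < start + length;
        }

        @Override
        public String toString() {
            return "Split[" + start + ", +" + length + "]";
        }
    }

    /** Cuts a file of fileSize bytes into consecutive splits of at most blockSize bytes. */
    static List<Split> enumerateSplits(long fileSize, long blockSize) {
        List<Split> splits = new ArrayList<>();
        for (long pos = 0; pos < fileSize; pos += blockSize) {
            splits.add(new Split(pos, Math.min(blockSize, fileSize - pos)));
        }
        return splits;
    }

    /** Returns the row groups whose midpoint lies inside the split (assumed ownership rule). */
    static List<RowGroup> rowGroupsFor(Split split, List<RowGroup> rowGroups) {
        List<RowGroup> owned = new ArrayList<>();
        for (RowGroup rowGroup : rowGroups) {
            if (split.contains(rowGroup.midpoint())) {
                owned.add(rowGroup);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Made-up layout: three row groups in a 300-byte file, 128-byte blocks.
        List<RowGroup> rowGroups = Arrays.asList(
                new RowGroup(4, 100), new RowGroup(104, 100), new RowGroup(204, 50));
        for (Split split : enumerateSplits(300, 128)) {
            System.out.println(split + " -> " + rowGroupsFor(split, rowGroups));
        }
    }
}
```

Note that a split can end up with no row groups at all when no midpoint
falls into its range, so the work per split is not necessarily even.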

On Fri, Dec 10, 2021 at 11:44 AM Krzysztof Chmielewski <
krzysiek.chmielew...@gmail.com> wrote:

> Hi Roman,
> Thank you.
>
> I'm familiar with FLIP-27 and I was analyzing the new File Source.
>
> From there I saw that there are two FileEnumerators: one that allows
> file splitting and one that does not, BlockSplittingRecursiveEnumerator
> and NonSplittingRecursiveEnumerator.
> I was wondering if BlockSplittingRecursiveEnumerator can be used for
> Parquet files.
>
> Actually, does the Parquet format support reading a file in blocks by
> different threads? Do those blocks have to be "merged" later, or can I just
> read them row by row?
>
> Regards,
> Krzysztof Chmielewski
>
> On Fri, Dec 10, 2021 at 09:27 Roman Khachatryan <ro...@apache.org> wrote:
>
>> Hi,
>>
>> Yes, file source does support DoP > 1.
>> And in general, a single file can be read in parallel after FLIP-27.
>> However, parallel reading of a single Parquet file is currently not
>> supported AFAIK.
>>
>> Maybe Arvid or Fabian could shed more light here.
>>
>> Regards,
>> Roman
>>
>> On Thu, Dec 9, 2021 at 12:03 PM Krzysztof Chmielewski
>> <krzysiek.chmielew...@gmail.com> wrote:
>> >
>> > Hi,
>> > can I have a File DataStream Source that works with the Parquet format
>> > and has a parallelism higher than one?
>> >
>> > Is it possible to read a Parquet file in chunks by multiple threads?
>> >
>> > Regards,
>> > Krzysztof Chmielewski
>>
>
