Hi Krzysztof,
yes, you are correct if you use the new FileSource:
* Please note that file blocks are only exposed by some file systems, such as HDFS. File systems
* that do not expose block information will not create multiple file splits per file, but keep the
* files as one source split.
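To illustrate what that note implies, here is a minimal, stdlib-only Java sketch. It is not Flink's enumerator code: the class SplitSketch, the Split type, and the 128 MB block size are all made up for illustration. It only contrasts cutting a file at block boundaries with keeping the whole file as a single split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitSketch {

    /** A byte range of one file. Hypothetical type, not a Flink class. */
    public static final class Split {
        public final long offset;
        public final long length;

        public Split(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** File system exposes blocks (assumed fixed size): one split per block. */
    public static List<Split> blockSplits(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        // Walk the file block by block; the last split may be shorter.
        for (long offset = 0; offset < fileLength; offset += blockSize) {
            splits.add(new Split(offset, Math.min(blockSize, fileLength - offset)));
        }
        return splits;
    }

    /** File system exposes no block information: the whole file is one split. */
    public static List<Split> singleSplit(long fileLength) {
        return Collections.singletonList(new Split(0, fileLength));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with assumed 128 MB blocks: 128 + 128 + 44 MB.
        System.out.println(blockSplits(300 * mb, 128 * mb).size()); // prints 3
        // The same file without block information stays in one piece.
        System.out.println(singleSplit(300 * mb).size()); // prints 1
    }
}
```

In the first case up to three readers could work on the file at the same time; in the second case only one reader can, regardless of the configured parallelism.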
For o
Hi Arvid,
thank you for your response.
I did a bit more digging and noticed one thing; please correct me if
I'm wrong.
Whether a Parquet file will be read in parallel in fact depends on
the underlying file system.
If the file system supports file blocks then we will have spli
Yes, Parquet files can be read in splits (i.e., in parallel). Which
enumerator is used is determined here [1].
[1]
https://github.com/apache/flink/blob/master/flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/ParquetVectorizedInputFormat.java#L170-L170
On Fri, Dec 10, 2021 at
Hi Roman,
Thank you.
I'm familiar with FLIP-27 and I was analyzing the new File Source.
From there I saw that there are two FileEnumerators: one that allows
file splitting and another that does not, namely
BlockSplittingRecursiveEnumerator and NonSplittingRecursiveEnumerator.
I was wondering if BlockS
Hi,
Yes, the file source does support a degree of parallelism (DoP) > 1.
In general, a single file can be read in parallel after FLIP-27.
However, parallel reading of a single Parquet file is currently not
supported AFAIK.
Maybe Arvid or Fabian could shed more light here.
Regards,
Roman
On Thu, Dec 9, 2021 at 12:03 PM K
Hi,
can I have a file DataStream source that works with the Parquet format
and has a parallelism higher than one?
Is it possible to read a single Parquet file in chunks with multiple threads?
Regards,
Krzysztof Chmielewski