Splitting in Stream Formats for File Source

Chirag Dewan via user Wed, 16 Aug 2023 21:00:21 -0700

Hi,

I am trying to collect files from HDFS in my DataStream job. I need to 
collect two types of files: CSV and Parquet. 
I understand that Flink supports both formats, but in streaming mode Flink 
doesn't support splitting them. Splitting is only supported in the Table 
API.
I wanted to understand the thought process around this: why is splitting not 
supported in the CSV and AvroParquet stream formats? As far as I understand, 
splitting would work fine with HDFS blocks, and multiple blocks could be 
read in parallel. 
Maybe I am missing some fundamental aspect about this. 
I would like to understand more if someone can point me in the right 
direction.

Thanks
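
To make the block-parallel reading idea above concrete, here is a minimal sketch in plain Java. It is not Flink code and does not use any Flink API; the class `SplitLineReader` and the method `readSplit` are hypothetical names chosen for illustration. It assumes newline-delimited records that never contain embedded newlines (quoted CSV fields can violate this), and it follows the common convention that a reader starting mid-file discards everything up to the first newline, while the previous reader finishes the record that crosses its end offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineReader {

    /**
     * Returns the lines owned by the byte range [start, end] of data.
     * A split owns every line whose first byte lies at an offset p with
     * start < p <= end, plus the line at offset 0 when start == 0.
     */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip to just past the first newline at or after start:
            // the line containing (or starting at) start belongs to the previous split.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }
        List<String> lines = new ArrayList<>();
        // Read whole lines; the last one may extend past the end offset.
        while (pos < data.length && pos <= end) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "a,1\nbb,2\nccc,3\n".getBytes(StandardCharsets.UTF_8);
        // Three arbitrary "blocks" that cut through the middle of records.
        int[][] splits = {{0, 6}, {6, 12}, {12, data.length}};
        for (int[] s : splits) {
            System.out.println(s[0] + "-" + s[1] + ": " + readSplit(data, s[0], s[1]));
        }
    }
}
```

Each split can be processed independently, and every record is returned by exactly one split even though the boundaries fall in the middle of records. The sketch says nothing about Parquet, where the unit of independent reading would be a row group located through the file footer rather than a line.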
