Re: Hybrid Source with Parquet Files from GCS + KafkaSource

2021-12-15 Thread Fabian Paul
Hi Megh, Flink offers the ParquetVectorizedInputFormat, which is already heavily optimized. Unfortunately, you need to implement some of its methods depending on your type. In general, the BulkFormat gives you more control and allows more optimizations, but it is harder to implement. Best, Fab

Re: Hybrid Source with Parquet Files from GCS + KafkaSource

2021-12-14 Thread Meghajit Mazumdar
Hi, Thanks. I was able to get this working. I had to use forRecordFileFormat, though. Is there a performance difference between FileRecordFormat and BulkFormat?

Re: Hybrid Source with Parquet Files from GCS + KafkaSource

2021-12-10 Thread Roman Khachatryan
Hi, Have you tried constructing a Hybrid source from a File source created with FileSource.forBulkFileFormat [1] and "gs://bucket" scheme [2] directly? [1] https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html#forBulkFileFormat-org.apach
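For anyone following along, the suggestion above could be sketched roughly as below. This is only a minimal sketch, not a tested setup: the class name ParquetThenKafka and the method create are made up for illustration, the Parquet BulkFormat and the KafkaSource are assumed to be built elsewhere and to produce the same record type T, and it assumes the Flink file, Kafka and base connector dependencies are on the classpath and that a GCS filesystem implementation is available to Flink so that "gs://" paths resolve.

```java
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.file.src.reader.BulkFormat;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.core.fs.Path;

/** Hypothetical helper: reads bounded Parquet files first, then switches to Kafka. */
public final class ParquetThenKafka {

    private ParquetThenKafka() {}

    /**
     * @param parquetFormat a BulkFormat that decodes the Parquet files into T (assumed to exist)
     * @param gcsPath       the directory or file to read, e.g. a "gs://..." path
     * @param kafkaSource   a KafkaSource producing the same type T (assumed to exist)
     */
    public static <T> HybridSource<T> create(
            BulkFormat<T, FileSourceSplit> parquetFormat,
            Path gcsPath,
            KafkaSource<T> kafkaSource) {

        // Bounded file source over the Parquet files, using the bulk (batch-oriented) reader.
        FileSource<T> fileSource =
                FileSource.forBulkFileFormat(parquetFormat, gcsPath).build();

        // Once the file source is exhausted, the hybrid source switches to Kafka.
        return HybridSource.builder(fileSource)
                .addSource(kafkaSource)
                .build();
    }
}
```

The result can then be passed to env.fromSource(...) together with a WatermarkStrategy and a source name, the same way as any other source. Both sources have to emit the same type, so the Kafka deserializer needs to match whatever the Parquet format produces.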