Hi Megh,

Flink offers the ParquetVectorizedInputFormat, which is already heavily optimized. Unfortunately, you need to implement some of its methods yourself, depending on your type. In general, the BulkFormat gives you more control and allows more optimizations, but it is harder to implement.
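
To make the trade-off a bit more concrete, here is a minimal plain-Java sketch of one general reason batch-oriented readers can be cheaper than record-at-a-time readers: the per-call overhead is paid once per batch instead of once per record. It does not use Flink at all. `RecordReader`, `BatchReader`, and the helper methods are hypothetical names made up for this illustration and are not Flink's FileRecordFormat or BulkFormat interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ReaderStyles {

    /** Hypothetical per-record contract (not a Flink interface): one call, one record. */
    public interface RecordReader<T> {
        T read(); // null signals end of input
    }

    /** Hypothetical batch contract (not a Flink interface): one call, many records. */
    public interface BatchReader<T> {
        List<T> readBatch(); // null signals end of input
    }

    /** Wraps an in-memory list (of non-null elements) as a per-record reader. */
    public static <T> RecordReader<T> recordReader(List<T> data) {
        Iterator<T> it = data.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Wraps an in-memory list as a reader that hands out up to batchSize records per call. */
    public static <T> BatchReader<T> batchReader(List<T> data, int batchSize) {
        int[] pos = {0}; // mutable cursor captured by the lambda
        return () -> {
            if (pos[0] >= data.size()) {
                return null;
            }
            int end = Math.min(pos[0] + batchSize, data.size());
            List<T> batch = data.subList(pos[0], end);
            pos[0] = end;
            return batch;
        };
    }

    /** Drains the reader into out and returns how many reads produced data. */
    public static <T> int drainRecords(RecordReader<T> reader, List<T> out) {
        int calls = 0;
        T rec;
        while ((rec = reader.read()) != null) {
            out.add(rec);
            calls++;
        }
        return calls;
    }

    /** Drains the reader into out and returns how many reads produced data. */
    public static <T> int drainBatches(BatchReader<T> reader, List<T> out) {
        int calls = 0;
        List<T> batch;
        while ((batch = reader.readBatch()) != null) {
            out.addAll(batch);
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Integer> viaRecords = new ArrayList<>();
        List<Integer> viaBatches = new ArrayList<>();
        int recordCalls = drainRecords(recordReader(data), viaRecords);
        int batchCalls = drainBatches(batchReader(data, 4), viaBatches);
        System.out.println("per-record reads: " + recordCalls);
        System.out.println("batch reads: " + batchCalls);
        System.out.println("same records: " + viaRecords.equals(viaBatches));
    }
}
```

Both readers deliver the same records; the batch variant just needs fewer calls, at the cost of a slightly more involved implementation (cursor handling, batch boundaries). Whether this matters for your job depends on your data and format, so it is worth measuring.
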

Best,
Fabian

On Wed, Dec 15, 2021 at 6:21 AM Meghajit Mazumdar <meghajit.mazum...@gojek.com> wrote:
>
> Hi,
>
> Thanks. I was able to get this working. Had to use recordFileFormat though.
>
> Is there a performance difference between FileRecordFormat and BulkFormat ?
>
> Thanks,
> Megh
>
> On Fri, Dec 10, 2021 at 2:48 PM Roman Khachatryan <ro...@apache.org> wrote:
>>
>> Hi,
>>
>> Have you tried constructing a Hybrid source from a File source created
>> with FileSource.forBulkFileFormat [1] and "gs://bucket" scheme [2]
>> directly?
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html#forBulkFileFormat-org.apache.flink.connector.file.src.reader.BulkFormat-org.apache.flink.core.fs.Path...-
>> [2]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/gcs/
>>
>> Regards,
>> Roman
>>
>> On Thu, Dec 9, 2021 at 1:04 PM Meghajit Mazumdar
>> <meghajit.mazum...@gojek.com> wrote:
>> >
>> > Hello,
>> >
>> > We have a requirement as follows:
>> >
>> > We want to stream events from 2 sources: Parquet files stored in a GCS
>> > Bucket, and a Kafka topic.
>> > With the release of Hybrid Source in Flink 1.14, we were able to construct
>> > a Hybrid Source which produces events from two sources: a FileSource which
>> > reads data from a locally saved Parquet File, and a KafkaSource consuming
>> > events from a remote Kafka broker.
>> >
>> > I was wondering if instead of using a local Parquet file, whether it is
>> > possible to directly stream the file from a GCS bucket and construct a
>> > File Source out of it at runtime ? The Parquet Files are quite big and
>> > it's a bit expensive to download.
>> >
>> > Does Flink have such a functionality ? Or, has anyone come across such a
>> > use case previously ? Would greatly appreciate some help on this.
>> >
>> > Looking forward to hearing from you.
>> >
>> > Thanks,
>> > Megh