Hi,

Have you tried constructing a HybridSource from a FileSource created with FileSource.forBulkFileFormat [1] and the "gs://bucket" scheme [2] directly? With the GCS filesystem plugin configured, the FileSource can read the Parquet files straight from the bucket, so there is no need to download them locally first.
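A rough sketch of what that could look like (assuming Flink 1.14+, the flink-connector-files, flink-parquet, and flink-connector-kafka dependencies, and the GCS filesystem plugin installed; the bucket path, schema, topic, broker address, and deserializer below are all placeholders, and constructor arguments may vary by Flink version):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;
import org.apache.hadoop.conf.Configuration;

public class GcsParquetHybridSourceSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder schema: adjust field types/names to your Parquet files.
        RowType rowType = RowType.of(
                new LogicalType[]{new VarCharType(), new IntType()},
                new String[]{"name", "count"});

        // Bulk format for vectorized Parquet reading (from flink-parquet).
        ParquetColumnarRowInputFormat<FileSourceSplit> parquetFormat =
                new ParquetColumnarRowInputFormat<>(
                        new Configuration(), rowType,
                        /* batchSize */ 500,
                        /* isUtcTimestamp */ false,
                        /* isCaseSensitive */ true);

        // Bounded FileSource reading Parquet directly from the GCS bucket;
        // the gs:// scheme is resolved by the GCS filesystem plugin.
        FileSource<RowData> fileSource = FileSource
                .forBulkFileFormat(parquetFormat,
                        new Path("gs://my-bucket/parquet-events/"))
                .build();

        // Unbounded KafkaSource to switch to once the files are exhausted.
        // Note: HybridSource requires all sources to produce the same type,
        // so the deserializer here must also emit RowData (placeholder).
        KafkaSource<RowData> kafkaSource = KafkaSource.<RowData>builder()
                .setBootstrapServers("broker:9092")
                .setTopics("events")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(myRowDataDeserializer) // placeholder
                .build();

        // FileSource first, then KafkaSource once the bounded source finishes.
        HybridSource<RowData> hybridSource = HybridSource.builder(fileSource)
                .addSource(kafkaSource)
                .build();

        env.fromSource(hybridSource, WatermarkStrategy.noWatermarks(),
                "gcs-parquet-then-kafka");
        env.execute();
    }
}
```

The one constraint to watch is that HybridSource requires every constituent source to emit the same record type, so the Kafka deserializer has to be written to produce the same type as the Parquet format.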
[1] https://nightlies.apache.org/flink/flink-docs-master/api/java/org/apache/flink/connector/file/src/FileSource.html#forBulkFileFormat-org.apache.flink.connector.file.src.reader.BulkFormat-org.apache.flink.core.fs.Path...-
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/gcs/

Regards,
Roman

On Thu, Dec 9, 2021 at 1:04 PM Meghajit Mazumdar
<meghajit.mazum...@gojek.com> wrote:
>
> Hello,
>
> We have a requirement as follows:
>
> We want to stream events from 2 sources: Parquet files stored in a GCS
> bucket, and a Kafka topic.
> With the release of Hybrid Source in Flink 1.14, we were able to
> construct a Hybrid Source which produces events from two sources: a
> FileSource which reads data from a locally saved Parquet file, and a
> KafkaSource consuming events from a remote Kafka broker.
>
> I was wondering whether, instead of using a local Parquet file, it is
> possible to directly stream the file from a GCS bucket and construct a
> FileSource out of it at runtime? The Parquet files are quite big and
> it's a bit expensive to download them.
>
> Does Flink have such a functionality? Or, has anyone come across such a
> use case previously? Would greatly appreciate some help on this.
>
> Looking forward to hearing from you.
>
> Thanks,
> Megh