Hi Oscar,

Couldn’t you have both the Kafka and File sources return an Either&lt;POJO from CSV File, Protobuf from Kafka&gt;, and then (after the HybridSource) use a MapFunction to convert to the unified/correct type?
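A minimal, framework-free sketch of that unification step, in case it helps. The `CsvRecord`, `ProtoEvent`, and `UnifiedEvent` types below are hypothetical stand-ins for your real CSV POJO and protobuf class, and the hand-rolled `Either` just illustrates the shape — in a real job you would use Flink's own `org.apache.flink.types.Either` as the element type of both sources, so the HybridSource sees a single type, and then apply a MapFunction with this logic right after it:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the real CSV POJO and protobuf event.
record CsvRecord(String id, long amount, String currency) {}
record ProtoEvent(String id, long amountMinor, String currencyCode) {}
// The single type the downstream operators work with.
record UnifiedEvent(String id, long amount, String currency) {}

// Illustrative stand-in for org.apache.flink.types.Either.
sealed interface Either<L, R> {
    record Left<L, R>(L value) implements Either<L, R> {}
    record Right<L, R>(R value) implements Either<L, R> {}
}

public class EitherUnification {

    // The body of the MapFunction placed after the HybridSource:
    // collapse Either<CsvRecord, ProtoEvent> into one UnifiedEvent.
    static UnifiedEvent unify(Either<CsvRecord, ProtoEvent> e) {
        if (e instanceof Either.Left<CsvRecord, ProtoEvent> left) {
            CsvRecord c = left.value();
            return new UnifiedEvent(c.id(), c.amount(), c.currency());
        }
        ProtoEvent p = ((Either.Right<CsvRecord, ProtoEvent>) e).value();
        return new UnifiedEvent(p.id(), p.amountMinor(), p.currencyCode());
    }

    public static void main(String[] args) {
        // A mixed stream as the HybridSource would emit it:
        // first the CSV-backed records, then the Kafka-backed ones.
        List<Either<CsvRecord, ProtoEvent>> mixed = List.of(
                new Either.Left<>(new CsvRecord("a", 100, "EUR")),
                new Either.Right<>(new ProtoEvent("b", 250, "USD")));

        List<UnifiedEvent> unified = mixed.stream()
                .map(EitherUnification::unify)
                .collect(Collectors.toList());

        System.out.println(unified);
    }
}
```

The point is that both sources agree on `Either<CsvRecord, ProtoEvent>` as their output type, which satisfies HybridSource's requirement that all underlying sources produce the same type, and the mapping to one unified type happens once, downstream.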
— Ken

> On Jul 4, 2023, at 12:13 PM, Oscar Perez via user <user@flink.apache.org> wrote:
>
> Hei,
> 1) We populate state based on this CSV data and do business logic based on this state and events coming from other unrelated streams.
> 2) We are using a low-level process function in order to process this future hybrid source.
>
> Regardless of the aforementioned points, please note that the main challenge is to combine, in a HybridSource, a CSV file and a Kafka topic that return different datatypes, so I don't know how my answers relate to the original problem, tbh.
>
> Regards,
> Oscar
>
> On Tue, 4 Jul 2023 at 20:53, Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
>
> @Oscar
> 1. How do you plan to use that CSV data? Is it needed for lookup from the "main" stream?
> 2. Which API are you using? DataStream/SQL/Table or low-level ProcessFunction?
>
> Best,
> Alex
>
> On Tue, 4 Jul 2023 at 11:14, Oscar Perez via user <user@flink.apache.org> wrote:
>
> OK, but is it? As I said, both sources have different data types. In the example here:
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
>
> both sources return String, but in our case one source would return a protobuf event while the other would return a POJO. How can we make the two sources share the same datatype so that we can successfully use HybridSource?
>
> Regards,
> Oscar
>
> On Tue, 4 Jul 2023 at 12:04, Alexey Novakov <ale...@ververica.com> wrote:
>
> Hi Oscar,
>
> You could use connected streams and put your file into a special Kafka topic before starting such a job:
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/overview/#connect
>
> But this may require more work, and the event ordering (which is shuffled) in the connected streams is probably not what you are looking for.
>
> I think HybridSource is the right solution.
>
> Best regards,
> Alexey
>
> On Mon, Jul 3, 2023 at 3:44 PM Oscar Perez via user <user@flink.apache.org> wrote:
>
> Hei,
> We want to bootstrap some data from a CSV file before reading from a Kafka topic that has a retention period of 7 days.
>
> We believe the best tool for that would be the HybridSource, but the problem we are facing is that the two datasources are of a different nature. The KafkaSource returns a protobuf event while the CSV is a POJO with just 3 fields.
>
> We could hack the KafkaSource implementation and then, in the value deserializer, do the mapping from protobuf to the CSV POJO, but that seems rather hackish. Is there a more elegant way to unify the datatypes from both sources using HybridSource?
>
> Thanks,
> Oscar

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch