Was it a conscious decision that HybridSource only accepts Sources, and does not allow mapping functions to be applied to them before combining them?
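For context, the constraint being asked about can be sketched with a toy builder. This is not Flink's actual implementation (Flink's real entry point is `HybridSource.builder(firstSource)` returning a `HybridSource.Builder<T>`); it is a minimal stand-in showing why the first source fixes the output type for every source added afterwards, which is why a per-source mapping step cannot be slotted in before combining:

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a Flink Source<T>: just something that yields records of type T.
interface Source<T> {
    List<T> read();
}

// Toy stand-in for HybridSource.Builder<T>: the first source pins down T,
// so addSource(...) only compiles for further sources of the SAME type T.
final class HybridBuilder<T> {
    private final List<Source<T>> sources = new ArrayList<>();

    static <T> HybridBuilder<T> from(Source<T> first) {
        HybridBuilder<T> b = new HybridBuilder<>();
        b.sources.add(first);
        return b;
    }

    // A Source<U> with U != T is a compile error here -- this is the
    // "only accepts Sources of one type" constraint from the thread.
    HybridBuilder<T> addSource(Source<T> next) {
        sources.add(next);
        return this;
    }

    // Reads each source to exhaustion, in order -- mimicking how a
    // HybridSource switches from the bounded source to the next one.
    List<T> readAll() {
        List<T> out = new ArrayList<>();
        for (Source<T> s : sources) {
            out.addAll(s.read());
        }
        return out;
    }
}
```

Because `T` is fixed once, any unification of differing payload types has to happen either inside each source's deserializer or after the combined stream, not between the sources and the builder.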
On Tue, Jul 4, 2023, 23:53 Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi Oscar,
>
> Couldn't you have both the Kafka and File sources return an Either<POJO from CSV File, Protobuf from Kafka>, and then (after the HybridSource) use a MapFunction to convert to the unified/correct type?
>
> — Ken
>
> On Jul 4, 2023, at 12:13 PM, Oscar Perez via user <user@flink.apache.org> wrote:
>
> Hei,
> 1) We populate state based on this CSV data and do business logic based on this state and on events coming from other, unrelated streams.
> 2) We are using the low-level process function in order to process this future hybrid source.
>
> Regardless of the aforementioned points, please note that the main challenge is to combine, in a HybridSource, a CSV file and a Kafka topic that return different data types, so I don't know how my answers relate to the original problem, to be honest.
>
> Regards,
> Oscar
>
> On Tue, 4 Jul 2023 at 20:53, Alexander Fedulov <alexander.fedu...@gmail.com> wrote:
>
>> @Oscar
>> 1. How do you plan to use that CSV data? Is it needed for lookup from the "main" stream?
>> 2. Which API are you using? DataStream/SQL/Table or low-level ProcessFunction?
>>
>> Best,
>> Alex
>>
>> On Tue, 4 Jul 2023 at 11:14, Oscar Perez via user <user@flink.apache.org> wrote:
>>
>>> ok, but is it? As I said, both sources have different data types. In the example here:
>>>
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
>>>
>>> both sources are used as returning String, but in our case one source would return a protobuf event while the other would return a POJO. How can we make the two sources share the same data type so that we can successfully use HybridSource?
>>> Regards,
>>> Oscar
>>>
>>> On Tue, 4 Jul 2023 at 12:04, Alexey Novakov <ale...@ververica.com> wrote:
>>>
>>>> Hi Oscar,
>>>>
>>>> You could use connected streams and put your file into a special Kafka topic before starting such a job:
>>>> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/datastream/operators/overview/#connect
>>>> But this may require more work, and the event ordering in the connected streams (which is shuffled) is probably not what you are looking for.
>>>>
>>>> I think HybridSource is the right solution.
>>>>
>>>> Best regards,
>>>> Alexey
>>>>
>>>> On Mon, Jul 3, 2023 at 3:44 PM Oscar Perez via user <user@flink.apache.org> wrote:
>>>>
>>>>> Hei,
>>>>> We want to bootstrap some data from a CSV file before reading from a Kafka topic that has a retention period of 7 days.
>>>>>
>>>>> We believe the best tool for that would be the HybridSource, but the problem we are facing is that the two data sources are of a different nature: the KafkaSource returns a protobuf event, while the CSV yields a POJO with just 3 fields.
>>>>>
>>>>> We could hack the KafkaSource implementation and then, in the value deserializer, do the mapping from protobuf to the CSV POJO, but that seems rather hackish. Is there a more elegant way to unify the data types from both sources when using HybridSource?
>>>>>
>>>>> Thanks,
>>>>> Oscar

--------------------------
Ken Krugler
http://www.scaleunlimited.com
Custom big data solutions
Flink, Pinot, Solr, Elasticsearch
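Ken's suggestion can be sketched in code. Flink does ship an `org.apache.flink.types.Either`, but to keep this snippet runnable without Flink on the classpath it uses a minimal hand-rolled Either instead; `CsvRow`, `AccountProto`, `AccountEvent`, and `UnifyFn` are hypothetical stand-ins for the thread's CSV POJO, protobuf payload, and unified type. The idea: each source's deserializer wraps its payload in one side of the Either, so the HybridSource is built over the single type `Either<CsvRow, AccountProto>`, and one map step after it produces the unified type:

```java
import java.util.function.Function;

// Minimal Either, standing in for org.apache.flink.types.Either.
abstract class Either<L, R> {
    static <L, R> Either<L, R> left(L v)  { return new Left<>(v); }
    static <L, R> Either<L, R> right(R v) { return new Right<>(v); }
    abstract <T> T fold(Function<L, T> onLeft, Function<R, T> onRight);

    private static final class Left<L, R> extends Either<L, R> {
        final L v; Left(L v) { this.v = v; }
        <T> T fold(Function<L, T> l, Function<R, T> r) { return l.apply(v); }
    }
    private static final class Right<L, R> extends Either<L, R> {
        final R v; Right(R v) { this.v = v; }
        <T> T fold(Function<L, T> l, Function<R, T> r) { return r.apply(v); }
    }
}

// Hypothetical stand-ins for the thread's two payload types.
record CsvRow(String userId, long balance) {}      // POJO from the CSV file
record AccountProto(String uid, long cents) {}     // protobuf event from Kafka

// Unified type the downstream process function would consume.
record AccountEvent(String userId, long balance) {}

class UnifyFn {
    // The MapFunction Ken describes: applied once, after the HybridSource,
    // collapsing both branches of the Either into the unified type.
    static AccountEvent map(Either<CsvRow, AccountProto> e) {
        return e.fold(
            csv   -> new AccountEvent(csv.userId(), csv.balance()),
            proto -> new AccountEvent(proto.uid(), proto.cents()));
    }
}
```

In the actual job, the file source's deserializer would emit `Either.left(csvRow)` and the Kafka deserializer `Either.right(proto)`, so both satisfy the single-type constraint the HybridSource builder imposes, without hacking either source's implementation.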