Hey,

The use case is as follows: Kafka is the source for our frontend/impression events (any user activity), and we enrich that stream with values from the backend, which is the source of certain slowly changing properties (values that change or are updated over time). The backend dumps these values as files at regular intervals, e.g. every hour or every quarter of an hour.
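Roughly, the wiring looks like this. It is only a sketch: ImpressionEvent, PropertyUpdate, EnrichedEvent, the key accessors, and the two input streams (kafkaEvents, fileUpdates) are hypothetical stand-ins for our actual types, and it uses Flink's broadcast state pattern to push the file values to every task that reads Kafka:

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    // Descriptor for the broadcast state holding the slowly changing properties.
    MapStateDescriptor<String, PropertyUpdate> propsDescriptor =
        new MapStateDescriptor<>("slowly-changing-props",
            Types.STRING, Types.POJO(PropertyUpdate.class));

    // File updates are broadcast to every parallel task that reads Kafka.
    BroadcastStream<PropertyUpdate> broadcastUpdates =
        fileUpdates.broadcast(propsDescriptor);

    DataStream<EnrichedEvent> enriched = kafkaEvents
        .connect(broadcastUpdates)
        .process(new BroadcastProcessFunction<ImpressionEvent, PropertyUpdate, EnrichedEvent>() {
            @Override
            public void processElement(ImpressionEvent event, ReadOnlyContext ctx,
                    Collector<EnrichedEvent> out) throws Exception {
                // May be null until the first file dump has been read.
                PropertyUpdate props =
                    ctx.getBroadcastState(propsDescriptor).get(event.getPropertyKey());
                out.collect(new EnrichedEvent(event, props));
            }

            @Override
            public void processBroadcastElement(PropertyUpdate update, Context ctx,
                    Collector<EnrichedEvent> out) throws Exception {
                // Latest value wins: newer property values overwrite older ones.
                ctx.getBroadcastState(propsDescriptor).put(update.getKey(), update);
            }
        });

The gap we see is the window between a job (re)start and the first file dump being read, during which the broadcast state can still be empty and lookups return null.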
*Akshay Agarwal*

On Fri, Oct 15, 2021 at 3:48 PM Martijn Visser <mart...@ververica.com> wrote:

> Hi,
>
> Can you elaborate a bit more on what your use case is and what you're
> trying to achieve?
>
> Best regards,
>
> Martijn
>
> On Fri, 15 Oct 2021 at 11:25, Akshay Agarwal <akshay.agar...@grofers.com> wrote:
>
>> Hey,
>>
>> I have a few doubts about HybridSource. The documentation says:
>>
>>> To arrange multiple sources in a HybridSource, all sources except the
>>> last one need to be bounded. Therefore, the sources typically need to be
>>> assigned a start and end position.
>>
>> Both sources that I use in the job are unbounded (the file source and
>> the Kafka source both receive continuous updates). Furthermore, the two
>> sources have different schemas, since they provide different data points.
>> So I wanted to clarify: are you suggesting converting the Kafka source
>> into a hybrid source (file source plus Kafka source), updating state
>> based on that, and keeping the file source as it is?
>>
>> *Akshay Agarwal*
>>
>> On Fri, Oct 15, 2021 at 2:06 PM Akshay Agarwal <akshay.agar...@grofers.com> wrote:
>>
>>> Nope, I haven't gone through that; I will take a look. Thanks for the
>>> prompt reply.
>>>
>>> *Akshay Agarwal*
>>>
>>> On Fri, Oct 15, 2021 at 2:00 PM Martijn Visser <mart...@ververica.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Have you checked out the Hybrid Source? [1]
>>>>
>>>> Thanks,
>>>> Martijn
>>>>
>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
>>>>
>>>> On Fri, 15 Oct 2021 at 10:22, Akshay Agarwal <akshay.agar...@grofers.com> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I have a streaming job that creates an enriched stream from two
>>>>> different sources: files in object storage and Kafka. Since these are
>>>>> different sources, events are read from them at different moments,
>>>>> depending on each source's performance. For example, events from Kafka
>>>>> arrive earlier than events from file storage, so even though I use
>>>>> event-time watermarks, the enriched events are wrong right after a job
>>>>> restart and only get corrected once the file events start arriving. I
>>>>> searched for a way to synchronize different sources in Flink but did
>>>>> not find any blog or other material. It might be a noob question, but
>>>>> if you have built something around this, could you let me know?
>>>>>
>>>>> Regards,
>>>>> *Akshay Agarwal*
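P.S. For completeness, here is my reading of the HybridSource docs in code
form. It is only a sketch: boundedFileSource and kafkaSource are
placeholders for already-built sources, and HybridSource requires all of
them to produce the same record type (String here) with every source
except the last one bounded, which is why it does not seem to fit our two
unbounded, differently-shaped streams.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.base.source.hybrid.HybridSource;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // boundedFileSource and kafkaSource are hypothetical, already-built
    // sources that both produce String records; only the last one added
    // may be unbounded.
    HybridSource<String> hybridSource = HybridSource
        .builder(boundedFileSource)   // replay the historical files first...
        .addSource(kafkaSource)       // ...then switch over to the live topic
        .build();

    DataStream<String> stream = env.fromSource(
        hybridSource, WatermarkStrategy.noWatermarks(), "hybrid-source");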