Hey,

The use case is as follows: Kafka is the source for our frontend/impression events (any user activity), and we enrich that stream with values from the backend, which is the source of certain slowly changing properties (values that change or are updated over time). The backend dumps these values as files at regular intervals, e.g. every hour or every quarter of an hour.
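Roughly, the wiring looks like this. It is only a sketch: ImpressionEvent, PropertyUpdate, EnrichedEvent, the key accessors, and the two input streams (kafkaEvents, fileUpdates) are hypothetical stand-ins for our actual types, and it uses Flink's broadcast state pattern to push the file values to every task that reads Kafka:

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    // Descriptor for the broadcast state holding the slowly changing properties.
    MapStateDescriptor<String, PropertyUpdate> propsDescriptor =
        new MapStateDescriptor<>("slowly-changing-props",
            Types.STRING, Types.POJO(PropertyUpdate.class));

    // File updates are broadcast to every parallel task that reads Kafka.
    BroadcastStream<PropertyUpdate> broadcastUpdates =
        fileUpdates.broadcast(propsDescriptor);

    DataStream<EnrichedEvent> enriched = kafkaEvents
        .connect(broadcastUpdates)
        .process(new BroadcastProcessFunction<ImpressionEvent, PropertyUpdate, EnrichedEvent>() {
            @Override
            public void processElement(ImpressionEvent event, ReadOnlyContext ctx,
                    Collector<EnrichedEvent> out) throws Exception {
                // May be null until the first file dump has been read.
                PropertyUpdate props =
                    ctx.getBroadcastState(propsDescriptor).get(event.getPropertyKey());
                out.collect(new EnrichedEvent(event, props));
            }

            @Override
            public void processBroadcastElement(PropertyUpdate update, Context ctx,
                    Collector<EnrichedEvent> out) throws Exception {
                // Latest value wins: newer property values overwrite older ones.
                ctx.getBroadcastState(propsDescriptor).put(update.getKey(), update);
            }
        });

The gap we see is the window between a job (re)start and the first file dump being read, during which the broadcast state can still be empty and lookups return null.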
*Akshay Agarwal*

On Fri, Oct 15, 2021 at 3:48 PM Martijn Visser <mart...@ververica.com> wrote:

> Hi,
>
> Can you elaborate a bit more on what your use case is and what you're
> trying to achieve?
>
> Best regards,
>
> Martijn
>
> On Fri, 15 Oct 2021 at 11:25, Akshay Agarwal <akshay.agar...@grofers.com> wrote:
>
>> Hey,
>>
>> I have a few doubts about HybridSource. The documentation says:
>>
>>> To arrange multiple sources in a HybridSource, all sources except the
>>> last one need to be bounded. Therefore, the sources typically need to be
>>> assigned a start and end position.
>>
>> Both sources that I use in the job are unbounded (the file source and
>> the Kafka source both receive continuous updates). Furthermore, the two
>> sources have different schemas, since they provide different data points.
>> So I wanted to clarify: are you suggesting converting the Kafka source
>> into a hybrid source (file source plus Kafka source), updating state
>> based on that, and keeping the file source as it is?
>>
>> *Akshay Agarwal*
>>
>> On Fri, Oct 15, 2021 at 2:06 PM Akshay Agarwal <akshay.agar...@grofers.com> wrote:
>>
>>> Nope, I haven't gone through that; I will take a look. Thanks for the
>>> prompt reply.
>>>
>>> *Akshay Agarwal*
>>>
>>> On Fri, Oct 15, 2021 at 2:00 PM Martijn Visser <mart...@ververica.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Have you checked out the Hybrid Source? [1]
>>>>
>>>> Thanks,
>>>> Martijn
>>>>
>>>> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/datastream/hybridsource/
>>>>
>>>> On Fri, 15 Oct 2021 at 10:22, Akshay Agarwal <akshay.agar...@grofers.com> wrote:
>>>>
>>>>> Hi Team,
>>>>>
>>>>> I have a streaming job that creates an enriched stream from two
>>>>> different sources: files in object storage and Kafka. Since these are
>>>>> different sources, events are read from them at different moments,
>>>>> depending on each source's performance. For example, events from Kafka
>>>>> arrive earlier than events from file storage, so even though I use
>>>>> event-time watermarks, the enriched events are wrong right after a job
>>>>> restart and only get corrected once the file events start arriving. I
>>>>> searched for a way to synchronize different sources in Flink but did
>>>>> not find any blog or other material. It might be a noob question, but
>>>>> if you have built something around this, could you let me know?
>>>>>
>>>>> Regards,
>>>>> *Akshay Agarwal*
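P.S. For completeness, here is my reading of the HybridSource docs in code
form. It is only a sketch: boundedFileSource and kafkaSource are
placeholders for already-built sources, and HybridSource requires all of
them to produce the same record type (String here) with every source
except the last one bounded, which is why it does not seem to fit our two
unbounded, differently-shaped streams.

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.base.source.hybrid.HybridSource;
    import org.apache.flink.streaming.api.datastream.DataStream;

    // boundedFileSource and kafkaSource are hypothetical, already-built
    // sources that both produce String records; only the last one added
    // may be unbounded.
    HybridSource<String> hybridSource = HybridSource
        .builder(boundedFileSource)   // replay the historical files first...
        .addSource(kafkaSource)       // ...then switch over to the live topic
        .build();

    DataStream<String> stream = env.fromSource(
        hybridSource, WatermarkStrategy.noWatermarks(), "hybrid-source");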