Let me give you a specific example. Say event1 on stream1 arrives within your 0-5 min window with key1, and event2 on stream2 with key2, which would have matched key1, arrives at 5:01, just outside the join window. Now you have to correlate event2 on stream2 with event1 on stream1, which belongs to the previous window; that is the corner case I mentioned before. I am not aware of Flink being able to solve this for you out of the box. That would be nice, instead of solving it in the application.
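For what it is worth, here is a rough sketch of how that cross-window correlation could also be kept inside Flink itself, by connecting the two keyed streams and buffering the unmatched side in keyed state until the partner arrives or a grace period expires. I have not run this; it uses the CoProcessFunction API from newer Flink releases, and Event1, Event2, Joined, the key fields and the 10 minute grace period are all made-up placeholders:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.util.Collector

// Hypothetical event and output types, only for illustration.
case class Event1(key: String, payload: String)
case class Event2(key: String, payload: String)
case class Joined(key: String, left: String, right: String)

// Buffers whichever side arrives first in keyed state and emits a joined
// record when the other side shows up, regardless of window boundaries.
// An event-time timer clears the buffered record after a grace period.
// Assumes the streams carry event-time timestamps.
class BufferingJoin extends CoProcessFunction[Event1, Event2, Joined] {

  lazy val leftState: ValueState[Event1] = getRuntimeContext.getState(
    new ValueStateDescriptor[Event1]("left", classOf[Event1]))
  lazy val rightState: ValueState[Event2] = getRuntimeContext.getState(
    new ValueStateDescriptor[Event2]("right", classOf[Event2]))

  // Grace period of 10 minutes, chosen arbitrarily for the sketch.
  val graceMillis: Long = 10 * 60 * 1000L

  override def processElement1(left: Event1,
                               ctx: CoProcessFunction[Event1, Event2, Joined]#Context,
                               out: Collector[Joined]): Unit = {
    val right = rightState.value()
    if (right != null) {
      out.collect(Joined(left.key, left.payload, right.payload))
      rightState.clear()
    } else {
      leftState.update(left)
      ctx.timerService().registerEventTimeTimer(ctx.timestamp() + graceMillis)
    }
  }

  override def processElement2(right: Event2,
                               ctx: CoProcessFunction[Event1, Event2, Joined]#Context,
                               out: Collector[Joined]): Unit = {
    val left = leftState.value()
    if (left != null) {
      out.collect(Joined(left.key, left.payload, right.payload))
      leftState.clear()
    } else {
      rightState.update(right)
      ctx.timerService().registerEventTimeTimer(ctx.timestamp() + graceMillis)
    }
  }

  override def onTimer(timestamp: Long,
                       ctx: CoProcessFunction[Event1, Event2, Joined]#OnTimerContext,
                       out: Collector[Joined]): Unit = {
    // Grace period expired without a match; drop the buffered record.
    leftState.clear()
    rightState.clear()
  }
}

You would wire it up with something like stream1.connect(stream2).keyBy(_.key, _.key).process(new BufferingJoin). Whether the keyed state stays small enough depends on how long the grace period is, which is really the same concern Henry raises below about storing a whole day of records.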
On Thu, Apr 14, 2016 at 12:10 PM, Henry Cai <h...@pinterest.com> wrote:

> Thanks Balaji. Do you mean you spill the non-matching records after 5
> minutes into redis? Does flink give you control over which records are
> not matching in the current window, such that you can copy them into
> long-term storage?
>
> On Wed, Apr 13, 2016 at 11:20 PM, Balaji Rajagopalan <
> balaji.rajagopa...@olacabs.com> wrote:
>
>> You can implement a join in flink (which is an inner join) with the
>> pseudo code below. This join is over a 5 minute interval, and yes, there
>> will be some corner cases where data arriving after 5 minutes is missed
>> in the join window. I solved that by storing some data in redis and
>> writing correlation logic to take care of the corner cases that were
>> missed in the join window.
>>
>> val output: DataStream[OutputData] =
>>   stream1.join(stream2).where(_.key1).equalTo(_.key2)
>>     .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.MINUTES)))
>>     .apply(new SomeJoinFunction)
>>
>> On Thu, Apr 14, 2016 at 10:02 AM, Henry Cai <h...@pinterest.com> wrote:
>>
>>> Hi,
>>>
>>> We are evaluating different streaming platforms. For a typical join
>>> between two streams
>>>
>>> select a.*, b.*
>>> FROM a, b
>>> ON a.id == b.id
>>>
>>> how does flink implement the join? The matching record from either
>>> stream can come late; we consider it a valid join as long as the event
>>> times for records a and b fall on the same day.
>>>
>>> I think some streaming platforms (e.g. google data flow) will store the
>>> records from both streams in a K/V lookup store and later do the lookup.
>>> Is this how flink implements the streaming join?
>>>
>>> If we need to store all the records in a state store, that's going to
>>> be a lot of records for a day.
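On Henry's question about whether Flink gives you control over which records did not match in the current window: one option might be to use coGroup instead of join, since the coGroup function is invoked with both sides of each key and window and can therefore tell when one side is empty. This is only a sketch I have not run; EventA, EventB, the key fields and the JoinResult types are made-up placeholders:

import scala.collection.JavaConverters._

import org.apache.flink.api.common.functions.CoGroupFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// Placeholder input and output types, only for the sketch.
case class EventA(key1: String, data: String)
case class EventB(key2: String, data: String)

sealed trait JoinResult
case class Matched(a: EventA, b: EventB) extends JoinResult
case class UnmatchedA(a: EventA) extends JoinResult // candidate to spill to redis
case class UnmatchedB(b: EventB) extends JoinResult // candidate to spill to redis

def joinWithSpill(stream1: DataStream[EventA],
                  stream2: DataStream[EventB]): DataStream[JoinResult] =
  stream1.coGroup(stream2)
    .where(_.key1).equalTo(_.key2)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .apply(new CoGroupFunction[EventA, EventB, JoinResult] {
      override def coGroup(as: java.lang.Iterable[EventA],
                           bs: java.lang.Iterable[EventB],
                           out: Collector[JoinResult]): Unit = {
        val lefts = as.asScala.toSeq
        val rights = bs.asScala.toSeq
        if (lefts.nonEmpty && rights.nonEmpty) {
          // Normal inner-join output for this key and window.
          for (a <- lefts; b <- rights) out.collect(Matched(a, b))
        } else {
          // One side never showed up in this window: these are the
          // non-matching records you could copy into redis for later correlation.
          lefts.foreach(a => out.collect(UnmatchedA(a)))
          rights.foreach(b => out.collect(UnmatchedB(b)))
        }
      }
    })

The Unmatched* records are the ones you would then write to redis (or any long-term store) and correlate against the output of neighbouring windows.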