Let me give you a specific example. Say event1 on stream1 arrives within your 0-5 min window with key1, and event2 on stream2 with key2, which would have matched key1, arrives at 5:01, just outside the join window. Now you have to correlate event2 on stream2 with event1 on stream1, which belongs to the previous window; that is the corner case I mentioned before. I am not aware of Flink being able to solve this for you out of the box. That would be nice, instead of solving it in the application.
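For what it is worth, here is a rough sketch of how that cross-window correlation could also be kept inside Flink itself, by connecting the two keyed streams and buffering the unmatched side in keyed state until the partner arrives or a grace period expires. I have not run this; it uses the CoProcessFunction API from newer Flink releases, and Event1, Event2, Joined, the key fields and the 10 minute grace period are all made-up placeholders:

import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.util.Collector

// Hypothetical event and output types, only for illustration.
case class Event1(key: String, payload: String)
case class Event2(key: String, payload: String)
case class Joined(key: String, left: String, right: String)

// Buffers whichever side arrives first in keyed state and emits a joined
// record when the other side shows up, regardless of window boundaries.
// An event-time timer clears the buffered record after a grace period.
// Assumes the streams carry event-time timestamps.
class BufferingJoin extends CoProcessFunction[Event1, Event2, Joined] {

  lazy val leftState: ValueState[Event1] = getRuntimeContext.getState(
    new ValueStateDescriptor[Event1]("left", classOf[Event1]))
  lazy val rightState: ValueState[Event2] = getRuntimeContext.getState(
    new ValueStateDescriptor[Event2]("right", classOf[Event2]))

  // Grace period of 10 minutes, chosen arbitrarily for the sketch.
  val graceMillis: Long = 10 * 60 * 1000L

  override def processElement1(left: Event1,
                               ctx: CoProcessFunction[Event1, Event2, Joined]#Context,
                               out: Collector[Joined]): Unit = {
    val right = rightState.value()
    if (right != null) {
      out.collect(Joined(left.key, left.payload, right.payload))
      rightState.clear()
    } else {
      leftState.update(left)
      ctx.timerService().registerEventTimeTimer(ctx.timestamp() + graceMillis)
    }
  }

  override def processElement2(right: Event2,
                               ctx: CoProcessFunction[Event1, Event2, Joined]#Context,
                               out: Collector[Joined]): Unit = {
    val left = leftState.value()
    if (left != null) {
      out.collect(Joined(left.key, left.payload, right.payload))
      leftState.clear()
    } else {
      rightState.update(right)
      ctx.timerService().registerEventTimeTimer(ctx.timestamp() + graceMillis)
    }
  }

  override def onTimer(timestamp: Long,
                       ctx: CoProcessFunction[Event1, Event2, Joined]#OnTimerContext,
                       out: Collector[Joined]): Unit = {
    // Grace period expired without a match; drop the buffered record.
    leftState.clear()
    rightState.clear()
  }
}

You would wire it up with something like stream1.connect(stream2).keyBy(_.key, _.key).process(new BufferingJoin). Whether the keyed state stays small enough depends on how long the grace period is, which is really the same concern Henry raises below about storing a whole day of records.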
On Thu, Apr 14, 2016 at 12:10 PM, Henry Cai <h...@pinterest.com> wrote:

> Thanks Balaji. Do you mean you spill the non-matching records after 5
> minutes into redis? Does flink give you control over which records are
> not matching in the current window, such that you can copy them into
> long-term storage?
>
> On Wed, Apr 13, 2016 at 11:20 PM, Balaji Rajagopalan <
> balaji.rajagopa...@olacabs.com> wrote:
>
>> You can implement a join in flink (which is an inner join) with the
>> pseudo code below. This join is over a 5 minute interval, and yes, there
>> will be some corner cases where data arriving after 5 minutes is missed
>> in the join window. I solved that by storing some data in redis and
>> writing correlation logic to take care of the corner cases that were
>> missed in the join window.
>>
>> val output: DataStream[OutputData] =
>>   stream1.join(stream2).where(_.key1).equalTo(_.key2)
>>     .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.MINUTES)))
>>     .apply(new SomeJoinFunction)
>>
>> On Thu, Apr 14, 2016 at 10:02 AM, Henry Cai <h...@pinterest.com> wrote:
>>
>>> Hi,
>>>
>>> We are evaluating different streaming platforms. For a typical join
>>> between two streams
>>>
>>> select a.*, b.*
>>> FROM a, b
>>> ON a.id == b.id
>>>
>>> how does flink implement the join? The matching record from either
>>> stream can come late; we consider it a valid join as long as the event
>>> times for records a and b fall on the same day.
>>>
>>> I think some streaming platforms (e.g. google data flow) will store the
>>> records from both streams in a K/V lookup store and later do the lookup.
>>> Is this how flink implements the streaming join?
>>>
>>> If we need to store all the records in a state store, that's going to
>>> be a lot of records for a day.
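On Henry's question about whether Flink gives you control over which records did not match in the current window: one option might be to use coGroup instead of join, since the coGroup function is invoked with both sides of each key and window and can therefore tell when one side is empty. This is only a sketch I have not run; EventA, EventB, the key fields and the JoinResult types are made-up placeholders:

import scala.collection.JavaConverters._

import org.apache.flink.api.common.functions.CoGroupFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.util.Collector

// Placeholder input and output types, only for the sketch.
case class EventA(key1: String, data: String)
case class EventB(key2: String, data: String)

sealed trait JoinResult
case class Matched(a: EventA, b: EventB) extends JoinResult
case class UnmatchedA(a: EventA) extends JoinResult // candidate to spill to redis
case class UnmatchedB(b: EventB) extends JoinResult // candidate to spill to redis

def joinWithSpill(stream1: DataStream[EventA],
                  stream2: DataStream[EventB]): DataStream[JoinResult] =
  stream1.coGroup(stream2)
    .where(_.key1).equalTo(_.key2)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .apply(new CoGroupFunction[EventA, EventB, JoinResult] {
      override def coGroup(as: java.lang.Iterable[EventA],
                           bs: java.lang.Iterable[EventB],
                           out: Collector[JoinResult]): Unit = {
        val lefts = as.asScala.toSeq
        val rights = bs.asScala.toSeq
        if (lefts.nonEmpty && rights.nonEmpty) {
          // Normal inner-join output for this key and window.
          for (a <- lefts; b <- rights) out.collect(Matched(a, b))
        } else {
          // One side never showed up in this window: these are the
          // non-matching records you could copy into redis for later correlation.
          lefts.foreach(a => out.collect(UnmatchedA(a)))
          rights.foreach(b => out.collect(UnmatchedB(b)))
        }
      }
    })

The Unmatched* records are the ones you would then write to redis (or any long-term store) and correlate against the output of neighbouring windows.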