Thanks Balaji. Do you mean you spill the non-matching records into Redis after 5 minutes? Does Flink give you control over which records did not match in the current window, so that you can copy them into long-term storage?
On Wed, Apr 13, 2016 at 11:20 PM, Balaji Rajagopalan <balaji.rajagopa...@olacabs.com> wrote:

> You can implement a join in Flink (which is an inner join) with the pseudo code
> below. This join is over a 5-minute interval; yes, there will be some corner cases
> where data arriving after 5 minutes is missed by the join window. I solved this
> by storing some data in Redis and writing correlation logic to take care of the
> corner cases that were missed by the join window.
>
> val output: DataStream[OutputData] =
>   stream1.join(stream2).where(_.key1).equalTo(_.key2)
>     .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.MINUTES)))
>     .apply(new SomeJoinFunction)
>
> On Thu, Apr 14, 2016 at 10:02 AM, Henry Cai <h...@pinterest.com> wrote:
>
>> Hi,
>>
>> We are evaluating different streaming platforms. For a typical join
>> between two streams:
>>
>> SELECT a.*, b.*
>> FROM a JOIN b
>> ON a.id = b.id
>>
>> How does Flink implement the join? The matching record from either
>> stream can arrive late; we consider it a valid join as long as the event
>> times for records a and b fall within the same day.
>>
>> I think some streaming platforms (e.g. Google Dataflow) store the
>> records from both streams in a K/V lookup store and later do the lookup.
>> Is this how Flink implements the streaming join?
>>
>> If we need to store all the records in a state store, that's going to be
>> a lot of records for a day.
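
For reference, here is a minimal self-contained sketch of the tumbling event-time window join pattern quoted above. The event types EventA/EventB/OutputData, their field names, and the in-memory sources are hypothetical placeholders (the thread only mentions key1/key2 and SomeJoinFunction); this is not Balaji's actual code, just an illustration under those assumptions:

import org.apache.flink.api.common.functions.JoinFunction
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Hypothetical event and result types for illustration only.
case class EventA(key1: String, ts: Long, payload: String)
case class EventB(key2: String, ts: Long, payload: String)
case class OutputData(key: String, aPayload: String, bPayload: String)

object WindowedJoinSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    // Placeholder sources; in practice these would be Kafka or similar.
    val stream1: DataStream[EventA] = env
      .fromElements(EventA("id-1", 1000L, "a1"), EventA("id-2", 2000L, "a2"))
      .assignAscendingTimestamps(_.ts)

    val stream2: DataStream[EventB] = env
      .fromElements(EventB("id-1", 1500L, "b1"), EventB("id-3", 2500L, "b3"))
      .assignAscendingTimestamps(_.ts)

    // Inner join over a 5-minute tumbling event-time window, as in the quoted snippet.
    val output: DataStream[OutputData] = stream1
      .join(stream2)
      .where(_.key1)
      .equalTo(_.key2)
      .window(TumblingEventTimeWindows.of(Time.minutes(5)))
      .apply(new JoinFunction[EventA, EventB, OutputData] {
        override def join(a: EventA, b: EventB): OutputData =
          OutputData(a.key1, a.payload, b.payload)
      })

    output.print()
    env.execute("tumbling-window-join-sketch")
  }
}

Note that records arriving after the window has fired (past the watermark) simply never join; that is the corner case Balaji describes handling outside Flink with Redis and custom correlation logic.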