Re: Parallelize an incoming stream into 5 streams with the same data

2018-11-06 Thread Vijay Balakrishnan
Cool, thanks! Hequn. I will try that approach. Vijay On Thu, Nov 1, 2018 at 8:18 PM Hequn Cheng wrote: > Hi Vijay, > > > I was hoping to groupBy(key._1,key._2) etc and then do a tumblingWindow > operation on the KeyedStream and then perform group operation on the > resultant set to get total co

Re: Parallelize an incoming stream into 5 streams with the same data

2018-11-01 Thread Hequn Cheng
Hi Vijay, > I was hoping to groupBy(key._1,key._2) etc and then do a tumblingWindow operation on the KeyedStream and then perform group operation on the resultant set to get total count etc. >From your description, I think you can perform a TumblingEventTimeWindow first, something looks like: >

Re: Parallelize an incoming stream into 5 streams with the same data

2018-11-01 Thread Vijay Balakrishnan
Thanks,Hequn. If I have to do a TumblingWindow operation like: .timeWindowAll(org.apache.flink.streaming.api.windowing.time.Time.of(FIVE, TimeUnit.SECONDS)) I am not able to do that on the output of keyBy(..) which is a KeyedStream. I was hoping to groupBy(key._1,key._2) etc and then do a tumbli

Re: Parallelize an incoming stream into 5 streams with the same data

2018-10-25 Thread Hequn Cheng
Hi Vijay, Option 1 is the right answer. `keyBy1` and `keyBy2` contain all data in `inputStream`. While option 2 replicate all data to each task and option 3 split data into smaller groups without duplication. Best, Hequn On Fri, Oct 26, 2018 at 7:01 AM Vijay Balakrishnan wrote: > Hi, > I need

Parallelize an incoming stream into 5 streams with the same data

2018-10-25 Thread Vijay Balakrishnan
Hi, I need to broadcast/parallelize an incoming stream(inputStream) into 5 streams with the same data. Each stream is keyed by different keys to do various grouping operations on the set. Do I just use inputStream.keyBy(5 diff keys) and then just use the DataStream to perform windowing/grouping op