Yes, Gyula, that should work. I would draw the random key from a range of 10 * parallelism.
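The pre-aggregation trick discussed below works because partial counts per (random) key always sum to the global count, no matter how the keys are assigned. A minimal plain-Java sketch of that invariant (no Flink runtime; class and method names are illustrative, and the key range follows the 10 * parallelism suggestion):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class PreAggregationSketch {

    // Phase 1 stand-in: count elements of one window per random key,
    // mimicking keyBy(assignRandomKey).window(...).reduce(count).
    static Map<Integer, Long> partialCounts(List<String> window, int parallelism) {
        int keyRange = 10 * parallelism; // key range suggested in the thread
        Random rnd = new Random();
        Map<Integer, Long> counts = new HashMap<>();
        for (String ignored : window) {
            int key = rnd.nextInt(keyRange);
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    // Phase 2 stand-in: the timeWindowAll(...).reduce(count) step merges
    // the per-key partials into one global count.
    static long globalCount(Map<Integer, Long> partials) {
        return partials.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        List<String> window = List.of("a", "b", "c", "d", "e", "f", "g");
        long total = globalCount(partialCounts(window, 4));
        System.out.println(total); // prints 7, the window size, for any key assignment
    }
}
```

The random keys only spread the work; summing the partials recovers the exact total, which is why the two-phase version can be correct as long as both windows use the same time boundaries.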
On Fri, Feb 26, 2016 at 7:16 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hey,
>
> I am wondering whether the following code will produce identical but more
> efficient (parallel) results:
>
> input.keyBy(assignRandomKey)
>      .window(Time.seconds(10))
>      .reduce(count)
>      .timeWindowAll(Time.seconds(10))
>      .reduce(count)
>
> Effectively, this just assigns random keys to do the pre-aggregation and
> then runs an all-window over the pre-aggregated values. I wonder whether
> this actually leads to correct results, and how it interplays with the
> time semantics.
>
> Cheers,
> Gyula
>
> Stephan Ewen <se...@apache.org> wrote (on Fri, Feb 26, 2016, 19:10):
>
>> True, at this point it does not pre-aggregate in parallel; that is
>> actually a feature on the list, but not yet added...
>>
>> On Fri, Feb 26, 2016 at 7:08 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote:
>>
>>> That code will not run in parallel, right? So a map-reduce task would
>>> yield better performance, no?
>>>
>>> On Fri, Feb 26, 2016 at 6:06 PM, Stephan Ewen <se...@apache.org> wrote:
>>>
>>>> Then go for:
>>>>
>>>> input.timeWindowAll(Time.seconds(10)).fold(0,
>>>>     new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>         @Override
>>>>         public Integer fold(Integer integer, Tuple2<Integer, Integer> o)
>>>>                 throws Exception {
>>>>             return integer + 1;
>>>>         }
>>>>     });
>>>>
>>>> Try to explore the API a bit; most things should be quite intuitive.
>>>> There are also some docs:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#windows-on-unkeyed-data-streams
>>>>
>>>> On Fri, Feb 26, 2016 at 4:07 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote:
>>>>
>>>>> Why the ".keyBy"? I don't want to count tuples by key. I simply want
>>>>> to count all the tuples that are contained in a window.
>>>>>
>>>>> On Fri, Feb 26, 2016 at 9:14 AM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>>
>>>>>> Hi Saiph,
>>>>>>
>>>>>> you can do it the following way:
>>>>>>
>>>>>> input.keyBy(0).timeWindow(Time.seconds(10)).fold(0,
>>>>>>     new FoldFunction<Tuple2<Integer, Integer>, Integer>() {
>>>>>>         @Override
>>>>>>         public Integer fold(Integer integer, Tuple2<Integer, Integer> o)
>>>>>>                 throws Exception {
>>>>>>             return integer + 1;
>>>>>>         }
>>>>>>     });
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Thu, Feb 25, 2016 at 7:58 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> In Flink Streaming, what's the best way of counting the number of
>>>>>>> tuples within a window of 10 seconds? Using a map-reduce task? Asking
>>>>>>> because in Spark there is the method
>>>>>>> rawStream.countByWindow(Seconds(x)).
>>>>>>>
>>>>>>> Thanks.
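What the `timeWindowAll(Time.seconds(10))` plus counting-fold answer above computes can be sketched in plain Java without the Flink runtime: assign each event to a 10-second tumbling window by its timestamp and count events per window. Class and method names here are illustrative, and the bucketing deliberately ignores Flink's watermark/lateness machinery:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowCount {

    // Assigns each event (timestampMillis, value) to a tumbling window of
    // windowMillis and counts events per window -- a stand-in for
    // timeWindowAll(Time.seconds(10)) followed by the fold that returns
    // accumulator + 1 per element.
    static Map<Long, Long> countPerWindow(List<long[]> events, long windowMillis) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long[] event : events) {
            long windowStart = (event[0] / windowMillis) * windowMillis;
            counts.merge(windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<long[]> events = List.of(
                new long[]{1_000, 7}, new long[]{4_000, 8},  // window [0s, 10s)
                new long[]{12_000, 9});                      // window [10s, 20s)
        System.out.println(countPerWindow(events, 10_000)); // prints {0=2, 10000=1}
    }
}
```

Note that the fold ignores the tuple's contents entirely and just increments the accumulator, which is why it counts tuples rather than summing any field.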