Hi Nirmalya, my earlier reply was based on a misreading of your original post: I thought you had a batch of data, not a stream. I see that the apply method can also take a reducer that pre-aggregates your data before passing it to the window function. I suspect that pre-aggregation runs locally, just like a combiner would, but I'm really not sure about it. It would be good to get more feedback on this point.
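To make the combiner idea a bit more concrete, here is a rough sketch in plain Python. It is not Flink code and does not use any Flink API: the lists stand in for the elements each parallel instance sees within one window, and the function names (local_partials, combine_then_merge) are made up for illustration. It only shows the general pattern I have in mind, where each parallel instance reduces its own elements locally and a single final step merges the partial results.

```python
from functools import reduce
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def local_partials(partitions: Sequence[Sequence[T]],
                   reducer: Callable[[T, T], T]) -> List[T]:
    """Combiner-style step: reduce each non-empty partition on its own.

    Each partition models the elements seen by one parallel instance,
    so this part could run in parallel with no data exchange.
    """
    return [reduce(reducer, part) for part in partitions if part]


def combine_then_merge(partitions: Sequence[Sequence[T]],
                       reducer: Callable[[T, T], T]) -> T:
    """Pre-aggregate locally, then merge the partials in one final step.

    The final merge is the only non-parallel part, and it sees one value
    per partition instead of every element.
    """
    partials = local_partials(partitions, reducer)
    if not partials:
        raise ValueError("no elements to aggregate")
    return reduce(reducer, partials)


if __name__ == "__main__":
    # Hypothetical data: three parallel instances, one window's worth each.
    data = [[1, 2, 3], [4, 5], [6]]
    print(local_partials(data, lambda a, b: a + b))
    print(combine_then_merge(data, lambda a, b: a + b))
```

Note that this only gives the same answer as reducing everything in one place when the reducer is associative (sum, max, min and so on), and also commutative if the order in which partials arrive is not guaranteed.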

On Tue, Feb 16, 2016 at 2:19 AM, Nirmalya Sengupta <sengupta.nirma...@gmail.com> wrote:

> Hello Stefano <stefano.bagh...@radicalbit.io>
>
> Sorry for the late reply. Many thanks for taking effort to write and share
> an example code snippet.
>
> I have been playing with the countWindow behaviour for some weeks now and
> I am generally aware of the functionality of countWindowAll(). For my
> useCase, where I _have to observe_ the entire stream as it founts in, using
> countWindowAll() is probably the most obvious solution. This is what you
> recommend too. However, because this is going to use 1 thread only (or 1
> node only in a cluster), I was thinking about ways to make use of the
> 'distributedness' of the framework. Hence, my question.
>
> Your reply leads to me read and think a bit more. If I have to use
> parallelism to achieve what I want to achieve, I think managing a
> ValueState of my own is possibly the solution. If you have any other
> thoughts, please share.
>
> From your earlier response: '... you can still enjoy a high level of
> parallelism up until the last operator by using a combiner, which is
> basically a reducer that operates locally ...'. Could you elaborate this a
> bit, whenever you have time?
>
> -- Nirmalya
>
> --
> Software Technologist
> http://www.linkedin.com/in/nirmalyasengupta
> "If you have built castles in the air, your work need not be lost. That is
> where they should be.
> Now put the foundation under them."
>

--
BR,
Stefano Baghino
Software Engineer @ Radicalbit