Hi Nirmalaya, I think for your use case in particular it's enough to specify the reducer that computes the average to have a parallelism of 1 by calling the `setParallelism` API when you apply it. Keep in mind that you can still enjoy a high level of parallelism up until the last operator by using a combiner, which is basically a reducer that operates locally. This way, you can compute 4 partial averages in parallel and then have the final reducer actually reporting the average on the whole dataset.
I hope I have been helpful. On Sat, Feb 13, 2016 at 7:03 PM, Nirmalya Sengupta < sengupta.nirma...@gmail.com> wrote: > Hello Flinksters, > > This is perhaps too trivial for most here in this forum, but I want to > have my understanding clear. > > I want to find the average of temperatures coming in as a stream. The > tuple as the following fields: > > > probeID,readingTimeStamp,radiationLevel,photoSensor,humidity,ambientTemperature > > The 'readingTimeStamp' field provides the 'Event Time', and this provides > the basis of the timeWindow(10) that I employ. > > I _keyBy_ the 'readingTimeStamp' and collect readings for next 10 seconds, > and then I compute the average of the 'ambientTemperature'. After every 10 > seconds, I want to be able to find the average temperature *so far* (a > single value). > > Because my application defaults to parellelism == 4 (my laptop is 4 core), > my understanding is that just by using a combination of an appropriate > RichXXXFunction() and saving State in the RuntimeContext, I may not get the > correct result. This is because, depending the way the Keys are > distributed, 4 _different_ averages will be produced. Is this > understanding right? If so, is using a *Broadcast variable*, the solution? > > Please help me plug the gap in understanding, if any. > > -- Nirmalya > > -- > Software Technologist > http://www.linkedin.com/in/nirmalyasengupta > "If you have built castles in the air, your work need not be lost. That is > where they should be. > Now put the foundation under them." > -- BR, Stefano Baghino Software Engineer @ Radicalbit