Hi Nirmalya,

For your use case, I think it is enough to give the reducer that computes
the average a parallelism of 1, by calling `setParallelism(1)` on it when
you apply it. You can still keep a high level of parallelism up to that
last operator by using a combiner, which is basically a reducer that
operates locally on each parallel instance. This way you can compute 4
partial results in parallel (a sum and a count each, since averaging the 4
partial averages directly is only correct when every partition holds the
same number of readings) and let the final reducer merge them and report
the average over the whole dataset.
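
To make the combiner idea concrete, here is a minimal sketch in plain Python.
It does not use the Flink API at all: the names `combine`, `merge` and
`global_average`, and the sample readings, are made up for illustration. It
only shows the arithmetic, namely that each parallel instance emits a
(sum, count) pair and the single final step merges those pairs.

```python
from functools import reduce

def combine(readings):
    """Local 'combiner' (hypothetical): one partition -> (sum, count)."""
    return (sum(readings), len(readings))

def merge(a, b):
    """Final 'reducer' step (hypothetical): add two (sum, count) pairs."""
    return (a[0] + b[0], a[1] + b[1])

def global_average(partitions):
    """Merge the per-partition partials, then divide once at the end."""
    partials = [combine(p) for p in partitions]
    total, count = reduce(merge, partials, (0.0, 0))
    return total / count if count else None

# Four partitions of made-up temperatures, one per parallel instance.
partitions = [[20.0, 22.0], [30.0], [10.0, 10.0, 10.0], [18.0]]

avg = global_average(partitions)

# For comparison: averaging the four local averages weights each partition
# equally, so it differs from the true average when sizes are uneven.
avg_of_avgs = sum(sum(p) / len(p) for p in partitions) / len(partitions)
```

With uneven partition sizes like these, `avg` and `avg_of_avgs` disagree,
which is why the partial results should carry their counts along.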

I hope I have been helpful.

On Sat, Feb 13, 2016 at 7:03 PM, Nirmalya Sengupta <
sengupta.nirma...@gmail.com> wrote:

> Hello Flinksters,
>
> This is perhaps too trivial for most here in this forum, but I want to
> have my understanding clear.
>
> I want to find the average of temperatures coming in as a stream. The
> tuple has the following fields:
>
>
> probeID,readingTimeStamp,radiationLevel,photoSensor,humidity,ambientTemperature
>
> The 'readingTimeStamp' field provides the 'Event Time', and this provides
> the basis of the timeWindow(10) that I employ.
>
> I _keyBy_ the 'readingTimeStamp' and collect readings for the next 10 seconds,
> and then I compute the average of the 'ambientTemperature'. After every 10
> seconds, I want to be able to find the average temperature *so far* (a
> single value).
>
> Because my application defaults to parallelism == 4 (my laptop is 4 core),
> my understanding is that just by using a combination of an appropriate
> RichXXXFunction() and saving State in the RuntimeContext, I may not get the
> correct result. This is because, depending on the way the Keys are
> distributed, 4 _different_ averages will be produced. Is this
> understanding right? If so, is using a *Broadcast variable*, the solution?
>
> Please help me plug the gap in understanding, if any.
>
> -- Nirmalya
>
> --
> Software Technologist
> http://www.linkedin.com/in/nirmalyasengupta
> "If you have built castles in the air, your work need not be lost. That is
> where they should be.
> Now put the foundation under them."
>



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
