Hello Flinksters,

This is perhaps too trivial for most here in this forum, but I want to have
my understanding clear.

I want to find the average of temperatures coming in as a stream. The tuple
as the following fields:

probeID,readingTimeStamp,radiationLevel,photoSensor,humidity,ambientTemperature

The 'readingTimeStamp' field provides the 'Event Time', and this provides
the basis of the timeWindow(10) that I employ.

I _keyBy_ the 'readingTimeStamp' and collect readings for next 10 seconds,
and then I compute the average of the 'ambientTemperature'. After every 10
seconds, I want to be able to find the average temperature *so far* (a
single value).

Because my application defaults to parellelism == 4 (my laptop is 4 core),
my understanding is that just by using a combination of an appropriate
RichXXXFunction() and saving State in the RuntimeContext, I may not get the
correct result. This is because, depending the way the Keys are
distributed, 4 _different_  averages will be produced. Is this
understanding right? If so, is using a *Broadcast variable*, the solution?

Please help me plug the gap in understanding, if any.

-- Nirmalya

-- 
Software Technologist
http://www.linkedin.com/in/nirmalyasengupta
"If you have built castles in the air, your work need not be lost. That is
where they should be.
Now put the foundation under them."

Reply via email to