Re: Multiple operations on same DStream in Spark Streaming

2015-07-28 Thread Dean Wampler
Is this average supposed to be across all partitions? If so, it will require one of the reduce operations in every batch interval. If that's too slow for the data rate, I would investigate using PairDStreamFunctions.updateStateByKey to compute the sum + count of the 2nd integers, per 1st integer.
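The suggestion above can be sketched as a plain update function of the shape `updateStateByKey` expects, keeping a running (sum, count) per key so the average is `sum / count`. This is a minimal sketch: the element type `(Int, Int)`, the state type `(Long, Long)`, and the object/method names are assumptions for illustration, not from the original thread.

```scala
object RunningAverage {
  // Update function with the signature expected by
  // PairDStreamFunctions.updateStateByKey[S]((Seq[V], Option[S]) => Option[S]).
  // newValues: the 2nd integers seen for this key in the current batch;
  // state: the (sum, count) accumulated across previous batches, if any.
  def updateSumCount(newValues: Seq[Int],
                     state: Option[(Long, Long)]): Option[(Long, Long)] = {
    val (sum, count) = state.getOrElse((0L, 0L))
    Some((sum + newValues.map(_.toLong).sum, count + newValues.size))
  }

  // The per-key running average is then sum / count.
  def average(state: (Long, Long)): Double =
    state._1.toDouble / state._2
}
```

In a real job this would be wired up as something like `pairStream.updateStateByKey(RunningAverage.updateSumCount _)` with checkpointing enabled, since `updateStateByKey` requires a checkpoint directory.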

Re: Multiple operations on same DStream in Spark Streaming

2015-07-28 Thread Akhil Das
One approach would be to store the batch data in an intermediate storage (like HBase/MySQL or even ZooKeeper), and inside your filter function you read the previous value from this storage and perform whatever operation you need against it. Thanks, Best Regards
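A minimal sketch of this "look up the previous value inside the filter" approach, with an in-memory concurrent map standing in for the external store (HBase/MySQL/ZooKeeper). The store object, the `(key, value)` record shape, and the keep-if-greater-than-previous rule are all hypothetical, chosen only to make the pattern concrete.

```scala
import scala.collection.concurrent.TrieMap

// Stand-in for an external store such as HBase/MySQL/ZooKeeper.
// In a real job each executor would hold its own connection,
// typically opened lazily inside mapPartitions or the filter closure.
object PrevValueStore {
  private val store = TrieMap.empty[String, Int]
  def get(key: String): Option[Int] = store.get(key)
  def put(key: String, value: Int): Unit = store.put(key, value)
}

object StatefulFilter {
  // Keep a record only if its value exceeds the previously stored
  // value for the same key, then record the new value as the baseline
  // for the next batch.
  def keep(record: (String, Int)): Boolean = {
    val (key, value) = record
    val pass = PrevValueStore.get(key).forall(value > _)
    if (pass) PrevValueStore.put(key, value)
    pass
  }
}
```

Inside a streaming job this would be used as `dstream.filter(StatefulFilter.keep)`; the trade-off is an external read (and possibly write) per record or per partition, so batching lookups per partition is usually preferable.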