Is this average supposed to be across all partitions? If so, it will
require some one of the reduce operations in every batch interval. If
that's too slow for the data rate, I would investigate using
PairDStreamFunctions.updateStateByKey to compute the sum + count of the 2nd
integers, per 1st integ
One approach would be to store the batch data in an intermediate storage
(like HBase/MySQL or even in zookeeper), and inside your filter function
you just go and read the previous value from this storage and do whatever
operation that you are supposed to do.
Thanks
Best Regards
On Sun, Jul 26, 20