Say I want to calculate the top K users visiting a page over the past 2 hours,
updated every 5 minutes.

So I want to maintain state like this:

Page_01 => {user_01:32, user_02:3, user_03:7...}
...

That is, a count of the number of times each user visited a page. The key is
the page name/ID and the state is the hashmap.
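A minimal in-memory sketch of that per-page state and the top-K query, assuming a plain Python dict of `collections.Counter` objects stands in for the hashmap. The `state` variable and `top_k_users` function are hypothetical names, not part of Spark.

```python
from collections import Counter

# Hypothetical stand-in for the per-page state:
# page id -> Counter mapping user id -> visit count.
state = {
    "Page_01": Counter({"user_01": 32, "user_02": 3, "user_03": 7}),
}

def top_k_users(state, page, k):
    """Return up to k (user, count) pairs with the highest counts for a page."""
    return state.get(page, Counter()).most_common(k)

print(top_k_users(state, "Page_01", 2))  # → [('user_01', 32), ('user_03', 7)]
```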

In updateStateByKey I get the previous state and the new events coming *into*
the window. Is there a way to also get the events going *out* of the window?
That way I could update the state incrementally over a rolling window.
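The incremental idea can be sketched without Spark, assuming events arrive as one list of (page, user) pairs per 5-minute batch, so a 2-hour window spans 24 batches. The class and method names are hypothetical and this is plain Python, not a Spark Streaming API. It keeps each batch's counts so that the batch leaving the window can be subtracted rather than recounting the whole window.

```python
from collections import Counter, deque

class RollingPageCounts:
    """Hypothetical sketch: per-page user visit counts over the last
    `window_batches` batches, updated incrementally (add new, subtract old)."""

    def __init__(self, window_batches):
        self.window_batches = window_batches
        self.batches = deque()  # per-batch counts, oldest on the left
        self.totals = {}        # page -> Counter of user visits in the window

    def add_batch(self, events):
        """events: iterable of (page, user) pairs from one slide interval."""
        batch = {}
        for page, user in events:
            batch.setdefault(page, Counter())[user] += 1
        # Events coming *in*: add this batch's counts to the running totals.
        for page, counts in batch.items():
            self.totals.setdefault(page, Counter()).update(counts)
        self.batches.append(batch)
        # Events going *out*: subtract the batch that just left the window.
        if len(self.batches) > self.window_batches:
            expired = self.batches.popleft()
            for page, counts in expired.items():
                total = self.totals[page]
                total.subtract(counts)
                # Drop users whose count fell to zero so state stays bounded.
                for user in counts:
                    if total[user] <= 0:
                        del total[user]
                if not total:
                    del self.totals[page]

    def top_k(self, page, k):
        """Return up to k (user, count) pairs for a page in the current window."""
        return self.totals.get(page, Counter()).most_common(k)
```

With 5-minute batches and a 2-hour window, this would be constructed as `RollingPageCounts(window_batches=24)` and `add_batch` called once per batch.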

What is an efficient way to do this in Spark Streaming?

Thanks
Ashish
