Hi,
I have a streaming system that does analysis over a long period of time, 
for example a sliding window of 24 hours that advances every 15 minutes. 
I have a batch process I need to convert to this streaming setup, and I am 
wondering how to do so efficiently.
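
Concretely, the window spec I mean looks something like this (a sketch in 
Structured Streaming terms; "timestamp" stands in for whatever my actual 
event-time column is):

import org.apache.spark.sql.functions.{col, window}

// 24-hour windows over event time, sliding every 15 minutes.
val slidingDay = window(col("timestamp"), "24 hours", "15 minutes")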

I am currently building the streaming process, so I can either use a DStream 
or create a DataFrame for each time period manually.
I know that if I have a groupBy, Spark would keep the aggregation state, and 
therefore only the new time period would be calculated, as in the sketch below.
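
A quick sketch of what I mean (the built-in "rate" source is just a stand-in 
so the snippet runs; my real job reads an actual event stream):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("sliding-count-sketch").getOrCreate()
import spark.implicits._

// Stand-in source; the "rate" source emits (timestamp, value) rows.
val events = spark.readStream
  .format("rate")
  .load()
  .withColumnRenamed("value", "user")

// A streaming aggregation like this keeps its running counts in the state
// store, so each 15-minute micro-batch only folds in newly arrived rows.
val counts = events
  .withWatermark("timestamp", "24 hours")
  .groupBy($"user", window($"timestamp", "24 hours", "15 minutes"))
  .count()
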
My problem, however, is handling window functions.

Consider an example where I have a window function that counts the number of 
failed logins before a successful one in the last 2 hours (sketched in batch 
form below). How would I convert this to streaming so that it is not 
recalculated from scratch every 15 minutes?
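
For reference, the batch version looks roughly like this (the table name, 
schema, and column names are placeholders: a logins table with user, 
success, and timestamp columns):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.getOrCreate()

// Placeholder for my actual login events:
// user string, success boolean, timestamp timestamp.
val logins = spark.table("logins")

// For each row, look back over the same user's logins in the preceding
// 2 hours (window ordered by event time in epoch seconds).
val lastTwoHours = Window
  .partitionBy(col("user"))
  .orderBy(col("timestamp").cast("long"))
  .rangeBetween(-2 * 60 * 60, 0)

// Count failed attempts in that range, then keep only the successful logins.
val failsBeforeSuccess = logins
  .withColumn("failsLast2h",
    sum(when(!col("success"), 1).otherwise(0)).over(lastTwoHours))
  .where(col("success"))

Recomputing this over the full 24-hour window every 15 minutes is exactly 
what I am trying to avoid.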

Thanks,
                Assaf.

