TD, thank you for your reply. I agree on the data store requirement. I am using HBase as the underlying store.
So for every batch interval of, say, 10 seconds:

- Calculate the time dimensions (minute, hour, day, week, month, and quarter) along with the other dimensions and metrics.
- Update the relevant base table at each batch interval with the relevant metrics for the given set of dimensions.

The only caveat I see is that I'll have to update each of the different roll-up tables for every batch window.

Is this a valid approach for calculating time series aggregations?

Regards
SM

For minute-level aggregates, I have set up a streaming window of, say, 10 seconds and am storing minute-level aggregates across multiple dimensions in HBase at every window interval.

> On 18-Nov-2015, at 7:45 AM, Tathagata Das <[email protected]> wrote:
>
> For this sort of long term aggregations you should use a dedicated data
> storage systems. Like a database, or a key-value store. Spark Streaming would
> just aggregate and push the necessary data to the data store.
>
> TD
>
> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <[email protected]> wrote:
> Hi,
>
> I am working on requirement of calculating real time metrics and building
> prototype on Spark streaming. I need to build aggregate at Seconds, Minutes,
> Hours and Day level.
>
> I am not sure whether I should calculate all these aggregates as different
> Windowed function on input DStream or shall I use updateStateByKey function
> for the same. If I have to use updateStateByKey for these time series
> aggregation, how can I remove keys from the state after different time lapsed?
>
> Please suggest.
>
> Regards
> SM
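
To make the per-batch roll-up described in this reply concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions, not the code actually in use: the function names (`time_buckets`, `rollup_batch`, `apply_deltas`), the event shape, the Monday week start, and the in-memory dict standing in for the HBase roll-up tables are all hypothetical. In a real job the merge step would presumably map to counter increments on one HBase table per granularity.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def time_buckets(ts):
    """Return the start of each roll-up period containing ts, keyed by granularity."""
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    quarter_month = 3 * ((ts.month - 1) // 3) + 1  # 1, 4, 7 or 10
    return {
        "minute": ts.replace(second=0, microsecond=0),
        "hour": ts.replace(minute=0, second=0, microsecond=0),
        "day": day,
        "week": day - timedelta(days=ts.weekday()),  # assumption: weeks start on Monday
        "month": day.replace(day=1),
        "quarter": day.replace(month=quarter_month, day=1),
    }


def rollup_batch(events):
    """Aggregate one batch into per-granularity deltas.

    events: iterable of (timestamp, dimensions_tuple, metric_value).
    Returns {granularity: {(bucket_start_iso, dimensions_tuple): summed_value}}.
    """
    deltas = defaultdict(lambda: defaultdict(float))
    for ts, dims, value in events:
        for granularity, start in time_buckets(ts).items():
            deltas[granularity][(start.isoformat(), dims)] += value
    return deltas


def apply_deltas(store, deltas):
    """Merge batch deltas into the store as increments.

    store is a dict of dicts standing in for one roll-up table per granularity.
    """
    for granularity, rows in deltas.items():
        table = store.setdefault(granularity, {})
        for key, value in rows.items():
            table[key] = table.get(key, 0.0) + value
```

Because each batch is reduced to one delta per (bucket, dimensions) key before the store is touched, the number of writes per batch depends on the number of distinct keys in that batch rather than on the number of events, which may soften the caveat about updating every roll-up table on every window.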
