TD, thank you for your reply.

I agree on the data store requirement. I am using HBase as the underlying store.

So for every batch interval of, say, 10 seconds:

- Calculate the time dimensions (minute, hour, day, week, month and quarter)
along with the other dimensions and metrics.
- Update the relevant base table at each batch interval with the relevant
metrics for a given set of dimensions.
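
To make the first step concrete, here is a rough sketch in Python (for brevity, not the actual streaming job) of deriving the time dimensions from an event timestamp. The function name `time_buckets`, the use of UTC, the Monday week start and the key formats are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def time_buckets(epoch_seconds):
    """Return a label for each roll-up bucket (in UTC) containing the timestamp.

    Hypothetical helper: the bucket labels are illustrative, not a required
    row-key format.
    """
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    minute = ts.replace(second=0, microsecond=0)
    hour = minute.replace(minute=0)
    day = hour.replace(hour=0)
    # Assumption: weeks start on Monday (weekday() == 0).
    week = day - timedelta(days=day.weekday())
    month = day.replace(day=1)
    quarter_number = (month.month - 1) // 3 + 1
    return {
        "minute": minute.strftime("%Y-%m-%dT%H:%M"),
        "hour": hour.strftime("%Y-%m-%dT%H"),
        "day": day.strftime("%Y-%m-%d"),
        "week": week.strftime("%Y-%m-%d"),  # date of the week's Monday
        "month": month.strftime("%Y-%m"),
        "quarter": f"{month.year}-Q{quarter_number}",
    }
```

Each label, combined with the other dimensions, could then serve as the key for the corresponding roll-up table.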

The only caveat I see is that I'll have to update each of the different
roll-up tables for each batch window.

Is this a valid approach for calculating time series aggregation?

Regards
SM

For minute-level aggregates, I have set up a streaming window of, say, 10
seconds and am storing the minute-level aggregates across multiple dimensions
in HBase at every window interval.
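
As a minimal sketch of that per-window update, again in plain Python rather than Spark code: each window is reduced to partial sums keyed by minute and dimensions, and the partial sums are then added to the stored totals. The names `aggregate_batch` and `merge_into_store` are hypothetical, and the `dict` only stands in for an HBase table; no HBase client calls are shown.

```python
from collections import Counter
from datetime import datetime, timezone


def minute_key(epoch_seconds, dimensions):
    """Build a (minute, dim1, dim2, ...) key; the format is illustrative."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return (ts.strftime("%Y-%m-%dT%H:%M"),) + tuple(dimensions)


def aggregate_batch(events):
    """Sum metric values per minute and dimension set for one window.

    events: iterable of (epoch_seconds, dimensions_tuple, metric_value).
    """
    partial = Counter()
    for epoch_seconds, dimensions, value in events:
        partial[minute_key(epoch_seconds, dimensions)] += value
    return partial


def merge_into_store(store, partial):
    """Add one window's partial sums to the running totals.

    Assumption: `store` is a dict standing in for the minute-level table.
    """
    for key, value in partial.items():
        store[key] = store.get(key, 0) + value


store = {}
window_1 = [(0, ("web", "US"), 2), (5, ("web", "US"), 3), (65, ("web", "US"), 1)]
window_2 = [(12, ("web", "US"), 4)]
for window in (window_1, window_2):
    merge_into_store(store, aggregate_batch(window))
```

Because the merge is additive, several 10-second windows that fall in the same minute accumulate into one row. The flip side is that replaying a window after a failure would count it twice unless the write is made idempotent in some way.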

> On 18-Nov-2015, at 7:45 AM, Tathagata Das <[email protected]> wrote:
> 
> For this sort of long-term aggregation you should use a dedicated data
> storage system, like a database or a key-value store. Spark Streaming would
> just aggregate and push the necessary data to the data store.
> 
> TD
> 
> On Sat, Nov 14, 2015 at 9:32 PM, Sandip Mehta <[email protected]> wrote:
> Hi,
> 
> I am working on a requirement to calculate real-time metrics and am building
> a prototype on Spark Streaming. I need to build aggregates at the second,
> minute, hour and day level.
> 
> I am not sure whether I should calculate all these aggregates as different
> windowed functions on the input DStream or use the updateStateByKey function
> instead. If I have to use updateStateByKey for these time series
> aggregations, how can I remove keys from the state after different amounts
> of time have elapsed?
> 
> Please suggest.
> 
> Regards
> SM
