Have you thought of using DStream's window() operation?

Gino B.

> On Jun 6, 2014, at 11:49 PM, Jeremy Lee <unorthodox.engine...@gmail.com> 
> wrote:
> 
> 
> It's going well enough that this is a "how should I in 1.0.0" rather than 
> "how do I" question.
> 
> So I've got data coming in via Spark Streaming (tweets) and I want to 
> archive/log it all. It seems a bit wasteful to generate a new HDFS file for 
> each batch, but I also want to guard against data loss from crashes.
> 
> I suppose what I want is to let things build up into "superbatches" over a 
> few minutes, and then serialize those to Parquet files, or similar? Or do I?
> 
> Do I count down the number of batches, or does Spark have a preferred way 
> of scheduling cron-style events?
> 
> What's the best practice for keeping persistent data for a streaming app 
> (across restarts)? And for cleaning up on termination?
> 
> 
> -- 
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers
