Have you thought of using window()?

Gino B.
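A minimal sketch of that idea, assuming the input is a Twitter DStream from TwitterUtils, that ten-minute "superbatches" are acceptable, and with placeholder app name and HDFS paths:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    object TweetArchiver {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("TweetArchiver") // placeholder app name
        // 10-second batches; the batch interval must evenly divide the window/slide durations below
        val ssc = new StreamingContext(conf, Seconds(10))

        // Assumes Twitter credentials are supplied the usual way (twitter4j system properties)
        val tweets = TwitterUtils.createStream(ssc, None).map(_.getText)

        // Collect ten minutes of batches into one "superbatch" and emit it once per window,
        // instead of writing a new HDFS file for every 10-second batch.
        val superBatches = tweets.window(Minutes(10), Minutes(10))
        superBatches.saveAsTextFiles("hdfs:///archive/tweets") // placeholder path

        ssc.start()
        ssc.awaitTermination()
      }
    }

saveAsTextFiles produces one output directory per window interval, so you get one write every ten minutes rather than one per batch. For Parquet you would instead do the write yourself inside foreachRDD on the windowed stream.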
> On Jun 6, 2014, at 11:49 PM, Jeremy Lee <unorthodox.engine...@gmail.com> wrote:
>
> It's going well enough that this is a "how should I in 1.0.0" rather than a "how do I" question.
>
> So I've got data coming in via Streaming (tweets) and I want to archive/log it all. It seems a bit wasteful to generate a new HDFS file for each DStream batch, but I also want to guard against data loss from crashes.
>
> I suppose what I want is to let things build up into "superbatches" over a few minutes, and then serialize those to Parquet files, or similar? Or do I?
>
> Do I count down the number of DStreams, or does Spark have a preferred way of scheduling cron-style events?
>
> What's the best practice for keeping persistent data for a streaming app (across restarts)? And to clean up on termination?
>
> --
> Jeremy Lee BCompSci(Hons)
> The Unorthodox Engineers
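On surviving restarts, the usual approach is metadata checkpointing so the streaming context can be rebuilt after a crash. A minimal sketch, assuming a checkpoint directory on HDFS (the path is a placeholder) and that StreamingContext.getOrCreate is available in your Spark version; if it is not, the checkpoint-directory constructor new StreamingContext(checkpointDir) performs the same recovery:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object RestartableArchiver {
      val checkpointDir = "hdfs:///checkpoints/tweet-archiver" // placeholder path

      // Builds the context (and all DStream wiring) from scratch.
      // Only invoked when no checkpoint exists yet.
      def createContext(): StreamingContext = {
        val conf = new SparkConf().setAppName("TweetArchiver")
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint(checkpointDir) // periodically saves DStream metadata to HDFS
        // ... define your input DStreams and output operations here ...
        ssc
      }

      def main(args: Array[String]): Unit = {
        // Recover the context from the checkpoint after a crash/restart,
        // or create a fresh one on the first run.
        val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
        ssc.start()
        ssc.awaitTermination()
      }
    }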