I read it more carefully, and window() might actually work for some other
stuff like logs. (Assuming I can have multiple windows, each with entirely
different attributes, on a single stream.)

Thanks for that!
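For what it's worth, the merge logic I had in mind for updateStateByKey (mentioned below) is roughly this — a pure-Python sketch of the per-key update function's behaviour, not actual PySpark; `run_batches` is just a hypothetical driver to show when the function gets called:

```python
# Pure-Python sketch of updateStateByKey semantics (NOT actual Spark code).
# Spark calls the update function once per key per batch, passing the new
# values seen for that key plus the previous state (None the first time).

def update_count(new_values, state):
    """Running count per key (real signature is roughly (Seq[V], Option[S]) => Option[S])."""
    return (state or 0) + sum(new_values)

def run_batches(batches, update):
    """Hypothetical driver: apply `update` over a list of {key: [values]} micro-batches."""
    state = {}
    for batch in batches:
        for key in set(batch) | set(state):
            state[key] = update(batch.get(key, []), state.get(key))
    return state

batches = [{"a": [1, 2]}, {"a": [3], "b": [10]}]
final = run_batches(batches, update_count)  # {"a": 6, "b": 10}
```

In the real API the state survives across batches because Spark checkpoints it (via ssc.checkpoint(dir) on HDFS or similar), which I'm hoping also covers the restart case.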


On Sun, Jun 8, 2014 at 11:11 PM, Jeremy Lee <unorthodox.engine...@gmail.com>
wrote:

> Yes.. but from what I understand that's a "sliding window", so for a window
> of (60) seconds over (1)-second DStreams, that would save the entire last
> minute of data once per second. That's more than I need.
>
> I think what I'm after is probably updateStateByKey... I want to mutate
> data structures (probably even graphs) as the stream comes in, but I also
> want that state to be persistent across restarts of the application (or a
> parallel version of the app, if possible). So I'd have to save that
> structure occasionally and reload it as the "primer" on the next run.
>
> I was almost going to use HBase or Hive, but they seem to have been
> deprecated in 1.0.0? Or just late to the party?
>
> Also, I've been having trouble deleting Hadoop directories... the old "two
> line" examples don't seem to work anymore. I actually managed to fill up
> the worker instances (I gave them tiny EBS volumes) and I think I crashed
> them.
>
>
>
> On Sat, Jun 7, 2014 at 10:23 PM, Gino Bustelo <lbust...@gmail.com> wrote:
>
>> Have you thought of using window?
>>
>> Gino B.
>>
>> > On Jun 6, 2014, at 11:49 PM, Jeremy Lee <unorthodox.engine...@gmail.com>
>> wrote:
>> >
>> >
>> > It's going well enough that this is a "how should I in 1.0.0" rather
>> than a "how do I" question.
>> >
>> > So I've got data coming in via Streaming (twitters) and I want to
>> archive/log it all. It seems a bit wasteful to generate a new HDFS file for
>> each DStream, but I also want to guard against data loss from crashes.
>> >
>> > I suppose what I want is to let things build up into "superbatches"
>> over a few minutes, and then serialize those to Parquet files, or similar?
>> Or do I?
>> >
>> > Do I count down the number of DStreams, or does Spark have a preferred
>> way of scheduling cron-like events?
>> >
>> > What's the best practice for keeping persistent data for a streaming
>> app across restarts? And what's the right way to clean up on termination?
>> >
>> >
>> > --
>> > Jeremy Lee  BCompSci(Hons)
>> >   The Unorthodox Engineers
>>
>
>
>
> --
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers
>



-- 
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers
