That's definitely a good supplement to the current Spark Streaming; I've heard from many people who want to process their data by log time. Looking forward to the code.
Thanks
Jerry

-----Original Message-----
From: Tathagata Das [mailto:tathagata.das1...@gmail.com]
Sent: Thursday, January 29, 2015 10:33 AM
To: Tobias Pfeiffer
Cc: YaoPau; user
Subject: Re: reduceByKeyAndWindow, but using log timestamps instead of clock seconds

Ohhh nice! It would be great if you could share some code soon. It is indeed a very complicated problem, and there is probably no single solution that fits all use cases, so having one way of doing things would be a great reference. Looking forward to that!

On Wed, Jan 28, 2015 at 4:52 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
> Hi,
>
> On Thu, Jan 29, 2015 at 1:54 AM, YaoPau <jonrgr...@gmail.com> wrote:
>>
>> My thinking is to maintain state in an RDD and update and persist it
>> with each 2-second pass, but this also seems like it could get messy.
>> Any thoughts or examples that might help me?
>
> I have just implemented some timestamp-based windowing on DStreams
> (I can't share the code now, but it will be published in a couple of
> months), although under the assumption that items arrive in the
> correct order. The main challenge (a rather technical one) was to keep
> proper state across RDD boundaries and to tell the state "you can mark
> this partial window from the last interval as 'complete' now" without
> shuffling too much data around. For example, if there are some empty
> intervals, you don't know when the next item for the partial window
> will arrive, or whether there will be one at all. I guess if you want
> out-of-order tolerance, it becomes even trickier, as you need to
> define and think about some timeout for partial windows in your
> state...
>
> Tobias
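[Editor's note: to make the approach Tobias describes concrete, here is a minimal sketch of event-time (log-timestamp) windowing on a DStream using updateStateByKey. This is not his implementation (his code was not shared in the thread); the window length, the allowed-lateness timeout, the input shape (key, (eventTimeMillis, count)), and all names below are assumptions made for illustration.]

// A sketch of event-time windowing on a DStream, not Tobias's actual code.
// Assumptions: tumbling windows, per-key watermark, (key, (timestamp, count)) input.
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits (Spark 1.x)
import org.apache.spark.streaming.dstream.DStream

// Per-key state: open partial windows (window start -> partial sum), windows
// closed in the last batch, and the largest event timestamp seen so far,
// which serves as a crude per-key watermark.
case class WindowState(open: Map[Long, Long],
                       closed: Seq[(Long, Long)],
                       maxEventTime: Long)

object EventTimeWindows {
  // events: (key, (eventTimeMillis, count)), timestamps parsed from log lines.
  def countsByEventTime(events: DStream[(String, (Long, Long))])
      : DStream[(String, (Long, Long))] = {
    val windowMs   = 10 * 1000L // tumbling window length (assumed)
    val latenessMs = 5 * 1000L  // out-of-order tolerance (assumed)

    val state = events.updateStateByKey[WindowState] {
      (batch: Seq[(Long, Long)], prev: Option[WindowState]) =>
        val old = prev.getOrElse(WindowState(Map.empty, Seq.empty, 0L))
        // Fold this micro-batch into the per-window partial sums.
        val open = batch.foldLeft(old.open) { case (acc, (ts, v)) =>
          val w = ts - (ts % windowMs)
          acc.updated(w, acc.getOrElse(w, 0L) + v)
        }
        val maxTs = (old.maxEventTime +: batch.map(_._1)).max
        // A window is "complete" once the watermark passes its end. With no
        // new data the watermark never advances; that is exactly the
        // empty-interval problem described in the message above.
        val watermark = maxTs - latenessMs
        val (done, stillOpen) =
          open.partition { case (w, _) => w + windowMs <= watermark }
        Some(WindowState(stillOpen, done.toSeq, maxTs))
    }

    // Each completed window is emitted exactly once, in the batch that closed it.
    state.flatMap { case (key, st) =>
      st.closed.map { case (w, sum) => (key, (w, sum)) }
    }
  }
}

Note that updateStateByKey requires checkpointing to be enabled via ssc.checkpoint(...). Also, because the watermark here is tracked per key, a key that stops receiving data never closes its last partial window; a production version would need a global watermark or a processing-time timeout, which is the trade-off Tobias alludes to at the end of his message.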