Re: Issue while trying to aggregate with a sliding window

2014-06-19 Thread Tathagata Das
zeroTime marks the time when the streaming job started, and the first batch of data is from zeroTime to zeroTime + slideDuration. The validity check that (time - zeroTime) is a multiple of slideDuration ensures that a given DStream generates RDDs at the right times. For example, say the b
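
[Editor's note: a minimal sketch of the alignment rule described above, using simplified stand-ins for Spark Streaming's Time and Duration types. This is not the actual DStream.isValidTime implementation, only an illustration of the zeroTime/slideDuration check.]

// Illustration only: simplified stand-ins for Spark Streaming's Time and Duration.
case class Time(millis: Long)
case class Duration(millis: Long)

// A DStream should only generate an RDD for a time that lies a whole number of
// slideDurations after zeroTime, so every batch ends on a slide boundary.
def isValidTime(time: Time, zeroTime: Time, slideDuration: Duration): Boolean = {
  val elapsed = time.millis - zeroTime.millis
  elapsed > 0 && elapsed % slideDuration.millis == 0
}

// Example: with zeroTime = 0 ms and slideDuration = 10000 ms, times 10000 ms,
// 20000 ms, ... are valid, while 15000 ms is not.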

Re: Issue while trying to aggregate with a sliding window

2014-06-18 Thread Hatch M
Ok, that patch does fix the key lookup exception. However, I am curious about the time validity check, isValidTime ( https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L264 ). Why does (time - zeroTime) have to be a multiple of slide duration

Re: Issue while trying to aggregate with a sliding window

2014-06-17 Thread Hatch M
Thanks! Will try to get the fix and retest. On Tue, Jun 17, 2014 at 5:30 PM, onpoq l wrote: > There is a bug: > > https://github.com/apache/spark/pull/961#issuecomment-45125185 > > > On Tue, Jun 17, 2014 at 8:19 PM, Hatch M wrote: > > Trying to aggregate over a sliding window, playing with the

Re: Issue while trying to aggregate with a sliding window

2014-06-17 Thread onpoq l
There is a bug: https://github.com/apache/spark/pull/961#issuecomment-45125185 On Tue, Jun 17, 2014 at 8:19 PM, Hatch M wrote: > Trying to aggregate over a sliding window, playing with the slide duration. > Playing around with the slide interval I can see the aggregation works but > mostly fail
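
[Editor's note: for reference, a minimal sketch of the kind of sliding-window aggregation discussed in this thread. The socket source, host, port, batch interval, and window/slide durations are placeholder assumptions, not values taken from the original messages.]

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SlidingWindowAgg {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SlidingWindowAgg").setMaster("local[2]")
    // Batch interval of 5 seconds; the window and slide durations below must be
    // multiples of it.
    val ssc = new StreamingContext(conf, Seconds(5))

    // Expects "key value" lines on a local socket (placeholder source).
    val lines = ssc.socketTextStream("localhost", 9999)
    val pairs = lines.map { line =>
      val Array(k, v) = line.split(" ", 2)
      (k, v.toLong)
    }

    // Sum values per key over a 30-second window that slides every 10 seconds.
    val windowedSums = pairs.reduceByKeyAndWindow((a: Long, b: Long) => a + b, Seconds(30), Seconds(10))
    windowedSums.print()

    ssc.start()
    ssc.awaitTermination()
  }
}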