zeroTime marks the time when the streaming job started, and the first batch
of data covers zeroTime to zeroTime + slideDuration. The validity check that
(time - zeroTime) is a multiple of slideDuration ensures that a
given DStream generates RDDs at the right times. For example, say the
b
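
To make the check described above concrete, here is a minimal sketch in plain Python. It is not Spark's actual code: the function name, the millisecond units, and the requirement that the time fall strictly after zeroTime are assumptions made for illustration. It only shows the arithmetic of "(time - zeroTime) is a multiple of slideDuration".

```python
def is_valid_time(time_ms: int, zero_time_ms: int, slide_ms: int) -> bool:
    """Hypothetical stand-in for the validity check discussed in this thread.

    Returns True when time_ms lies a whole number of slide intervals
    after zero_time_ms. Requiring time_ms > zero_time_ms is an assumption.
    """
    if slide_ms <= 0:
        raise ValueError("slide duration must be positive")
    elapsed = time_ms - zero_time_ms
    return elapsed > 0 and elapsed % slide_ms == 0


# Assumed example values: job starts at t = 0, batches arrive every 1 s,
# and the windowed stream slides every 3 s.
zero_time_ms = 0
batch_ms = 1000
slide_ms = 3000

# Batch boundaries at 1 s, 2 s, ..., 9 s after the start.
ticks = [zero_time_ms + i * batch_ms for i in range(1, 10)]

# Only the boundaries aligned with the slide duration pass the check.
valid_ticks = [t for t in ticks if is_valid_time(t, zero_time_ms, slide_ms)]
print(valid_ticks)
```

With these assumed values, only every third batch boundary passes, so a stream with a 3 s slide would produce output at 3 s, 6 s and 9 s and skip the ticks in between.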
OK, that patch does fix the key lookup exception. However, I'm curious about the
time validity check, isValidTime (
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L264
).
Why does (time - zeroTime) have to be a multiple of slide dura
Thanks! Will try to get the fix and retest.
On Tue, Jun 17, 2014 at 5:30 PM, onpoq l wrote:
> There is a bug:
>
> https://github.com/apache/spark/pull/961#issuecomment-45125185
>
>
> On Tue, Jun 17, 2014 at 8:19 PM, Hatch M wrote:
> > Trying to aggregate over a sliding window, playing with the
There is a bug:
https://github.com/apache/spark/pull/961#issuecomment-45125185
On Tue, Jun 17, 2014 at 8:19 PM, Hatch M wrote:
> Trying to aggregate over a sliding window, playing with the slide duration.
> Playing around with the slide interval I can see the aggregation works but
> mostly fail