The original DStream is of (K,V), and this function creates a DStream
of (K,S). Each time slice brings zero or more new V for each K. The old
state S for each K (which can be a different type from V, and may not
exist yet) is updated in some way by that batch of new V to produce a
new state S, which in turn might not exist anymore after the update.
That's why the function goes from a Seq[V] and an Option[S] to an
Option[S].
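
To make the shape of that function concrete, here is a rough model of
one time slice in plain Scala collections, with no Spark involved. The
object UpdateStateModel and the helper step are made up for
illustration and are not part of Spark; the sketch assumes that keys
with old state but no new values are still visited with an empty Seq,
which is how I understand updateStateByKey to behave.

```scala
// Plain-Scala model of one updateStateByKey step; no Spark required.
object UpdateStateModel {
  // Hypothetical helper: applies updateFunc once per key for one time slice.
  def step[K, V, S](
      state: Map[K, S],
      batch: Seq[(K, V)]
  )(updateFunc: (Seq[V], Option[S]) => Option[S]): Map[K, S] = {
    // Collect the new values that arrived for each key in this slice.
    val newValues: Map[K, Seq[V]] =
      batch.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    // Visit keys that already have state as well as keys seen for the first time.
    val keys = state.keySet ++ newValues.keySet
    keys.toSeq.flatMap { k =>
      // A None result drops the key from the state.
      updateFunc(newValues.getOrElse(k, Seq.empty), state.get(k)).map(s => k -> s)
    }.toMap
  }
}
```

Calling step with a Map[String, Int] of counts and a batch of
(String, Double) pairs gives K = String, V = Double and S = Int, so the
update function sees a Seq[Double] and an Option[Int].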

If your RDD has value type V = Double, then your function needs to
update the state from a new Seq[Double] at each time slice, since
Doubles are what arrive for each key in each slice. Seq[(String,
Double)] would only compile if the value type itself were
(String, Double).
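
As a sketch of the V = Double case, the update function below keeps a
running count and sum per key, so the state type S = (Long, Double)
differs from V. The names RunningMean, State and update are invented
for this example, and the stream called readings in the trailing
comment is hypothetical.

```scala
object RunningMean {
  // S: (count, sum) accumulated so far; note that it differs from V = Double.
  type State = (Long, Double)

  // Matches the (Seq[V], Option[S]) => Option[S] shape with V = Double.
  def update(newValues: Seq[Double], old: Option[State]): Option[State] = {
    // Start from an empty state the first time a key is seen.
    val (n, sum) = old.getOrElse((0L, 0.0))
    Some((n + newValues.size, sum + newValues.sum))
  }
}
// Assuming Spark Streaming is on the classpath and there is a
// readings: DStream[(String, Double)], this would be passed as
//   readings.updateStateByKey(RunningMean.update _)
```

This version always returns Some, so state is never dropped; returning
None for a key is how its state would be removed.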


On Tue, Apr 29, 2014 at 7:50 PM, Adrian Mocanu
<amoc...@verticalscope.com> wrote:
> What is Seq[V] in updateStateByKey?
>
> Does this store the collected tuples of the RDD in a collection?
>
>
>
> Method signature:
>
> def updateStateByKey[S: ClassTag] ( updateFunc: (Seq[V], Option[S]) =>
> Option[S] ): DStream[(K, S)]
>
>
>
> In my case I used Seq[Double] assuming a sequence of Doubles in the RDD; the
> moment I switched to a different type like Seq[(String, Double)] the code
> didn’t compile.
>
>
>
> -Adrian
>
>
