The original DStream is of (K,V). This function creates a DStream of (K,S). Each time slice brings one or more new V for each K. The old state S (can be different from V!) for each K -- possibly non-existent -- is updated in some way by a bunch of new V, to produce a new state S -- which also might not exist anymore after update. That's why the function is from a Seq[V], and an Option[S], to an Option[S].
If you RDD has value type V = Double then your function needs to update state based on a new Seq[Double] at each time slice, since Doubles are the new thing arriving for each key at each time slice. On Tue, Apr 29, 2014 at 7:50 PM, Adrian Mocanu <amoc...@verticalscope.com> wrote: > What is Seq[V] in updateStateByKey? > > Does this store the collected tuples of the RDD in a collection? > > > > Method signature: > > def updateStateByKey[S: ClassTag] ( updateFunc: (Seq[V], Option[S]) => > Option[S] ): DStream[(K, S)] > > > > In my case I used Seq[Double] assuming a sequence of Doubles in the RDD; the > moment I switched to a different type like Seq[(String, Double)] the code > didn’t compile. > > > > -Adrian > >