On 24/05/2020 11:27, Antonin Delpeuch (lists) wrote:
> With this formulation, zipWithIndex would be a special case of
> mapWithState (so it could be refactored to be expressed as such).
Forget about this part, it would obviously not, since zipWithIndex can
compute the size of each partition indepe
The API signature would of course be more general (sorry!):
Given an RDD of elements of type T, an initial state of type S and a map
function (S,T) -> (S,U), return an RDD of Us obtained by applying the
map function in sequence, updating the state as elements are mapped.
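
To make the intended semantics concrete, here is a minimal sketch in plain Python over an ordinary list rather than an RDD. It does not use Spark at all; `map_with_state`, `running_total` and `attach_index` are hypothetical names chosen for illustration, and the sketch says nothing about how the elements would be partitioned or processed in a real implementation.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # state type
T = TypeVar("T")  # input element type
U = TypeVar("U")  # output element type


def map_with_state(
    elements: Iterable[T],
    initial_state: S,
    f: Callable[[S, T], Tuple[S, U]],
) -> List[U]:
    """Apply f to the elements in order, threading the state through.

    Hypothetical helper: a sequential, in-memory model of the proposed
    signature (S, T) -> (S, U), not an existing Spark API.
    """
    state = initial_state
    results: List[U] = []
    for element in elements:
        # f returns the updated state and the mapped value for this element.
        state, mapped = f(state, element)
        results.append(mapped)
    return results


def running_total(total: int, x: int) -> Tuple[int, int]:
    """State is the sum so far; the output is the updated sum."""
    new_total = total + x
    return new_total, new_total


def attach_index(index: int, x: str) -> Tuple[int, Tuple[str, int]]:
    """State is the next index; the output pairs the element with it."""
    return index + 1, (x, index)


totals = map_with_state([3, 1, 4, 1, 5], 0, running_total)
indexed = map_with_state(["a", "b", "c"], 0, attach_index)
```

Here `totals` holds the running sums of the input and `indexed` pairs each element with its position. Both depend on visiting the elements strictly in order, which is the property the proposed operation would have to preserve across partitions.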
With this formulation, zip
Hi,
Spark Streaming has a `mapWithState` API to run a map on a stream while
maintaining a state as elements are read.
The core RDD API does not seem to have anything similar. Given an RDD of
elements of type T, an initial state of type S and a map function (S,T)
-> (S,T), return an RDD of Ts obtai