Re: Map with state for RDDs

2020-05-24 Thread Antonin Delpeuch (lists)
On 24/05/2020 11:27, Antonin Delpeuch (lists) wrote:
> With this formulation, zipWithIndex would be a special case of
> mapWithState (so it could be refactored to be expressed as such).

Forget about this part: it obviously would not, since zipWithIndex can compute the size of each partition indepe
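
To illustrate the point about partition sizes, here is a rough sketch in plain Python (not Spark code). It models an RDD as a list of partitions, and the name `zip_with_index` is only an illustrative stand-in for the idea behind Spark's `zipWithIndex`: sizes are counted per partition without any shared state, and only the resulting offsets are combined.

```python
from itertools import accumulate
from typing import List, Tuple, TypeVar

T = TypeVar("T")


def zip_with_index(partitions: List[List[T]]) -> List[List[Tuple[T, int]]]:
    """Index elements globally, modelling an RDD as a list of partitions."""
    # Pass 1: each partition's size can be computed on its own, in parallel.
    sizes = [len(p) for p in partitions]
    # Starting index of each partition = sum of the sizes of those before it.
    offsets = [0] + list(accumulate(sizes))[:-1]
    # Pass 2: each partition only needs its own offset, not a running state.
    return [
        [(x, offset + i) for i, x in enumerate(partition)]
        for partition, offset in zip(partitions, offsets)
    ]


indexed = zip_with_index([["a", "b"], ["c"], ["d", "e", "f"]])
```

A general stateful map has no such shortcut, because the state entering a partition depends on every element before it.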

Re: Map with state for RDDs

2020-05-24 Thread Antonin Delpeuch (lists)
The API signature would of course be more general (sorry!): Given an RDD of elements of type T, an initial state of type S and a map function (S,T) -> (S,U), return an RDD of Us obtained by applying the map function in sequence, updating the state as elements are mapped. With this formulation, zip
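
As a minimal sketch of the semantics described above, here is the same operation on a local Python iterable rather than an RDD. The name `map_with_state` and the helper `running_total` are hypothetical and chosen for illustration only; this is not an existing Spark RDD API, and it says nothing about how the state would cross partition boundaries.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

S = TypeVar("S")  # state type
T = TypeVar("T")  # input element type
U = TypeVar("U")  # output element type


def map_with_state(
    elements: Iterable[T],
    initial_state: S,
    f: Callable[[S, T], Tuple[S, U]],
) -> Iterator[U]:
    """Apply f to the elements in order, threading the state through."""
    state = initial_state
    for element in elements:
        # f returns the updated state together with the mapped value.
        state, mapped = f(state, element)
        yield mapped


def running_total(total: int, x: int) -> Tuple[int, int]:
    """Example map function: the state is the sum seen so far."""
    new_total = total + x
    return new_total, new_total


totals = list(map_with_state([3, 1, 4, 1, 5], 0, running_total))
```

With the inputs above, `totals` holds the running sums of the list. The strictly sequential loop is the whole difficulty: each call to `f` needs the state produced by the previous one.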

Map with state for RDDs

2020-05-24 Thread Antonin Delpeuch (lists)
Hi, Spark Streaming has a `mapWithState` API to run a map on a stream while maintaining a state as elements are read. The core RDD API does not seem to have anything similar. Given an RDD of elements of type T, an initial state of type S and a map function (S,T) -> (S,T), return an RDD of Ts obtai