`zipWithIndex` is compute-intensive and also breaks Spark's
"transformations are lazy" model, so it is probably not appropriate to add
this to the public RDD API. If `zipWithIndex` weren't already what I
consider to be broken, I'd be much friendlier to building something more on
top of it, but I r
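
To make the laziness point concrete, here is a minimal pure-Python sketch (not Spark code) of why assigning global indices forces an eager pass. It assumes indices are built from per-partition counts, which is my understanding of how Spark does it; `zip_with_index` and the list-of-lists "partitions" are hypothetical stand-ins for illustration only.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Hypothetical stand-in for RDD.zipWithIndex over a list of partitions."""
    # Eager step: every partition except the last has to be counted up front,
    # because a partition's starting index depends on the sizes before it.
    sizes = [sum(1 for _ in part) for part in partitions[:-1]]
    offsets = [0] + list(accumulate(sizes))

    # Lazy step: once the offsets are known, indexing is a per-partition map.
    def indexed():
        for offset, part in zip(offsets, partitions):
            for i, item in enumerate(part):
                yield item, offset + i

    return indexed()


partitions = [["a", "b"], ["c"], ["d", "e"]]
result = list(zip_with_index(partitions))
```

The counting happens when `zip_with_index` is called, not when the result is consumed, which is the analogue of a "transformation" that runs a job at definition time.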
Hi all – something I’ve found useful several times now when working with
RDDs is the ability to reason about segments of an RDD.
For example, if I have two large RDDs and I want to combine them in a way that
would be intractable in terms of memory or disk storage (e.g. a cartesian) but
a p