If you'd like to re-use the resulting inverted map, you can persist the result:
x = myRdd.mapPartitions(create inverted map).persist()
Your function would create the reverse map and then return an iterator
over the keys in that map.
On Wed, Sep 17, 2014 at 1:04 PM, Akshat Aranya wrote:
> Patri
Patrick,
If I understand this correctly, I won't be able to do this in the closure
provided to mapPartitions() because that's going to be stateless, in the
sense that a hash map that I create within the closure would only be useful
for one call of MapPartitionsRDD.compute(). I guess I would need
If each partition can fit in memory, you can do this using
mapPartitions and then building an inverse mapping within each
partition. You'd need to construct a hash map within each partition
yourself.
On Tue, Sep 16, 2014 at 4:27 PM, Akshat Aranya wrote:
> I have a use case where my RDD is set up