I forgot to mention that I would like my approach to be independent of the
application that the user is going to submit to Spark.
Assume that I don’t know anything about the user’s application… I expected to
find a simpler approach. I saw in RDD.scala that an RDD is characterized by a
list of partitions.
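(For illustration, here is a minimal sketch of the generic, scan-based way to
isolate one partition without knowing anything about the user's application.
The data, the partition count, and the target partition ID are made-up
placeholders, not from the original post:)

import org.apache.spark.{SparkConf, SparkContext}

object IsolateOnePartition {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("isolate-one-partition").setMaster("local[*]"))

    // Stand-in for the user's RDD: 500 partitions of arbitrary data.
    val rdd = sc.parallelize(1 to 100000, numSlices = 500)
    val target = 42 // hypothetical ID of the partition to isolate

    // Scan-based isolation: Spark still schedules a task for every
    // partition, but only the target partition's data flows through.
    val onePartition = rdd.mapPartitionsWithIndex { (idx, iter) =>
      if (idx == target) iter else Iterator.empty
    }

    println(onePartition.count())
    sc.stop()
  }
}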
you might want to have a look at using a PartitionPruningRDD to select
a subset of partitions by ID. This approach worked very well for
multi-key lookups for us [1].
A major advantage compared to scan-based operations is that, if your
source RDD has an existing partitioner, only the relevant partitions are
evaluated.
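(A hedged sketch of what that could look like; the pair RDD, partition count,
and lookup keys below are assumptions for illustration, and note that
PartitionPruningRDD is a DeveloperApi, so its interface may change between
Spark versions:)

import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.rdd.PartitionPruningRDD

object PrunedLookup {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("pruned-lookup").setMaster("local[*]"))

    // A pair RDD with a known partitioner, so key -> partition is computable.
    val pairs = sc.parallelize((1 to 100000).map(i => (i, i.toString)))
      .partitionBy(new HashPartitioner(500))

    val keys = Set(7, 123, 4242) // hypothetical lookup keys
    val part = pairs.partitioner.get
    val keep = keys.map(part.getPartition)

    // Only the partitions that pass the filter are ever computed;
    // the other ~497 are pruned from the job entirely.
    val pruned = PartitionPruningRDD.create(pairs, keep.contains)
    pruned.filter { case (k, _) => keys(k) }.collect().foreach(println)

    sc.stop()
  }
}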
Hello list,
I am sorry for sending this message here, but I could not manage to get any
response on the “users” list. For specific purposes I would like to isolate one
partition of the RDD and perform computations only on it.
For instance, suppose that a user asks Spark to create 500 partitions for the
RDD; I would then like to run my computation on just one of those 500 partitions.