Hello,
Thank you very much for your response, Anastasie! Today I think I made it
work by dropping partitions in runJob or submitJob (I don’t remember
exactly which) in DAGScheduler.
If it doesn’t work properly after some tests, I will follow your approach.
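For reference, the public SparkContext.runJob overload already accepts an
explicit list of partition IDs, which ends up as the partitions argument of
DAGScheduler.submitJob. A minimal sketch of that route (the RDD and the
per-partition function are just placeholders, assuming a SparkSession named
spark):

val rdd = spark.sparkContext.parallelize(Array.range(0, 1000), 500)
// Run the job on partition 42 only; the scheduler never launches
// tasks for the other 499 partitions.
val sums: Array[Int] =
  spark.sparkContext.runJob(rdd, (iter: Iterator[Int]) => iter.sum, Seq(42))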
Thank you,
Thodoris
Hi all,
I think this is doable using the mapPartitionsWithIndex method of RDD.
Example:
val partitionIndex = 0 // Your favorite partition index here
val rdd = spark.sparkContext.parallelize(Array.range(0, 1000))
// Replace the elements of partitionIndex with [-10, ..., 0]
val fixed = rdd.mapPartitionsWithIndex { (index, iter) =>
  if (index == partitionIndex) (-10 to 0).iterator else iter
}
I forgot to mention that I would like my approach to be independent of the
application the user is going to submit to Spark.
Assume that I don’t know anything about the user’s application… I expected to
find a simpler approach. I saw in RDD.scala that an RDD is characterized by a
list of partitions.
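That list of partitions is also reachable from outside the RDD, so you can at
least inspect the partition IDs without knowing anything about the user’s
application. A small sketch (assuming a SparkSession named spark):

val rdd = spark.sparkContext.parallelize(Array.range(0, 1000), 500)
rdd.partitions.length        // 500
rdd.partitions.map(_.index)  // Array(0, 1, ..., 499)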
you might want to have a look at using a PartitionPruningRDD to select
a subset of partitions by ID. This approach worked very well for
multi-key lookups for us [1].
A major advantage compared to scan-based operations is that, if your
source RDD has an existing partitioner, only the relevant partitions are
evaluated.
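A minimal sketch of that approach (PartitionPruningRDD is a developer API; the
partition ID here is just a placeholder, assuming a SparkSession named spark):

import org.apache.spark.rdd.PartitionPruningRDD

val rdd = spark.sparkContext.parallelize(Array.range(0, 1000), 500)
// Keep only partition 42; no tasks are created for the other partitions.
val pruned = PartitionPruningRDD.create(rdd, partitionId => partitionId == 42)
pruned.count()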
Hello list,
I am sorry for sending this message here, but I could not manage to get any
response in “users”. For specific purposes I would like to isolate one
partition of the RDD and perform computations only on it.
For instance, suppose that a user asks Spark to create 500 partitions for the
RDD; I would like to run my computation on just one of those partitions.