I forgot to mention that I would like my approach to be independent of the application that the user is going to submit to Spark.
Assume that I don’t know anything about the user’s application… I expected to find a simpler approach. I saw in RDD.scala that an RDD is characterized by a list of partitions. If I modify this list and keep only one partition, is it going to work?

- Thodoris

> On 15 Apr 2018, at 01:40, Matthias Boehm <mboe...@gmail.com> wrote:
>
> You might want to have a look into using a PartitionPruningRDD to select
> a subset of partitions by ID. This approach worked very well for
> multi-key lookups for us [1].
>
> A major advantage compared to scan-based operations is that, if your
> source RDD has an existing partitioner, only the relevant partitions are
> accessed.
>
> [1] https://github.com/apache/systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/MatrixIndexingSPInstruction.java#L603
>
> Regards,
> Matthias
>
> On Sat, Apr 14, 2018 at 3:12 PM, Thodoris Zois <z...@ics.forth.gr> wrote:
>> Hello list,
>>
>> I am sorry for sending this message here, but I could not get any
>> response in “users”. For specific purposes I would like to isolate one
>> partition of the RDD and perform computations only on it.
>>
>> For instance, suppose that a user asks Spark to create 500 partitions for
>> the RDD. I would like Spark to create the partitions but perform
>> computations on only one of those 500, ignoring the other 499.
>>
>> At first I tried to modify the executor so that it runs only one partition
>> (task), but I didn’t manage to make it work. Then I tried the DAG
>> scheduler, but I think I should modify the code at a higher level: let
>> Spark do the partitioning, but in the end keep only one partition and
>> throw away all the others.
>>
>> My question is: which file should I modify in order to isolate one
>> partition of the RDD? Where is the actual partitioning done?
>>
>> I hope it is clear!
>>
>> Thank you very much,
>> Thodoris
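For reference, below is a minimal sketch of the PartitionPruningRDD approach Matthias describes (it is exposed as a developer API in org.apache.spark.rdd). The application name, the local master, and the parallelize-based 500-partition RDD are placeholders standing in for the unknown user application; the point of the sketch is only the PartitionPruningRDD.create call, which makes the derived RDD report a single partition so that actions on it schedule a single task, regardless of what the upstream job computes.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.PartitionPruningRDD

object PruneOnePartition {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("prune-one-partition").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Placeholder RDD with 500 partitions, standing in for whatever the
    // user's application produces.
    val rdd = sc.parallelize(1 to 1000000, numSlices = 500)

    // Keep only partition 0: the other 499 partitions are never turned into
    // tasks, because the pruned RDD reports a single partition.
    val pruned = PartitionPruningRDD.create(rdd, partitionId => partitionId == 0)

    println(s"partitions before: ${rdd.getNumPartitions}, after: ${pruned.getNumPartitions}")
    println(s"records in the surviving partition: ${pruned.count()}")

    sc.stop()
  }
}
```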
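For contrast, here is a sketch of the scan-based route that Matthias's note alludes to, again with the same placeholder data. With mapPartitionsWithIndex all 500 tasks are still scheduled; the filter only saves work inside each task rather than avoiding the other 499 tasks altogether, which is the difference the pruning approach is meant to address.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ScanOnePartition {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("scan-one-partition").setMaster("local[*]"))

    // Same placeholder RDD: 500 partitions of dummy data.
    val rdd = sc.parallelize(1 to 1000000, numSlices = 500)

    // Every partition is still scheduled as a task, but only the iterator of
    // the chosen partition is consumed; the other 499 return empty iterators.
    val onlyPartitionZero = rdd.mapPartitionsWithIndex { (idx, iter) =>
      if (idx == 0) iter else Iterator.empty
    }
    println(s"records kept: ${onlyPartitionZero.count()}")

    sc.stop()
  }
}
```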