Re: Isolate 1 partition and perform computations

2018-04-14 Thodoris Zois
I forgot to mention that I would like my approach to be independent of the application that the user is going to submit to Spark. Assume that I don’t know anything about the user’s application… I expected to find a simpler approach. I saw in RDD.scala that an RDD is characterized by a list of partit
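
A minimal sketch of why an application-independent approach is plausible, assuming a toy model of the abstraction mentioned in RDD.scala: an RDD exposes its partitions and a per-partition compute function. A wrapper can therefore hide every partition except the chosen ones and delegate compute to the parent unchanged, without knowing what the user's function does. This is plain Python, not Spark's API; `ToyRDD`, `PrunedView` and `run_job` are hypothetical names. Spark's own PartitionPruningRDD follows a similar idea but also renumbers the surviving partitions, which this sketch skips.

```python
from typing import Callable, Iterator, List


class ToyRDD:
    """Toy stand-in for an RDD: a partition count plus a compute function per partition index."""

    def __init__(self, num_partitions: int, compute: Callable[[int], Iterator[int]]):
        self.num_partitions = num_partitions
        self._compute = compute
        self.computed: List[int] = []  # records which partition indices were actually computed

    def partitions(self) -> List[int]:
        return list(range(self.num_partitions))

    def compute(self, split: int) -> Iterator[int]:
        self.computed.append(split)
        return self._compute(split)


class PrunedView:
    """Exposes only the selected partitions of a parent; compute is delegated unchanged."""

    def __init__(self, parent: ToyRDD, keep: Callable[[int], bool]):
        self.parent = parent
        self._splits = [p for p in parent.partitions() if keep(p)]

    def partitions(self) -> List[int]:
        return list(self._splits)

    def compute(self, split: int) -> Iterator[int]:
        # The parent's (user-defined) logic is reused as-is; nothing about it is inspected.
        return self.parent.compute(split)


def run_job(rdd, func: Callable[[Iterator[int]], int]) -> List[int]:
    """Simplified action: applies func once per partition the RDD exposes."""
    return [func(rdd.compute(split)) for split in rdd.partitions()]


# "User" RDD with 500 partitions; partition i holds the numbers i*10 .. i*10+9.
user_rdd = ToyRDD(500, lambda i: iter(range(i * 10, i * 10 + 10)))
only_seven = PrunedView(user_rdd, lambda idx: idx == 7)
print(run_job(only_seven, sum))  # [745]
print(user_rdd.computed)         # [7]  (the other 499 partitions are never computed)
```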

Re: Isolate 1 partition and perform computations

2018-04-14 Matthias Boehm
You might wanna have a look at using a PartitionPruningRDD to select a subset of partitions by ID. This approach worked very well for multi-key lookups for us [1]. A major advantage compared to scan-based operations is that, if your source RDD has an existing partitioner, only relevant partition
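
To make the point about an existing partitioner concrete, here is a toy, pure-Python model (not Spark code): when the partitioner is known, each lookup key maps to exactly one partition ID, so a multi-key lookup only has to scan those partitions. `get_partition`, `partition_by_key` and `multi_key_lookup` are hypothetical helpers, and Python's `hash()` stands in for Java's `hashCode()`. In Spark itself the corresponding pieces would presumably be `rdd.partitioner.get.getPartition(key)` to compute the IDs and `PartitionPruningRDD.create(rdd, id => wanted.contains(id))` to prune, but that Scala side is an assumption here, not code from the thread.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, object]


def get_partition(key: Hashable, num_partitions: int) -> int:
    """Hash partitioning; Python's % with a positive divisor already returns a non-negative ID.

    Python's hash() stands in for Java's hashCode(), so IDs will not match a real Spark job.
    """
    return hash(key) % num_partitions


def partition_by_key(pairs: Iterable[Pair], num_partitions: int) -> List[List[Pair]]:
    """Builds a 'source RDD' whose layout is fully determined by get_partition."""
    parts: List[List[Pair]] = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[get_partition(key, num_partitions)].append((key, value))
    return parts


def multi_key_lookup(parts: List[List[Pair]], keys: Iterable[Hashable]) -> Tuple[Dict[Hashable, object], int]:
    """Looks up several keys, scanning only partitions that can contain them.

    Returns the matches and the number of partitions that were scanned.
    """
    key_set = set(keys)
    # With a known partitioner, each key can live in exactly one partition.
    wanted = sorted({get_partition(k, len(parts)) for k in key_set})
    found: Dict[Hashable, object] = {}
    for pid in wanted:  # every other partition is pruned and never touched
        for key, value in parts[pid]:
            if key in key_set:
                found[key] = value
    return found, len(wanted)


# 1,000 integer keys spread over 500 partitions; look up three of them.
parts = partition_by_key(((i, i * i) for i in range(1000)), 500)
found, scanned = multi_key_lookup(parts, [3, 503, 42])
print(found)    # {3: 9, 503: 253009, 42: 1764}
print(scanned)  # 2  (partitions 3 and 42 instead of all 500)
```

A scan-based lookup would filter all 500 partitions; here keys 3 and 503 share partition 3, so only two partitions are read.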

Isolate 1 partition and perform computations

2018-04-14 Thodoris Zois
Hello list, I am sorry for sending this message here, but I could not get any response on “users”. For specific purposes I would like to isolate 1 partition of the RDD and perform computations only on it. For instance, suppose that a user asks Spark to create 500 partitions for the