Hey Rishitesh, That's perfect, thanks so much! Don't know why I didn't think of using mapPartitions like this.
Thanks, Jem

On Fri, Aug 28, 2015 at 10:35 AM Rishitesh Mishra <rishi80.mis...@gmail.com> wrote:

> Hi Jem,
> A simple way to get this is to use mapPartitions. Please see the code
> below. For this you need to know which of your parent RDD's partitions
> you want to exclude. One drawback is that the new RDD will still launch
> the same number of tasks as the parent RDD, since both RDDs have the same
> number of partitions; we are only excluding the results from certain
> partitions. If you can live with that, then it's fine.
>
> import org.apache.spark.TaskContext
>
> val ones = sc.makeRDD(1 to 100, 10).map(x => x) // base RDD with 10 partitions
>
> // Drop the results from partitions 0, 1 and 2, then collect and print the rest
> val reduced = ones.mapPartitions { iter =>
>   new Iterator[Int] {
>     override def hasNext: Boolean = {
>       if (Seq(0, 1, 2).contains(TaskContext.get().partitionId)) {
>         false
>       } else {
>         iter.hasNext
>       }
>     }
>
>     override def next(): Int = iter.next()
>   }
> }.collect().foreach(println)
>
>
> On Fri, Aug 28, 2015 at 12:33 PM, Jem Tucker <jem.tuc...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to create an RDD from a selected number of its parent's
>> partitions. My current approach is to create my own SelectedPartitionRDD
>> and implement compute and numPartitions myself; the problem is that the
>> compute method is marked as @DeveloperApi, and hence unsuitable for me to
>> use in my application. Are there any alternative methods that only use
>> the stable parts of the Spark API?
>>
>> Thanks,
>>
>> Jem
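As a footnote, here is a minimal sketch of the same idea using mapPartitionsWithIndex, which is also part of the stable public RDD API and avoids reading TaskContext at all. The excluded partition numbers (0, 1, 2), the object name, the application name and the local master are placeholder choices for illustration, not something taken from the thread above.

import org.apache.spark.{SparkConf, SparkContext}

object SelectPartitionsSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders; in spark-shell you would
    // already have an `sc` and could skip this setup.
    val sc = new SparkContext(
      new SparkConf().setAppName("select-partitions").setMaster("local[*]"))

    val ones = sc.makeRDD(1 to 100, 10)   // base RDD with 10 partitions
    val excluded = Set(0, 1, 2)           // partitions whose results we discard

    // Return an empty iterator for excluded partitions and pass the rest
    // through unchanged. Tasks are still scheduled for every partition;
    // only their output is suppressed, as in the mapPartitions version above.
    val reduced = ones.mapPartitionsWithIndex { (idx, iter) =>
      if (excluded.contains(idx)) Iterator.empty else iter
    }

    reduced.collect().foreach(println)
    sc.stop()
  }
}

The behaviour matches the anonymous-Iterator version: the partition count of the new RDD is unchanged, and only the contents of the chosen partitions are filtered out before collect().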