Yes, it will introduce a shuffle stage in order to perform the repartitioning. So it's most useful when you're planning to run many downstream transformations that need the increased parallelism.
Is this a dataset from HDFS?

From: "ÐΞ€ρ@Ҝ (๏̯͡๏)"
Date: Wednesday, June 24, 2015 at 6:11 PM
To: Silvio Fiorito
Cc: user
Subject: Re: how to increase parallelism ?

What that did was run:
- a repartition stage with 174 tasks, AND
- the actual .filter.map stage with 500 tasks.

It actually doubled the stages.

On Wed, Jun 24, 2015 at 12:01 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote:

Hi Deepak,

Parallelism is controlled by the number of partitions. In this case, how many partitions are there for the details RDD (likely 170)? You can check by running "details.partitions.length". If you want to increase parallelism, you can do so by repartitioning, i.e. increasing the number of partitions: "details.repartition(xxxx)".

Thanks,
Silvio

From: "ÐΞ€ρ@Ҝ (๏̯͡๏)"
Date: Wednesday, June 24, 2015 at 1:57 PM
To: user
Subject: how to increase parallelism ?

I have a filter.map that triggers 170 tasks. How can I increase it?

Code:

    val viEvents = details
      .filter(_.get(14).asInstanceOf[Long] != NULL_VALUE)
      .map { vi => (vi.get(14).asInstanceOf[Long], vi) }

Deepak
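The suggestion in the thread can be sketched as below. This is a minimal, locally runnable illustration, not the original job: the data, the sentinel `NULL_VALUE`, and the partition counts (4 and 8 here, versus ~170 and 500 in the thread) are all stand-ins.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  // Hypothetical sentinel, mirroring the NULL_VALUE in the thread's job.
  val NULL_VALUE: Long = -1L

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("repartition-sketch"))

    // Stand-in for the `details` RDD; one task runs per partition.
    val details = sc.parallelize(1L to 100L, numSlices = 4)
    println(details.partitions.length) // 4 (would be ~170 in the thread)

    // repartition() adds a shuffle stage, after which the downstream
    // filter/map run with the new partition count.
    val viEvents = details
      .repartition(8)
      .filter(_ != NULL_VALUE)
      .map(v => (v, v))
    println(viEvents.partitions.length) // 8

    sc.stop()
  }
}
```

Note that this is why Deepak saw two stages: the repartition's shuffle runs at the old parallelism, and only the stages after it run at the new one.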