Yes, it will introduce a shuffle stage in order to perform the repartitioning. 
So it’s more useful if you’re planning to do many downstream transformations 
for which you need the increased parallelism.
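
A minimal local sketch of that tradeoff (the RDD contents and sizes here are made up for illustration; only repartition and the 170/500 partition counts come from this thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("repartition-demo"))

// Stand-in for a 170-partition input RDD.
val details = sc.parallelize(1 to 10000, 170)

// repartition() inserts a full shuffle stage, so it only pays off
// when the wider RDD is reused by several downstream transformations.
val wider = details.repartition(500).cache()

val evens = wider.filter(_ % 2 == 0).count()
val odds  = wider.filter(_ % 2 != 0).count()

println(wider.partitions.length)  // 500
println(evens + odds)             // 10000

sc.stop()
```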

Is this a dataset from HDFS?

From: "ÐΞ€ρ@Ҝ (๏̯͡๏)"
Date: Wednesday, June 24, 2015 at 6:11 PM
To: Silvio Fiorito
Cc: user
Subject: Re: how to increase parallelism ?

What that did was run a repartition with 174 tasks

repartition with 174 tasks
AND
actual .filter.map stage with 500 tasks

It actually doubled the stages.



On Wed, Jun 24, 2015 at 12:01 PM, Silvio Fiorito 
<silvio.fior...@granturing.com<mailto:silvio.fior...@granturing.com>> wrote:
Hi Deepak,

Parallelism is controlled by the number of partitions — in this case, by how many 
partitions the details RDD has (likely 170).

You can check by running “details.partitions.length”. If you want to increase 
parallelism you can do so by repartitioning, increasing the number of 
partitions: “details.repartition(xxxx)”
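
In code, with a toy RDD standing in for details (the 170 and 500 partition counts mirror this thread; everything else is illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(
  new SparkConf().setMaster("local[*]").setAppName("parallelism-check"))

// Toy stand-in for the details RDD, created with 170 partitions.
val details = sc.parallelize(1 to 1000, 170)
val before  = details.partitions.length
println(before)   // 170

// Repartition to increase parallelism for downstream stages.
val repartitioned = details.repartition(500)
val after = repartitioned.partitions.length
println(after)    // 500

sc.stop()
```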

Thanks,
Silvio

From: "ÐΞ€ρ@Ҝ (๏̯͡๏)"
Date: Wednesday, June 24, 2015 at 1:57 PM
To: user
Subject: how to increase parallelism ?

I have a filter.map that triggers 170 tasks. How can I increase that?

Code:

val viEvents = details
  .filter(_.get(14).asInstanceOf[Long] != NULL_VALUE)
  .map { vi => (vi.get(14).asInstanceOf[Long], vi) }


Deepak




--
Deepak
