This is correct behavior. Each "core" can execute exactly one task at a
time, with each task corresponding to a partition. If your cluster has only
24 cores, then at most 24 tasks can run concurrently; the remaining tasks in
the stage (the other 76 of your 100) simply wait until a core frees up.
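
To make that concrete, here is a minimal sketch in Scala (the input path and
app name are just placeholders) of the situation you describe: a 100-partition
RDD gives a 100-task stage, but only as many tasks run at once as there are
cores.

    import org.apache.spark.{SparkConf, SparkContext}

    object ParallelismSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("parallelism-sketch"))

        // 100 partitions => the filter stage below is split into 100 tasks
        val lines = sc.textFile("hdfs:///some/input").repartition(100)

        // With 24 total executor cores, only 24 of those tasks can be in the
        // RUNNING state at any instant; the rest wait for a free core.
        val nonEmpty = lines.filter(_.nonEmpty)
        println(nonEmpty.count())

        sc.stop()
      }
    }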

You could run multiple workers per node to get more executors. That would
give you more cores in the cluster (Spark's "cores" are really just task
slots, so they don't have to match the physical core count). But however many
cores you have, each one still runs only one task at a time.
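
For example, with the standalone cluster manager you could set something like
this in conf/spark-env.sh on each node (values purely illustrative; this
oversubscribes the 8 physical cores, which may or may not help your workload):

    # conf/spark-env.sh -- illustrative values
    export SPARK_WORKER_INSTANCES=2   # start two workers on this node
    export SPARK_WORKER_CORES=8       # task slots offered by each worker

With 3 nodes that would advertise 48 cores to Spark instead of 24.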


On Wed, Jul 30, 2014 at 3:56 PM, Darin McBeath <ddmcbe...@yahoo.com> wrote:

> I have a cluster with 3 nodes (each with 8 cores) using Spark 1.0.1.
>
> I have an RDD<String> which I've repartitioned so it has 100 partitions
> (hoping to increase the parallelism).
>
> When I do a transformation (such as filter) on this RDD, I can't seem to
> get more than 24 tasks (my total number of cores across the 3 nodes) going
> at one point in time.  By tasks, I mean the number of tasks that appear
> under the Application UI.  I tried explicitly setting the
> spark.default.parallelism to 48 (hoping I would get 48 tasks concurrently
> running) and verified this in the Application UI for the running
> application but this had no effect.  Perhaps, this is ignored for a
> 'filter' and the default is the total number of cores available.
>
> I'm fairly new with Spark so maybe I'm just missing or misunderstanding
> something fundamental.  Any help would be appreciated.
>
> Thanks.
>
> Darin.
>
>


-- 
Daniel Siegmann, Software Developer
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR, NEW YORK, NY 10001
E: daniel.siegm...@velos.io W: www.velos.io
