I have a 3-node cluster (8 cores per node) running Spark 1.0.1. I have an RDD<String> that I've repartitioned into 100 partitions, hoping to increase the parallelism.
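In case it helps, here is a stripped-down version of what I'm running. The input path and the filter predicate are just placeholders, and setting spark.default.parallelism through SparkConf is only one of the ways I tried it:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class FilterTest {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("filter-test")
            .set("spark.default.parallelism", "48");  // tried with and without this
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Repartition to 100 partitions, hoping to increase parallelism.
        JavaRDD<String> lines = sc.textFile("hdfs:///some/input").repartition(100);

        // A simple filter; the stage has 100 tasks, but the UI never shows
        // more than 24 of them running at the same time.
        JavaRDD<String> kept = lines.filter(new Function<String, Boolean>() {
          public Boolean call(String s) {
            return s.contains("foo");
          }
        });

        System.out.println(kept.count());
      }
    }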
When I run a transformation such as filter on this RDD, I never see more than 24 tasks (my total number of cores across the 3 nodes) running at any one time. By tasks, I mean the tasks that appear under the application UI.

I tried explicitly setting spark.default.parallelism to 48 (hoping to get 48 tasks running concurrently) and verified the setting in the application UI for the running application, but it had no effect. Perhaps this is ignored for a filter and the default is simply the total number of cores available.

I'm fairly new to Spark, so maybe I'm just missing or misunderstanding something fundamental. Any help would be appreciated.

Thanks.

Darin.