When you click on a stage in the Spark UI (port 4040 on the driver), you can
see how many tasks are running concurrently.
How many tasks should I expect to see running concurrently, assuming my
cluster is set up optimally and my RDDs are partitioned properly?
Is it the total number of virtual cores across all my slaves?
I devised the following script to compute that number for a cluster created
by spark-ec2.
# spark-ec2 cluster
# run on driver node
# total number of virtual cores across all slaves
yum install -y pssh
# sum the driver's core count with each slave's; grep drops pssh status lines
{ nproc; pssh -i -h /root/spark-ec2/slaves nproc; } | grep -v "SUCCESS" |
  paste -sd+ - | bc
Nick
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/What-level-of-parallelism-should-I-expect-from-my-cluster-tp3999.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.