Hi everyone,

For research purposes I wanted to see how Spark scales for my algorithm
across different cluster sizes. I have a cluster of 10 nodes, each with 6
cores and 45 GB of RAM. My algorithm takes approximately 15 minutes to
execute when using all nodes (the Spark UI shows tasks running on every
node). Here is the weird thing: when I gradually reduced the number of
slave nodes down to 1, the execution time of my application stayed exactly
the same. I expected the execution time to grow roughly linearly (or worse)
as slave nodes are removed, but it doesn't. Now I have no idea how to debug
this "issue" or what to look for in the UI.

Things I have tried (see the configuration sketch after this list):
- Reducing the amount of RAM on each machine down to 1 GB.
- Reducing the number of cores on each machine down to 1.
- Increasing the amount of data I am processing.
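
For reference, the per-machine resources are capped with standard Spark
properties, roughly like this (a sketch with example values only, not my
exact configuration):

  import org.apache.spark.SparkConf

  // Sketch of how I cap per-machine resources between runs
  // (example values only):
  val conf = new SparkConf()
    .setAppName("scaling-experiment")        // placeholder app name
    .set("spark.executor.memory", "1g")      // e.g. 1g when reducing RAM
    .set("spark.executor.cores", "1")        // e.g. 1 when reducing cores
    .set("spark.cores.max", "6")             // total cores (standalone mode)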

When I reduce the resources on each machine (i.e. less RAM or fewer
cores), the computation time does go up, but the total time still remains
almost constant across different numbers of slaves.
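
The only additional check I know how to do from code is counting input
partitions, in case the job simply does not produce enough tasks to keep
more than one node busy (again a sketch; the path and RDD name are
placeholders):

  // Sketch (spark-shell): check how many partitions, and therefore tasks,
  // the input actually gets. The path is a placeholder for my real data.
  val myRDD = sc.textFile("hdfs:///path/to/input")
  println(s"Input partitions: ${myRDD.partitions.length}")
  // If this number is small, only a few tasks exist per stage, so extra
  // slave nodes sit idle and the runtime stays the same.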

Can someone give me a hint on what to look for? This behaviour seems very
strange to me.


