Hi everyone, for research purposes I wanted to see how Spark scales for my algorithm across different cluster sizes. I have a cluster of 10 nodes, each with 6 cores and 45 GB of RAM. My algorithm takes approximately 15 minutes to execute when all nodes are used (the Spark UI shows tasks running on every node). Here's the weird thing: I gradually reduced the number of slave nodes down to 1, and the execution time of my application stayed exactly the same. I expected the execution time to grow (roughly linearly, or worse) as the number of slave nodes goes down, but it doesn't. Now I have no idea how to debug this "issue" or what to look for in the UI.
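
To make the setup concrete, below is a minimal sketch of the kind of timing harness I run. The input path and the map/reduce body are placeholders standing in for my actual algorithm; I also print the registered executors to double-check which workers are in use:

import org.apache.spark.{SparkConf, SparkContext}

object ScalingTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ScalingTest")
    val sc = new SparkContext(conf)

    // One entry per executor (the driver shows up here too), so this confirms
    // how many workers the application is actually running on after I remove slaves.
    println(s"Registered executors: ${sc.getExecutorMemoryStatus.keys.mkString(", ")}")

    val start = System.nanoTime()
    val result = sc.textFile("hdfs:///path/to/input")   // placeholder input, not my real data
      .map(_.length.toLong)                             // placeholder work, not my real algorithm
      .reduce(_ + _)
    val elapsedSec = (System.nanoTime() - start) / 1e9
    println(s"Result: $result, elapsed: $elapsedSec s")

    sc.stop()
  }
}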
Things I have tried:
- Reducing the amount of RAM on each machine down to 1 GB.
- Reducing the number of cores on each machine down to 1.
- Increasing the amount of data I am processing.

When I reduce the resources on each machine (i.e. less RAM/CPU), the computation time goes up, but the total time still remains almost constant across different numbers of slaves. Can someone give me a hint on what to look for? This behaviour seems very strange to me.
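
For reference, the resource reductions above are just configuration changes applied to the same harness between runs. A minimal sketch, where the values are the ones from the list and the property names are the standard Spark settings:

import org.apache.spark.SparkConf

// Same harness as above; only the SparkConf differs between runs.
val conf = new SparkConf()
  .setAppName("ScalingTest")
  .set("spark.executor.memory", "1g")   // reduced from the 45 GB available per node
  .set("spark.executor.cores", "1")     // reduced from the 6 cores available per node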
