Hi, my Spark cluster contains a mix of machines: Pentium 4, dual-core, and quad-core. I am running a character frequency count application. The application spawns several threads, each submitting a job (action) that counts the frequency of a single character. My problem is that I get different execution times each time I run the same application on the same data (1 GB of text); sometimes the difference is as large as 10-15 minutes. I suspect this is related to scheduling on a heterogeneous cluster. Can someone please tell me how to tackle this issue? I need consistent execution times. Any suggestions are welcome!
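To illustrate the pattern, here is a minimal plain-Python sketch of the per-character-job idea (simplified: in the real application each thread submits a Spark action, e.g. a filter-then-count on the cached RDD, and the sample text and character set below are placeholders):

```python
import threading

# stands in for the 1 GB input; in Spark this would be a cached RDD
text = "some sample text" * 1000

results = {}
lock = threading.Lock()

def count_char(c):
    # in the real app this thread submits a Spark action, e.g.
    # rdd.filter(lambda ch: ch == c).count()
    n = text.count(c)
    with lock:
        results[c] = n

# one thread per character whose frequency we want
threads = [threading.Thread(target=count_char, args=(c,)) for c in "etaos"]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Each thread submits its own independent count, so the jobs run concurrently and their completion order (and, on a heterogeneous cluster, their timing) can vary from run to run.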
I cache() the RDD. There are 7 slave nodes in total, with executor memory = 2500m.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Inconsistent-execution-times-for-same-application-tp21662.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.