Hello everyone, I am trying out Spark for the first time, and after a month of work I am stuck on an issue. I have a very simple program that, given a directed graph with node/edge attributes and a particular node, finds all the siblings of that node (in the traditional sense, i.e. the other nodes that share a parent with it).
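To give a better idea of the computation, here is a simplified sketch of what the job does (the identifiers, input path, and tab delimiter below are placeholders, not my exact code):

    import org.apache.spark.{SparkConf, SparkContext}

    object SiblingFinder {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("SiblingFinder"))
        val targetNode = args(0)

        // Edge list as (parent, child) pairs; path and delimiter are placeholders.
        val edges = sc.textFile("hdfs:///graph/edges").map { line =>
          val Array(src, dst) = line.split("\t")
          (src, dst)
        }

        // Parents of the target node, broadcast so the sibling filter
        // becomes a map-side set lookup on each partition.
        val parents = sc.broadcast(
          edges.filter(_._2 == targetNode).map(_._1).collect().toSet)

        // Siblings: the other children of those parents.
        val siblings = edges
          .filter { case (src, dst) => parents.value.contains(src) && dst != targetNode }
          .map(_._2)
          .distinct()

        siblings.collect().foreach(println)
        sc.stop()
      }
    }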
Right now I have 1200 partitions, and I see that while most of the tasks (on average 1190-1195 of them) finish within 500 ms, a few tasks (5-10 of them) take about 1-2 seconds to finish. I am aiming for a scenario where all the tasks finish under a second, so I am trying to figure out why those few tasks take longer to complete than the rest. Also, please let me know whether it is possible to change some settings to achieve this. Any help would be much appreciated.

My configuration:

1. Tried both the FAIR and FIFO schedulers.
2. Tried playing around with the spark.locality.wait setting. Currently I see a maximum scheduler delay of 300 ms.
3. Version: Apache Spark 1.0.0 on a 50-node cluster, 14 GB RAM and 8 cores per node.

For reference, the sketch below shows roughly how I am setting these (the values are just what I have been experimenting with, not a recommendation):
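    import org.apache.spark.{SparkConf, SparkContext}

    // Scheduler-related settings under experimentation; the values are examples
    // and the app name is a placeholder.
    val conf = new SparkConf()
      .setAppName("SiblingFinder")
      .set("spark.scheduler.mode", "FAIR")   // also tried "FIFO"
      .set("spark.locality.wait", "100")     // milliseconds in Spark 1.0.0; default is 3000
    val sc = new SparkContext(conf)

Thanks,
Sarthak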