Hi Sean,

Thanks for the reply. I don't believe I am hitting either of the scenarios you mentioned.

Timestamp conflict: I am watching the Spark driver logs on the console (tried with both INFO and DEBUG). In every run, I see the result get printed, and then the application keeps executing for about 4 more minutes. I have previously seen cases where the Spark History Server timestamps do not match the driver logs, but here I am checking only the driver logs, and I can see log lines still being printed on the console after the result is generated.

Stages of a different action: I am joining 2 tables and doing a count, so there is only one action. The stage that takes the most time is the join phase (specifically the sort-merge join). To speed up the join, I tried caching the smaller dataset, and with that change I no longer see the issue.

I am just wondering how Spark can return the result before the join operation has completed.

PS: The actual query in my application has many operators, UDFs, etc. The query described above is the smallest one with which I can reproduce the issue.