Hi Sean,

Thanks for the reply. I believe I am not facing the scenarios you mentioned. 

Timestamp conflict: I am watching the Spark driver logs on the console
(tried with INFO and DEBUG). In every run, I see the result get printed and
the application then keeps executing for 4 more minutes.
I have seen cases before where the Spark History Server timestamps do not
match the Spark driver logs. Here, though, I am checking only the driver
logs, and I can see log lines still being printed on the console after the
result has been generated.

Stages of a different action: I am performing a join on 2 tables and then a
count, so there is only one action. The slow stage is the join phase (a
sort-merge join, specifically). To speed up the join, I tried caching the
smaller dataset, and with that change I no longer see the issue.
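
For reference, the join and count described above have roughly the shape
below. This is only a sketch in PySpark: the synthetic spark.range tables,
the column name "key", the function name join_and_count and the row counts
are placeholders, not my actual tables or code. Disabling the broadcast
threshold is an assumption made here so that the planner chooses a
sort-merge join for such small inputs.

```python
import time


def join_and_count(spark, cache_small=False):
    """Join two synthetic tables and count the rows (a single action).

    `spark` is an existing SparkSession. All names and sizes here are
    placeholders for illustration only.
    """
    # Assumption: turn off broadcast joins so a sort-merge join is planned.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    large = spark.range(0, 1_000_000).withColumnRenamed("id", "key")
    small = spark.range(0, 1_000).withColumnRenamed("id", "key")
    if cache_small:
        # The workaround from above: cache the smaller dataset.
        small = small.cache()

    start = time.time()
    n = large.join(small, "key").count()  # the only action in the job
    print(f"count={n} returned after {time.time() - start:.1f}s")
    return n
```

Calling join_and_count(spark) with a session from
SparkSession.builder.getOrCreate() and comparing the printed time with the
time the application actually exits is how the gap could be measured.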

I am just wondering how Spark can get the result before the completion of
the join operation.

PS: My actual query in the application has many operators, UDFs, etc. The
above is the minimal query with which I am able to reproduce the
issue.





