It's great that the Spark scheduler does optimized DAG processing and only
evaluates lazily when an action is performed or a shuffle dependency is
encountered. Sometimes it goes even further past the shuffle dependency before
executing anything: if there are map steps after the shuffle, it doesn't stop
at the shuffle boundary to execute anything, but pipelines those subsequent map
steps until it finds a reason (a Spark action) to execute. As a result, the
stage Spark is running can internally be a series of (map -> shuffle -> map ->
map -> collect), while the Spark UI just shows the currently running 'collect'
stage. So if the job fails at that point, the Spark UI just says collect
failed, but in fact the failure could come from anywhere in that lazy chain of
evaluation. Looking at the executor logs gives some insight, but that's not
always straightforward.
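As an analogy (plain Python, not Spark itself), the same thing happens with
generator chains: every step is lazy, nothing runs until a terminal "action",
and an error raised there may actually originate in any earlier step of the
chain. A minimal sketch, with hypothetical parse/double steps:

```python
# Hypothetical lazy pipeline: generators defer work like Spark transformations.
def parse(records):
    for r in records:
        yield int(r)          # may raise ValueError on bad input

def double(numbers):
    for n in numbers:
        yield n * 2

data = ["1", "2", "oops", "4"]
pipeline = double(parse(data))   # lazy: nothing has executed, no error yet

try:
    result = list(pipeline)      # the "action": the whole chain runs here
except ValueError:
    # The traceback points at the terminal call site, but the faulty step
    # is parse() further up the chain -- analogous to "collect failed" in
    # the Spark UI hiding which transformation actually blew up.
    print("failure surfaced at the action, originated in parse()")
```

This is only an illustration of the failure-attribution problem, not of
Spark's scheduler internals.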
Correct me if I am wrong here, but I think we need more visibility into
what's happening underneath so we can more easily troubleshoot as well as
optimize our DAGs.

Thanks
