Is there a way to visualize the task dependency graph of an application,
during or after its execution? The list of stages in the web UI on port 4040
is useful, but still quite limited. For example, I've found that if I don't
cache() the result of one expensive computation, it gets recomputed 4 times,
but it is not easy to trace exactly why. Ideally, for each stage I would
like to see:
- the individual tasks and their dependencies
- the various RDD operators that have been applied
- the full stack trace for the stage barrier, for the task, and for the
lambdas used (the RDDs are often manipulated several layers deep in the
code, so the immediate file and line number is not enough)
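Spark itself can print an RDD's lineage as indented text via `RDD.toDebugString`. To illustrate why an uncached intermediate result is recomputed, here is a minimal pure-Python toy model. It is not Spark's API, and `Node`, `build`, and `run` are hypothetical names. Each lazily evaluated node recomputes its parents on every action unless it is cached, and it records the Python stack at the point where it was defined, which is roughly the per-operator "full stack trace" wished for above.

```python
import traceback
from typing import Callable, List, Optional


class Node:
    """Hypothetical stand-in for an RDD: one lazily evaluated step in a lineage."""

    def __init__(self, name: str, compute: Callable[..., list], parents: List["Node"]):
        self.name = name
        self.compute = compute
        self.parents = parents
        self.use_cache = False
        self.cached: Optional[list] = None
        self.evaluations = 0  # how many times compute() actually ran
        # Capture the full call stack where this operator was applied,
        # dropping the frame for __init__ itself.
        self.call_site = traceback.format_stack()[:-1]

    def cache(self) -> "Node":
        self.use_cache = True
        return self

    def collect(self) -> list:
        # A cached node returns its stored result instead of recomputing.
        if self.use_cache and self.cached is not None:
            return self.cached
        self.evaluations += 1
        # Uncached: recompute every parent, all the way back to the source.
        result = self.compute(*(p.collect() for p in self.parents))
        if self.use_cache:
            self.cached = result
        return result

    def lineage(self, depth: int = 0) -> str:
        # Indented text view of the dependency tree, loosely like toDebugString.
        lines = ["  " * depth + self.name]
        lines.extend(p.lineage(depth + 1) for p in self.parents)
        return "\n".join(lines)


def build(cache_expensive: bool):
    source = Node("source", lambda: list(range(5)), [])
    expensive = Node("expensive.map", lambda xs: [x * x for x in xs], [source])
    if cache_expensive:
        expensive.cache()
    # Four downstream branches all depend on the same expensive node.
    branches = [
        Node(f"branch{i}", lambda xs, i=i: [x + i for x in xs], [expensive])
        for i in range(4)
    ]
    return expensive, branches


def run(cache_expensive: bool) -> int:
    expensive, branches = build(cache_expensive)
    for b in branches:
        b.collect()  # one "action" per branch
    return expensive.evaluations


if __name__ == "__main__":
    print("without cache:", run(False))  # 4
    print("with cache:   ", run(True))   # 1
    _, branches = build(False)
    print(branches[0].lineage())
    # Last two recorded frames: where branch0 was created.
    print("".join(branches[0].call_site[-2:]))
```

In this model the lineage is a tree that is re-walked on every action, so four actions on four branches walk through the shared node four times unless it is marked as cached.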

Any suggestions?

Thanks,

Ravi



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Visualizing-stage-task-dependency-graph-tp11404.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
