Is there a way to visualize the task dependency graph of an application, during or after its execution? The list of stages on port 4040 is useful, but still quite limited. For example, I've found that if I don't cache() the result of one expensive computation, it gets repeated 4 times, and it is not easy to trace through exactly why (a small sketch of the pattern I mean follows after the list below). Ideally, what I would like for each stage is:

- the individual tasks and their dependencies
- the various RDD operators that have been applied
- the full stack trace: for the stage barrier, for the task, and for the lambdas used (the RDDs are often manipulated inside layers of code, so the immediate file/line# is not enough)
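
To make the cache() issue concrete, here is a rough sketch of the kind of pattern I mean (the paths, names, and the slowParse step are made up, not my actual code): without the commented-out cache() call, every action on `expensive` re-runs the whole lineage from the input, and the UI only shows the repeated stages, not why they repeat.

import org.apache.spark.{SparkConf, SparkContext}

object CacheSketch {
  // Stand-in for my real expensive computation.
  def slowParse(line: String): String = { Thread.sleep(1); line.trim }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cache-sketch"))

    val input = sc.textFile("hdfs:///some/input")           // placeholder path
    val expensive = input.map(line => slowParse(line))      // the costly transformation

    // Without this, each action below recomputes slowParse over the whole input,
    // so the same work shows up as several near-identical stages on :4040.
    // expensive.cache()

    expensive.count()
    expensive.take(10)
    expensive.map(x => (x.length, 1)).reduceByKey(_ + _).collect()
    expensive.saveAsTextFile("hdfs:///some/output")         // placeholder path

    sc.stop()
  }
}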
Any suggestions?

Thanks,
Ravi