I agree that this is definitely useful. One related project I know of is Sparkling [1] (also see the talk at Spark Summit 2014 [2]), but it'd be great (and, I imagine, somewhat challenging) to visualize the *physical execution* graph of a Spark job.
[1] http://pr01.uml.edu/
[2] http://spark-summit.org/2014/talk/sparkling-identification-of-task-skew-and-speculative-partition-of-data-for-spark-applications

On Mon, Aug 4, 2014 at 8:55 PM, rpandya <r...@iecommerce.com> wrote:

> Is there a way to visualize the task dependency graph of an application,
> during or after its execution? The list of stages on port 4040 is useful,
> but still quite limited. For example, I've found that if I don't cache()
> the result of one expensive computation, it will get repeated 4 times,
> but it is not easy to trace through exactly why. Ideally, what I would
> like for each stage is:
> - the individual tasks and their dependencies
> - the various RDD operators that have been applied
> - the full stack trace, both for the stage barrier, the task, and for the
>   lambdas used (often the RDDs are manipulated inside layers of code, so
>   the immediate file/line# is not enough)
>
> Any suggestions?
>
> Thanks,
>
> Ravi
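For the cache() issue described in the quoted message, here is a minimal Scala sketch (assuming a Spark shell with a SparkContext `sc`; the input path and transformations are hypothetical placeholders) showing how cache() avoids recomputation and how RDD.toDebugString prints the lineage from the driver:

    // Minimal sketch, assuming a Spark shell with SparkContext `sc`;
    // the input path and transformations are hypothetical placeholders.
    val expensive = sc.textFile("hdfs:///path/to/input")
      .map(_.split(","))
      .filter(_.length > 1)

    // Without cache(), each action below re-runs the whole lineage above.
    expensive.cache()

    // toDebugString prints the RDD lineage (operators and dependencies),
    // which helps trace why a computation gets repeated.
    println(expensive.toDebugString)

    // These actions now reuse the cached partitions instead of recomputing.
    println(expensive.count())
    println(expensive.first().mkString(","))

Note that toDebugString only shows the logical lineage, not per-task timing or the physical execution plan, so it complements rather than replaces the kind of visualization discussed above.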