I agree that this is definitely useful.

One related project I know of is Sparkling [1] (see also the talk at
Spark Summit 2014 [2]), but it'd be great (and, I imagine, somewhat
challenging) to visualize the *physical execution* graph of a Spark
job.
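
In the meantime, a quick (non-graphical) way to see why an uncached RDD
gets recomputed is RDD.toDebugString, which prints the RDD's lineage,
i.e. the chain of dependencies Spark re-executes for every action. A
rough sketch (the app and variable names are just for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on Spark 1.x

object LineageDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("lineage-demo").setMaster("local[*]"))

    // Pretend this intermediate result is expensive to compute.
    val expensive = sc.parallelize(1 to 1000000)
      .map(x => (x % 10, x.toLong * x))
      .reduceByKey(_ + _)

    // Without this cache(), every action below re-runs the whole lineage;
    // with it, the result is materialized once and reused.
    expensive.cache()

    // Prints the logical dependency graph (lineage) of this RDD.
    println(expensive.toDebugString)

    val total = expensive.count()                      // computes and caches
    val positives = expensive.filter(_._2 > 0).count() // served from the cache

    sc.stop()
  }
}

It's only a textual view, but it makes the shared part of the lineage
explicit, which helps trace where a missing cache() causes recomputation.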

[1] http://pr01.uml.edu/
[2] http://spark-summit.org/2014/talk/sparkling-identification-of-task-skew-and-speculative-partition-of-data-for-spark-applications

On Mon, Aug 4, 2014 at 8:55 PM, rpandya <r...@iecommerce.com> wrote:
> Is there a way to visualize the task dependency graph of an application,
> during or after its execution? The list of stages on port 4040 is useful,
> but still quite limited. For example, I've found that if I don't cache() the
> result of one expensive computation, it will get repeated 4 times, but it is
> not easy to trace through exactly why. Ideally, what I would like for each
> stage is:
> - the individual tasks and their dependencies
> - the various RDD operators that have been applied
> - the full stack trace for the stage barrier, the task, and the lambdas
> used (often the RDDs are manipulated inside layers of code, so the
> immediate file/line# is not enough)
>
> Any suggestions?
>
> Thanks,
>
> Ravi
