This would be great.

At least for logical nodes, would it be possible to re-use the existing 
Utils.getCallSite<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1526>
 to populate a field when nodes are created? I suppose most value would come 
from eventually passing the call-sites along to physical nodes. But maybe just 
as starting point Spark could display the call-site only with unoptimized 
logical plans? Users would still get a better sense for how the plan’s 
structure relates to their code.

From: mhawes <hawes.i...@gmail.com>
Date: Friday, 21 May 2021 at 22:36
To: dev@spark.apache.org <dev@spark.apache.org>
Subject: Re: Bridging gap between Spark UI and Code
CAUTION: This email originates from an external party (outside of Palantir). If 
you believe this message is suspicious in nature, please use the "Report 
Phishing" button built into Outlook.


Reviving this thread to ask whether any of the Spark maintainers would
consider helping to scope a solution for this. Michal outlines the problem
in this thread, but to clarify. The issue is that for very complex spark
application where the Logical Plans often span many pages, it is extremely
hard to figure out how the stages in the Spark UI/RDD operations link to the
Logical Plan that generated them.

Now, obviously this is a hard problem to solve given the various
optimisations and transformations that go on in between these two stages.
However I wanted to raise it as a potential option as I think it would be
/extremely/ valuable for Spark users.

My two main ideas are either:
 - To carry a reference to the original plan around when
planning/optimising.
 - To maintain a separate mapping for each planning/optimisation step that
maps from source to target. Im thinking along the lines of JavaScript
sourcemaps.

It would be great to get the opinion of an experienced Spark maintainer on
this, given the complexity.



--
Sent from: 
https://urldefense.proofpoint.com/v2/url?u=http-3A__apache-2Dspark-2Ddevelopers-2Dlist.1001551.n3.nabble.com_&d=DwICAg&c=izlc9mHr637UR4lpLEZLFFS3Vn2UXBrZ4tFb6oOnmz8&r=HrP36vwrw3UfNOlJ_ndb5EgIQ5INvWvw9xCbXhhQujY&m=jhxzuGxzWWdVR-pHNp2qV4JtVtGoOiAisKfUe-ySPt8&s=S68eCuXKhVzlv12dMdK8YM1YY0BocZ3vMblM_I8E_wo&e=

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Reply via email to