Hello,
While looking through the Spark physical plans recorded in the Spark history
server log to find bottlenecks in my code, I came across an ID that shows up
on a partitioning (Exchange) node.
My goal is to use the history server log to analyze the performance of my
Spark system. To that end, I am trying to connect Spark physical plans to
stage IDs, which carry useful information that I can tie back to my code.
Below is a snippet from one of the physical plans.
+- *(2) Sort [Column#46 ASC NULLS FIRST], true, 0
   +- Exchange hashpartitioning(ColumnId#329, 200), ENSURE_REQUIREMENTS, [id=#278]
What exactly does [id=#278] refer to?
I have seen examples that describe this ID as a reference to a specific
partition, a stage ID, or a plan_id, but I have not been able to confirm which
one it is.
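In case it helps frame the question, here is a minimal sketch of how these ids
could be pulled out of the plan text so they can be compared against stage IDs.
It assumes the plan is available as a plain string; `exchange_ids` and the
regular expression are hypothetical names for illustration, not part of any
Spark API, and the sketch makes no assumption about what the id actually means.

```python
import re

# Hypothetical pattern: an Exchange node, its partitioning scheme, and the
# trailing "[id=#N]" tag. DOTALL lets the match span a plan line that was
# wrapped across two lines of text.
EXCHANGE_ID = re.compile(r"Exchange\s+(\w+)\(.*?\[id=#(\d+)\]", re.DOTALL)

def exchange_ids(plan_text):
    """Return (partitioning, id) pairs for each Exchange node in the plan text."""
    return [(m.group(1), int(m.group(2))) for m in EXCHANGE_ID.finditer(plan_text)]

plan = """+- *(2) Sort [Column#46 ASC NULLS FIRST], true, 0
   +- Exchange hashpartitioning(ColumnId#329, 200), ENSURE_REQUIREMENTS, [id=#278]"""

print(exchange_ids(plan))  # [('hashpartitioning', 278)]
```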
Thank you,
Tahj