[ 
https://issues.apache.org/jira/browse/HIVE-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387344#comment-14387344
 ] 

Xuefu Zhang commented on HIVE-8858:
-----------------------------------

Hi Chinna, thanks for working on this. I haven't checked your patch, but the 
output looks nice. I have a few suggestions:

1. We need numbering in the Trans. Otherwise, it's hard to visualize the graph.
2. Other information, such as the number of partitions in ShuffleTran, is also 
important to show.
3. It would be better if we log this graph in one line. The easiest way is to 
have a toString() method in SparkPlan and then we can just log the string 
representation of SparkPlan.
4. To avoid long lines, we can show the graph in the same way as we show the 
work graph. For instance:
{code}
MapTran 1 <- MapInput 1 (cache off)
Shuffle1 (cache on) <- MapTran 1
Reduce 1 <- Shuffle1 (cache on)
Reduce 2 <- Shuffle1 (cache on)
{code}
Please note that this may not represent a valid plan.
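
Suggestions 1 through 4 could be sketched roughly as follows. This is only an illustration under stated assumptions: PlanGraphSketch, connect() and label() are hypothetical names, not the actual SparkPlan API in the patch, and the "partitions=2" detail is made up to show where extra information would go.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SparkPlan; it only records edges between labeled nodes.
final class PlanGraphSketch {
    // Maps each child label to the labels of its parents, in insertion order.
    private final Map<String, List<String>> parents = new LinkedHashMap<>();

    // Records that `child` consumes the output of `parent`.
    void connect(String parent, String child) {
        parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }

    // Builds a numbered label, e.g. "ShuffleTran 1 (partitions=2, cache on)".
    // `detail` may be null when there is nothing extra to show.
    static String label(String kind, int id, String detail) {
        String base = kind + " " + id;
        return detail == null ? base : base + " (" + detail + ")";
    }

    // Renders one "child <- parent, parent" line per node that has parents,
    // so the whole graph can be logged as a single string.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<String>> e : parents.entrySet()) {
            sb.append(e.getKey())
              .append(" <- ")
              .append(String.join(", ", e.getValue()))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PlanGraphSketch plan = new PlanGraphSketch();
        String input = label("MapInput", 1, "cache off");
        String map = label("MapTran", 1, null);
        String shuffle = label("ShuffleTran", 1, "partitions=2, cache on");
        plan.connect(input, map);
        plan.connect(map, shuffle);
        plan.connect(shuffle, label("ReduceTran", 1, null));
        plan.connect(shuffle, label("ReduceTran", 2, null));
        // In Hive this string would go to the logger at info level instead.
        System.out.print(plan);
    }
}
```

Keeping the rendering inside toString() means the caller only has to log the plan object once, and the per-edge lines stay short even for larger graphs.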

[~jxiang]/[~csun], please feel free to share your thoughts.

> Visualize generated Spark plan [Spark Branch]
> ---------------------------------------------
>
>                 Key: HIVE-8858
>                 URL: https://issues.apache.org/jira/browse/HIVE-8858
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-8858-spark.patch
>
>
> The Spark plan generated by SparkPlanGenerator contains info which isn't 
> available in Hive's explain plan, such as RDD caching. Also, the graph is 
> slightly different from the original SparkWork. Thus, it would be nice to 
> visualize the plan as is done for SparkWork.
> Preferably, the visualization can happen as part of Hive explain extended. 
> If not feasible, we can at least log this at info level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
