[ 
https://issues.apache.org/jira/browse/HIVE-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319245#comment-16319245
 ] 

Xuefu Zhang commented on HIVE-18368:
------------------------------------

Hi [~stakiar], thanks for working on this. I think this is very useful. I 
haven't looked at the patch, but I have a couple of high-level questions:

1. Can we get rid of code references such as {{at 
repartitionAndSortWithinPartitions at SortByShuffler.java:57}}? They don't seem 
useful.
2. Can you clarify the format of the RDD specification shown in each line of 
the output? Besides the code reference, I'm not entirely sure what the other 
elements mean. For instance, I see many "[]" in the output.
3. We have several internal object graphs: the Work graph, SparkTran, and RDD. 
We can skip SparkTran entirely, but we need a clear mapping from Work to RDD. 
Maybe reading the patch will give me the idea.


> Improve Spark Debug RDD Graph
> -----------------------------
>
>                 Key: HIVE-18368
>                 URL: https://issues.apache.org/jira/browse/HIVE-18368
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-18368.1.patch, HIVE-18368.2.patch, Spark UI - Named 
> RDDs.png
>
>
> The {{SparkPlan}} class does some logging to show the mapping between 
> different {{SparkTran}}, what shuffle types are used, and what trans are 
> cached. However, there is room for improvement.
> When debug logging is enabled the RDD graph is logged, but there isn't much 
> information printed about each RDD.
> We should combine both of the graphs and improve them. We could even make the 
> Spark Plan graph part of the {{EXPLAIN EXTENDED}} output.
> Ideally, the final graph shows a clear relationship between Tran objects, 
> RDDs, and BaseWorks. Edges should include information about the number of 
> partitions, shuffle types, Spark operations used, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
