[ 
https://issues.apache.org/jira/browse/HIVE-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chinna Rao Lalam updated HIVE-8858:
-----------------------------------
    Attachment: HIVE-8858-spark.patch

Hi [~xuefuz],

I have uploaded a draft patch. I have added simple logic that tracks back in 
reverse from the reducTrans. Here is the output for two sample queries.

{quote}
FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
                         UNION  ALL  
      select s2.key as key, s2.value as value from src s2) unionsrc
INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key, unionsrc.value
{quote}

spark.SparkPlan (SparkPlan.java:logSparkPlan(95)) - ------------------------------ Spark Plan -----------------------------
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache off)  )  <-- ( MapTran,Reduce )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache on)  )  <-- ( MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache off)  )  <-- ( MapTran,Reduce )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache on)  )  <-- ( MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(104)) - ------------------------------ Spark Plan -----------------------------

{quote}
select * from   
(      
  select a.key, a.val as val1, b.val as val2 from T1 a join T2 b on a.key = b.key
    union all
  select a.key, a.val as val1, b.val as val2 from T1 a join T2 b on a.key = b.key
) subq1
ORDER BY key, val1, val2;
{quote}

spark.SparkPlan (SparkPlan.java:logSparkPlan(95)) - ------------------------------ Spark Plan -----------------------------
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache off)  )  <-- ( Reduce,Reduce,Reduce,Reduce )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache off)  )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache off)  )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache off)  )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(101)) -    Reduce <-- ( Shuffle (cache off)  )  <-- ( MapTran,MapTran )  <-- ( MapInput (cache off)  )
spark.SparkPlan (SparkPlan.java:logSparkPlan(104)) - ------------------------------ Spark Plan -----------------------------
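
For anyone reading along without the patch open, the reverse-tracking idea behind the plan lines above can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: PlanSketch, Node, connect and render are hypothetical names, and the real SparkPlan and tran classes are not used. The sketch starts at a reduce node, walks backwards one level of parents at a time, and prints each level in parentheses. Unlike the logged output, it skips nodes it has already printed and does not show the cache on/off status.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

// Hypothetical stand-in for a plan graph; not a Hive class.
class PlanSketch {

    /** Hypothetical plan node; only a display name is kept. */
    static final class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    // child -> parents, kept in insertion order so output is stable
    private final Map<Node, List<Node>> parents = new LinkedHashMap<>();

    /** Records that the output of parent feeds child. */
    void connect(Node parent, Node child) {
        parents.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
    }

    /** Walks backwards from start, emitting one "<-- ( ... )" group per level. */
    String render(Node start) {
        StringBuilder sb = new StringBuilder(start.name);
        Set<Node> level = new LinkedHashSet<>();
        level.add(start);
        Set<Node> seen = new HashSet<>(level);
        while (true) {
            Set<Node> next = new LinkedHashSet<>();
            for (Node n : level) {
                for (Node p : parents.getOrDefault(n, Collections.<Node>emptyList())) {
                    if (seen.add(p)) {   // print each node at most once
                        next.add(p);
                    }
                }
            }
            if (next.isEmpty()) {
                break;                   // reached the inputs
            }
            StringJoiner group = new StringJoiner(",", " <-- ( ", " )");
            for (Node p : next) {
                group.add(p.name);
            }
            sb.append(group.toString());
            level = next;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PlanSketch plan = new PlanSketch();
        Node input = new Node("MapInput");
        Node map = new Node("MapTran");
        Node shuffle = new Node("Shuffle");
        Node reduce = new Node("Reduce");
        plan.connect(input, map);
        plan.connect(map, shuffle);
        plan.connect(shuffle, reduce);
        System.out.println(plan.render(reduce));
    }
}
```

Calling render once per reduce node gives one line per reducer, which is the same overall shape as the log lines above.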

> Visualize generated Spark plan [Spark Branch]
> ---------------------------------------------
>
>                 Key: HIVE-8858
>                 URL: https://issues.apache.org/jira/browse/HIVE-8858
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chinna Rao Lalam
>         Attachments: HIVE-8858-spark.patch
>
>
> The Spark plan generated by SparkPlanGenerator contains info which isn't 
> available in Hive's explain plan, such as RDD caching. Also, the graph is 
> slightly different from the original SparkWork. Thus, it would be nice to 
> visualize the plan as is done for SparkWork.
> Preferably, the visualization can happen as part of Hive explain extended. 
> If that is not feasible, we can at least log this at info level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
