[ https://issues.apache.org/jira/browse/HIVE-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16333870#comment-16333870 ]
liyunzhang commented on HIVE-8436:
----------------------------------

[~csun]: could you take some time to explain why we need [MapInput::CopyFunction|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java#L72]? Its input is a Tuple2<WritableComparable, Writable> and its output is also a Tuple2<WritableComparable, Writable>, so why do we need the extra HadoopRDD -> CopyFunction step?

{code:java}
private static class CopyFunction implements PairFunction<Tuple2<WritableComparable, Writable>,
    WritableComparable, Writable> {

  private transient Configuration conf;

  @Override
  public Tuple2<WritableComparable, Writable>
  call(Tuple2<WritableComparable, Writable> tuple) throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }

    return new Tuple2<WritableComparable, Writable>(tuple._1(),
        WritableUtils.clone(tuple._2(), conf));
  }
}
{code}

> Modify SparkWork to split works with multiple child works [Spark Branch]
> ------------------------------------------------------------------------
>
>                 Key: HIVE-8436
>                 URL: https://issues.apache.org/jira/browse/HIVE-8436
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao Sun
>            Priority: Major
>             Fix For: 1.1.0
>
>         Attachments: HIVE-8436.1-spark.patch, HIVE-8436.10-spark.patch,
> HIVE-8436.11-spark.patch, HIVE-8436.2-spark.patch, HIVE-8436.3-spark.patch,
> HIVE-8436.4-spark.patch, HIVE-8436.5-spark.patch, HIVE-8436.6-spark.patch,
> HIVE-8436.7-spark.patch, HIVE-8436.8-spark.patch, HIVE-8436.9-spark.patch
>
>
> Based on the design doc, we need to split the operator tree of a work in
> SparkWork if the work is connected to multiple child works. The split is
> performed by cloning the original work and removing unwanted branches from
> the operator tree of each clone. Please refer to the design doc for details.
> This process should be done right before we generate SparkPlan. We should
> have a utility method that takes the original SparkWork and returns a modified
> SparkWork.
> This process should also keep the information about the original work and its
> clones. Such information will be needed during SparkPlan generation
> (HIVE-8437).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
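
For readers following the issue description, the clone-and-prune idea can be sketched roughly as below. This is only an illustration under stated assumptions: `SplitSketch`, `Op`, `prune`, and `split` are hypothetical names and are not Hive's `Operator` or `SparkWork` classes, and each child work is represented simply by the name of the leaf operator that feeds it. The returned map stands in for the "information about the original work and its clones" that the description asks to keep.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration; not Hive's Operator/SparkWork API.
final class SplitSketch {

    /** A minimal operator node: a name plus its child operators. */
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name, Op... kids) {
            this.name = name;
            for (Op k : kids) {
                children.add(k);
            }
        }

        /** Deep copy of the subtree rooted at this operator. */
        Op deepCopy() {
            Op copy = new Op(name);
            for (Op c : children) {
                copy.children.add(c.deepCopy());
            }
            return copy;
        }

        @Override
        public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    /**
     * Removes every branch under op that does not lead to the leaf named
     * keepLeaf. Returns true if the subtree rooted at op should be kept.
     */
    static boolean prune(Op op, String keepLeaf) {
        if (op.children.isEmpty()) {
            return op.name.equals(keepLeaf);
        }
        op.children.removeIf(c -> !prune(c, keepLeaf));
        return !op.children.isEmpty();
    }

    /**
     * Builds one pruned clone per leaf, keyed by leaf name. The original
     * tree is left untouched, so the caller still has original-to-clone info.
     */
    static Map<String, Op> split(Op root, List<String> leaves) {
        Map<String, Op> clones = new LinkedHashMap<>();
        for (String leaf : leaves) {
            Op clone = root.deepCopy();
            prune(clone, leaf);
            clones.put(leaf, clone);
        }
        return clones;
    }

    public static void main(String[] args) {
        // One work whose operator tree feeds two child works (via RS1 and RS2).
        Op root = new Op("TS",
            new Op("FIL", new Op("RS1")),
            new Op("SEL", new Op("RS2")));
        Map<String, Op> clones = split(root, List.of("RS1", "RS2"));
        clones.forEach((leaf, tree) -> System.out.println(leaf + " -> " + tree));
    }
}
```

Copying first and pruning the copy keeps the original tree intact, which matches the description's requirement that the original work remain identifiable alongside its clones.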