-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60632/
-----------------------------------------------------------
(Updated July 5, 2017, 4:07 a.m.)


Review request for hive.


Changes
-------

Update GenSparkUtils.java based on Rui's comments


Repository: hive-git


Description
-------

HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle


Diffs (updated)
-----

  itests/src/test/resources/testconfiguration.properties 19ff316
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RepartitionShuffler.java d0c708c
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 5f85f9e
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java b9901da
  ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java afbeccb
  ql/src/test/queries/clientpositive/spark_explain_groupbyshuffle.q PRE-CREATION
  ql/src/test/results/clientpositive/spark/spark_explain_groupbyshuffle.q.out PRE-CREATION


Diff: https://reviews.apache.org/r/60632/diff/2/

Changes: https://reviews.apache.org/r/60632/diff/1-2/


Testing
-------

set hive.spark.use.groupby.shuffle=true;
explain select key, count(val) from t1 group by key;

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 2)
      DagName: root_20170625202742_58335619-7107-4026-9911-43d2ec449088:2
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

set hive.spark.use.groupby.shuffle=false;
explain select key, count(val) from t1 group by key;

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Reducer 2 <- Map 1 (GROUP, 2)
      DagName: root_20170625203122_3afe01dd-41cc-477e-9098-ddd58b37ad4e:3
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: key (type: int), val (type: string)
                    outputColumnNames: key, val
                    Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(val)
                      keys: key (type: int)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: int)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: int)
                        Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: int)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                File Output Operator
                  compressed: false
                  Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
                  table:
                      input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                      output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                      serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


Thanks,

Bing Li