-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/60632/#review179595
-----------------------------------------------------------
Ship it!

Ship It!

- Rui Li


On July 5, 2017, 4:07 a.m., Bing Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/60632/
> -----------------------------------------------------------
> 
> (Updated July 5, 2017, 4:07 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-16659: Query plan should reflect hive.spark.use.groupby.shuffle
> 
> 
> Diffs
> -----
> 
>   itests/src/test/resources/testconfiguration.properties 19ff316 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RepartitionShuffler.java d0c708c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 5f85f9e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkUtils.java b9901da 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/GenSparkWork.java afbeccb 
>   ql/src/test/queries/clientpositive/spark_explain_groupbyshuffle.q PRE-CREATION 
>   ql/src/test/results/clientpositive/spark/spark_explain_groupbyshuffle.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/60632/diff/2/
> 
> 
> Testing
> -------
> 
> set hive.spark.use.groupby.shuffle=true;
> explain select key, count(val) from t1 group by key;
> 
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> 
> STAGE PLANS:
>   Stage: Stage-1
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (GROUP, 2)
>       DagName: root_20170625202742_58335619-7107-4026-9911-43d2ec449088:2
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: t1
>                   Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: key (type: int), val (type: string)
>                     outputColumnNames: key, val
>                     Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: count(val)
>                       keys: key (type: int)
>                       mode: hash
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>         Reducer 2 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 keys: KEY._col0 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> 
> 
> set hive.spark.use.groupby.shuffle=false;
> explain select key, count(val) from t1 group by key;
> 
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> 
> STAGE PLANS:
>   Stage: Stage-1
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (GROUP, 2)
>       DagName: root_20170625203122_3afe01dd-41cc-477e-9098-ddd58b37ad4e:3
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: t1
>                   Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                   Select Operator
>                     expressions: key (type: int), val (type: string)
>                     outputColumnNames: key, val
>                     Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                     Group By Operator
>                       aggregations: count(val)
>                       keys: key (type: int)
>                       mode: hash
>                       outputColumnNames: _col0, _col1
>                       Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         key expressions: _col0 (type: int)
>                         sort order: +
>                         Map-reduce partition columns: _col0 (type: int)
>                         Statistics: Num rows: 20 Data size: 140 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col1 (type: bigint)
>         Reducer 2 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: count(VALUE._col0)
>                 keys: KEY._col0 (type: int)
>                 mode: mergepartial
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 10 Data size: 70 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> 
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> 
> 
> Thanks,
> 
> Bing Li
> 
> 
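
For context on what the setting toggles, here is a minimal, standalone Spark sketch in Java. It is not the code in this patch (that lives in SparkPlanGenerator, RepartitionShuffler, etc.); the class name, table data, and variable names below are illustrative only. It contrasts a groupByKey-style "GROUP" shuffle with a repartition-and-sort shuffle for the same group-by.

// Standalone, illustrative sketch; not Hive's implementation.
import java.util.Arrays;

import org.apache.spark.HashPartitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class GroupByShuffleSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("groupby-shuffle-sketch").setMaster("local[2]");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      // Toy stand-in for the (key, val) rows of table t1 used in the Testing section.
      JavaPairRDD<Integer, String> rows = sc.parallelizePairs(Arrays.asList(
          new Tuple2<>(1, "a"), new Tuple2<>(2, "b"), new Tuple2<>(1, "c")), 2);

      boolean useGroupByShuffle = true; // stands in for hive.spark.use.groupby.shuffle

      if (useGroupByShuffle) {
        // "GROUP" shuffle: groupByKey materializes all values of a key together
        // before the downstream operator sees them.
        JavaPairRDD<Integer, Iterable<String>> grouped = rows.groupByKey(2);
        grouped.collect().forEach(t -> System.out.println(t._1() + " -> " + t._2()));
      } else {
        // Repartition shuffle: rows are only routed to partitions and sorted by key
        // within each partition; a downstream operator can then aggregate while
        // streaming over the sorted rows instead of buffering whole groups.
        JavaPairRDD<Integer, String> partitioned =
            rows.repartitionAndSortWithinPartitions(new HashPartitioner(2));
        partitioned.collect().forEach(t -> System.out.println(t._1() + " -> " + t._2()));
      }
    }
  }
}

Roughly, the groupByKey-based shuffle tends to be faster but can hold all values of a key in memory at once, while the repartition-based shuffle streams them; that trade-off is what hive.spark.use.groupby.shuffle exposes, and this patch makes the chosen shuffle visible in the EXPLAIN output.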