Chao Sun created HIVE-16668:
-------------------------------

             Summary: Hive on Spark generates incorrect plan and result with window function and lateral view
                 Key: HIVE-16668
                 URL: https://issues.apache.org/jira/browse/HIVE-16668
             Project: Hive
          Issue Type: Bug
          Components: Spark
            Reporter: Chao Sun
            Assignee: Chao Sun
To reproduce:

{code}
create table t1 (a string);
create table t2 (a array<string>);
create table dummy (a string);

insert into table dummy values ("a");
insert into t1 values ("1"), ("2");
insert overwrite table t2 select array("1", "2", "3", "4") from dummy;

set hive.auto.convert.join.noconditionaltask.size=3;

explain
with tt1 as (
  select a as id, count(*) over () as count from t1
),
tt2 as (
  select id from t2 lateral view outer explode(a) a_tbl as id
)
select tt1.count from tt1 join tt2 on tt1.id = tt2.id;
{code}

For Hive on Spark, the plan is:

{code}
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      Edges:
        Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 3), Map 1 (PARTITION-LEVEL SORT, 3)
      DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:9
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: t1
                  Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                  Reduce Output Operator
                    key expressions: 0 (type: int)
                    sort order: +
                    Map-reduce partition columns: 0 (type: int)
                    Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                    value expressions: a (type: string)
        Reducer 2
            Local Work:
              Map Reduce Local Work
            Reduce Operator Tree:
              Select Operator
                expressions: VALUE._col0 (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                PTF Operator
                  Function definitions:
                      Input definition
                        input alias: ptf_0
                        output shape: _col0: string
                        type: WINDOWING
                      Windowing table definition
                        input alias: ptf_1
                        name: windowingtablefunction
                        order by: 0 ASC NULLS FIRST
                        partition by: 0
                        raw input shape:
                        window functions:
                            window function definition
                              alias: count_window_0
                              name: count
                              window function: GenericUDAFCountEvaluator
                              window frame: PRECEDING(MAX)~FOLLOWING(MAX)
                              isStar: true
                  Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: _col0 (type: string), count_window_0 (type: bigint)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                      Spark HashTable Sink Operator
                        keys:
                          0 _col0 (type: string)
                          1 _col0 (type: string)
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)

  Stage: Stage-1
    Spark
      DagName: chao_20170515133259_de9e0583-da24-4399-afc8-b881dfef0469:8
      Vertices:
        Map 3
            Map Operator Tree:
                TableScan
                  alias: t2
                  Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                  Lateral View Forward
                    Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                      Lateral View Join Operator
                        outputColumnNames: _col4
                        Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
                        Select Operator
                          expressions: _col4 (type: string)
                          outputColumnNames: _col0
                          Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
                          Map Join Operator
                            condition map:
                                 Inner Join 0 to 1
                            keys:
                              0 _col0 (type: string)
                              1 _col0 (type: string)
                            outputColumnNames: _col1
                            input vertices:
                              0 Reducer 2
                            Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                            Select Operator
                              expressions: _col1 (type: bigint)
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                              File Output Operator
                                compressed: false
                                Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                                table:
                                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                    Select Operator
                      expressions: a (type: array<string>)
                      outputColumnNames: _col0
                      Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                      UDTF Operator
                        Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                        function name: explode
                        outer lateral view: true
                        Filter Operator
                          predicate: col is not null (type: boolean)
                          Statistics: Num rows: 1 Data size: 20 Basic stats: COMPLETE Column stats: NONE
                          Lateral View Join Operator
                            outputColumnNames: _col4
                            Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
                            Select Operator
                              expressions: _col4 (type: string)
                              outputColumnNames: _col0
                              Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
                              Map Join Operator
                                condition map:
                                     Inner Join 0 to 1
                                keys:
                                  0 _col0 (type: string)
                                  1 _col0 (type: string)
                                outputColumnNames: _col1
                                input vertices:
                                  0 Reducer 2
                                Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                                Select Operator
                                  expressions: _col1 (type: bigint)
                                  outputColumnNames: _col0
                                  Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                                  File Output Operator
                                    compressed: false
                                    Statistics: Num rows: 2 Data size: 2 Basic stats: COMPLETE Column stats: NONE
                                    table:
                                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Local Work:
              Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

Note that there are two {{Map 1}} edges feeding {{Reducer 2}}. With Hive on Spark, the query returns:

{code}
4
4
4
4
{code}

which is incorrect.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
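For reference, a minimal Python sketch (not part of the original report) of the query's intended semantics on the repro data, to show what the correct output should be:

```python
# Simulate the repro query in plain Python.
# Table contents mirror the insert statements in the repro script.
t1 = ["1", "2"]                       # t1.a
t2 = [["1", "2", "3", "4"]]           # t2.a (array<string>)

# tt1: count(*) over () with an empty window attaches the total row
# count of t1 (here 2) to every row.
tt1 = [(a, len(t1)) for a in t1]      # [("1", 2), ("2", 2)]

# tt2: lateral view outer explode(a) flattens each array into one row
# per element (same as a plain explode here, since no array is empty).
tt2 = [x for arr in t2 for x in arr]  # ["1", "2", "3", "4"]

# Inner join tt1 with tt2 on id, projecting tt1.count.
result = [cnt for (tt1_id, cnt) in tt1 for tt2_id in tt2 if tt1_id == tt2_id]
print(result)  # [2, 2] -- two rows of 2, not the four rows of 4 reported above
```

Only t1's ids "1" and "2" find matches among tt2's four exploded ids, so the correct result is two rows, each carrying the window count 2.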