JoneZhang created HIVE-12736: -------------------------------- Summary: It seems that result of Hive on Spark is mistake And result of Hive and Hive on Spark are not the same Key: HIVE-12736 URL: https://issues.apache.org/jira/browse/HIVE-12736 Project: Hive Issue Type: Bug Affects Versions: 1.2.1, 1.1.1 Reporter: JoneZhang
select * from staff; 1 jone 22 1 2 lucy 21 1 3 hmm 22 2 4 james 24 3 5 xiaoliu 23 3 select id,date_ from trade union all select id,"test" from trade ; 1 201510210908 2 201509080234 2 201509080235 1 test 2 test 2 test set hive.execution.engine=spark; set spark.master=local; select /*+mapjoin(t)*/ * from staff s join (select id,date_ from trade union all select id,"test" from trade ) t on s.id=t.id; 1 jone 22 1 1 201510210908 2 lucy 21 1 2 201509080234 2 lucy 21 1 2 201509080235 set hive.execution.engine=mr; select /*+mapjoin(t)*/ * from staff s join (select id,date_ from trade union all select id,"test" from trade ) t on s.id=t.id; FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint. I have two questions 1.Why result of hive on spark not include the following record? 1 jone 22 1 1 test 2 lucy 21 1 2 test 2 lucy 21 1 2 test 2.Why there are two different ways of dealing same query? explain 1: set hive.execution.engine=spark; set spark.master=local; explain select id,date_ from trade union all select id,"test" from trade; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Spark DagName: jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1 Vertices: Map 1 Map Operator Tree: TableScan alias: trade Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: id (type: int), date_ (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 12 Data size: 96 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Map 2 Map Operator Tree: TableScan alias: trade Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: id (type: int), 'test' (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 12 Data size: 96 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink explain 2: set hive.execution.engine=spark; set spark.master=local; explain select /*+mapjoin(t)*/ * from staff s join (select id,date_ from trade union all select id,"test" from trade ) t on s.id=t.id; OK STAGE DEPENDENCIES: Stage-2 is a root stage Stage-1 depends on stages: Stage-2 Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Spark DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3 Vertices: Map 1 Map Operator Tree: TableScan alias: trade Statistics: Num rows: 6 Data size: 48 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: id (type: int), date_ (type: string) outputColumnNames: _col0, _col1 Statistics: Num rows: 3 Data size: 24 Basic stats: COMPLETE Column stats: NONE Spark HashTable Sink Operator keys: 0 id (type: int) 1 _col0 (type: int) Local Work: Map Reduce Local Work Stage: Stage-1 Spark DagName: jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2 Vertices: Map 2 Map Operator Tree: TableScan alias: s Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: id is not null (type: boolean) Statistics: Num rows: 1 Data size: 66 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 id (type: int) 1 _col0 (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col7, _col8 input vertices: 1 Map 1 Statistics: Num rows: 6 Data size: 52 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: int), _col1 (type: string), _col2 (type: int), _col3 (type: int), _col7 (type: int), _col8 (type: string) outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5 Statistics: Num rows: 6 Data size: 52 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 6 Data size: 52 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink I can't find any information about union "test" in explain 2. Some properties on hive-site.xml is <property> <name>hive.ignore.mapjoin.hint</name> <value>false</value> </property> <property> <name>hive.auto.convert.join</name> <value>true</value> </property> <property> <name>hive.auto.convert.join.noconditionaltask</name> <value>true</value> -- This message was sent by Atlassian JIRA (v6.3.4#6332)