It seems that result of Hive on Spark is mistake And result of Hive and Hive on Spark are not the same

Jone Zhang Tue, 22 Dec 2015 03:31:53 -0800

*select  * from staff;*
1 jone 22 1
2 lucy 21 1
3 hmm 22 2
4 james 24 3
5 xiaoliu 23 3


*select id,date_ from trade union all select id,"test" from trade ;*
1 201510210908
2 201509080234
2 201509080235
1 test
2 test
2 test

*set hive.execution.engine=spark;*
*set spark.master=local;*
*select /*+mapjoin(t)*/ * from staff s join *
*(select id,date_ from trade union all select id,"test" from trade ) t on
s.id <http://s.id>=t.id <http://t.id>;*
1 jone 22 1 1 201510210908
2 lucy 21 1 2 201509080234
2 lucy 21 1 2 201509080235

*set hive.execution.engine=mr;*
*select /*+mapjoin(t)*/ * from staff s join *
*(select id,date_ from trade union all select id,"test" from trade ) t on
s.id <http://s.id>=t.id <http://t.id>;*
FAILED: SemanticException [Error 10227]: Not all clauses are supported with
mapjoin hint. Please remove mapjoin hint.

*I have two questions*
*1.Why result of hive on spark not include the following record?*
1 jone 22 1 1 test
2 lucy 21 1 2 test
2 lucy 21 1 2 test

*2.Why there are two different ways of dealing same query?*


*explain 1:*
*set hive.execution.engine=spark;*
*set spark.master=local;*
*explain *
*select id,date_ from trade union all select id,"test" from trade;*
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      DagName:
jonezhang_20151222191643_5301d90a-caf0-4934-8092-d165c87a4190:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: trade
                  Statistics: Num rows: 6 Data size: 48 Basic stats:
COMPLETE Column stats: NONE
                  Select Operator
                    expressions: id (type: int), date_ (type: string)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 6 Data size: 48 Basic stats:
COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 12 Data size: 96 Basic stats:
COMPLETE Column stats: NONE
                      table:
                          input format:
org.apache.hadoop.mapred.TextInputFormat
                          output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
        Map 2
            Map Operator Tree:
                TableScan
                  alias: trade
                  Statistics: Num rows: 6 Data size: 48 Basic stats:
COMPLETE Column stats: NONE
                  Select Operator
                    expressions: id (type: int), 'test' (type: string)
                    outputColumnNames: _col0, _col1
                    Statistics: Num rows: 6 Data size: 48 Basic stats:
COMPLETE Column stats: NONE
                    File Output Operator
                      compressed: false
                      Statistics: Num rows: 12 Data size: 96 Basic stats:
COMPLETE Column stats: NONE
                      table:
                          input format:
org.apache.hadoop.mapred.TextInputFormat
                          output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                          serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink


*explain 2:*
*set hive.execution.engine=spark;*
*set spark.master=local;*
*explain *
*select /*+mapjoin(t)*/ * from staff s join *
*(select id,date_ from trade union all select id,"20999999999" from trade )
t on s.id <http://s.id>=t.id <http://t.id>;*
OK
STAGE DEPENDENCIES:
  Stage-2 is a root stage
  Stage-1 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-2
    Spark
      DagName:
jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:3
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: trade
                  Statistics: Num rows: 6 Data size: 48 Basic stats:
COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: id is not null (type: boolean)
                    Statistics: Num rows: 3 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
                    Select Operator
                      expressions: id (type: int), date_ (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 3 Data size: 24 Basic stats:
COMPLETE Column stats: NONE
                      Spark HashTable Sink Operator
                        keys:
                          0 id (type: int)
                          1 _col0 (type: int)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-1
    Spark
      DagName:
jonezhang_20151222191716_be7eac84-b5b6-4478-b88f-9f59e2b1b1a8:2
      Vertices:
        Map 2
            Map Operator Tree:
                TableScan
                  alias: s
                  Statistics: Num rows: 1 Data size: 66 Basic stats:
COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: id is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 66 Basic stats:
COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                      keys:
                        0 id (type: int)
                        1 _col0 (type: int)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col7,
_col8
                      input vertices:
                        1 Map 1
                      Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: int), _col1 (type:
string), _col2 (type: int), _col3 (type: int), _col7 (type: int), _col8
(type: string)
                        outputColumnNames: _col0, _col1, _col2, _col3,
_col4, _col5
                        Statistics: Num rows: 6 Data size: 52 Basic stats:
COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 6 Data size: 52 Basic
stats: COMPLETE Column stats: NONE
                          table:
                              input format:
org.apache.hadoop.mapred.TextInputFormat
                              output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            Local Work:
              Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink



*I can't find any information about union "test" in explain 2.*

*Some properties on hive-site.xml is *
<property>
<name>hive.ignore.mapjoin.hint</name>
<value>false</value>
</property>
<property>
<name>hive.auto.convert.join</name>
<value>true</value>
</property>
<property>
<name>hive.auto.convert.join.noconditionaltask</name>
<value>true</value>




*Thanks.*
*Best wishes.*

It seems that result of Hive on Spark is mistake And result of Hive and Hive on Spark are not the same

Reply via email to