[ 
https://issues.apache.org/jira/browse/HIVE-27226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seonggon Namgung updated HIVE-27226:
------------------------------------
    Description: 
I tested many OuterJoin queries as an extension of HIVE-27138, and I found that 
Hive returns incorrect result for a query containing FullOuterJoin with filter 
expressions. In a nutshell, all JoinOperators that run on Tez engine return 
incorrect result for OuterJoin queries, and one of the reason for incorrect 
computation comes from CommonJoinOperator, which is the base of all 
JoinOperators. I attached the queries and configuration that I used at the 
bottom of the document. I am still inspecting this problems, and I will share 
an update once when I find out another reason. Also any comments and opinions 
would be appreciated.

First of all, I observed that current Hive ignores filter expressions contained 
in MapJoinOperator. For example, the attached result of query1 shows that 
MapJoinOperator performs inner join, not full outer join. This problem stems 
from removal of filterMap. When converting JoinOperator to MapJoinOperator, 
ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() removes filterMap of 
MapJoinOperator. Because MapJoinOperator does not evaluate filter expressions 
if filterMap is null, this change makes MapJoinOperator ignore filter 
expressions and it always joins tables regardless whether they satisfy filter 
expressions or not. To solve this problem, I disable 
FullOuterMapJoinOptimization and apply path for HIVE-27138, which prevents NPE. 
(The patch is available at the following link: LINK.) The rest of this document 
uses this modified Hive, but most of problems happen to current Hive, too.

The second problem I found is that Hive returns the same left-null or 
right-null rows multiple time when it uses MapJoinOperator or 
CommonMergeJoinOperator. This is caused by the logic of current 
CommonJoinOperator. Both of the two JoinOperators joins tables in 2 steps. 
First, they create RowContainers, each of which is a group of rows from one 
table and has the same key. Second, they call 
CommonJoinOperator#checkAndGenObject() with created RowContainers. This method 
checks filterTag of each row in RowContainers and forwards joined row if they 
meet all filter conditions. For OuterJoin, checkAndGenObject() forwards 
non-matching rows if there is no matching row in RowContainer. The problem 
happens when there are multiple RowContainer for the same key and table. For 
example, suppose that there are two left RowContainers and one right 
RowContainer. If none of the row in two left RowContainers satisfies filter 
condition, then checkAndGenObject() will forward Left-Null row for each right 
row. Because checkAndGenObject() is called with each left RowContainer, there 
will be two duplicated Left-Null rows for every right row.

In the case of MapJoinOperator, it always creates singleton RowContainer for 
big table. Therefore, it always produces duplicated non-matching rows. 
CommonMergeJoinOperator also creates multiple RowContainer for big table, whose 
size is hive.join.emit.interval. In the below experiment, I also set 
hive.join.shortcut.unmatched.rows=false, and hive.exec.reducers.max=1 to 
disable specialized algorithm for OuterJoin of 2 tables and force calling 
checkAndGenObject() before all rows with the same keys are gathered. I didn't 
observe this problem when using VectorMapJoinOperator, and I will inspect 
VectorMapJoinOperator whether we can reproduce the problem with it.

I think the second problem is not limited to FullOuterJoin, but I couldn't find 
such query as of now. This will also be added to this issue if I can write a 
query that reproduces the second problem without FullOuterJoin.

I also found that Hive returns wrong result for query2 even when I used 
VectorMapJoinOperator. I am still inspecting this problem and I will add an 
update on it when I find out the reason.

 

Experiment:

 
{code:java}
---- Configuration
set hive.optimize.shared.work=false;

-- Std MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=false;

-- Vec MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=true;

-- MergeJoin
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled=false;
set hive.join.shortcut.unmatched.rows=false;
set hive.join.emit.interval=1;
set hive.exec.reducers.max=1;
 
---- Queries
-- Query 1
DROP TABLE IF EXISTS a;
CREATE TABLE a (key string, value string);
INSERT INTO a VALUES (1, 1), (1, 2), (2, 1);
SELECT * FROM a FULL OUTER JOIN a b ON a.key = b.key AND a.key < 0;

-- Query 2
DROP TABLE IF EXISTS b;
CREATE TABLE b (key string, value string);
INSERT INTO b VALUES (1, 0), (1, 1);
SELECT * FROM b FULL OUTER JOIN b a ON a.key = b.key AND a.value > 0 AND 
b.value > 0;{code}
 

 

Experiment result:

 
{code:java}
-- PostgresSQL
-- Query1
key | value | key | value
-----+-------+-----+-------
   1 |     1 |     |
   1 |     2 |     |
   2 |     1 |     |
     |       |   1 |     2
     |       |   1 |     1
     |       |   2 |     1
(6 rows)

-- Query2
 key | value | key | value
-----+-------+-----+-------
   1 |     0 |     |      
   1 |     1 |   1 |     1
     |       |   1 |     0
(3 rows){code}
{code:java}
-- Query1 Result, current Hive
-- Std MapJoin
+--------+----------+--------+----------+
| a.key  | a.value  | b.key  | b.value  |
+--------+----------+--------+----------+
| 2      | 1        | 2      | 1        |
| 1      | 2        | 1      | 2        |
| 1      | 2        | 1      | 1        |
| 1      | 1        | 1      | 2        |
| 1      | 1        | 1      | 1        |
+--------+----------+--------+----------+
-- Vec MapJoin
+--------+----------+--------+----------+
| a.key  | a.value  | b.key  | b.value  |
+--------+----------+--------+----------+
| 1      | 2        | NULL   | NULL     |
| 1      | 1        | NULL   | NULL     |
| NULL   | NULL     | 1      | 2        |
| NULL   | NULL     | 1      | 1        |
| 2      | 1        | NULL   | NULL     |
| NULL   | NULL     | 2      | 1        |
+--------+----------+--------+----------+
-- MergeJoin
+--------+----------+--------+----------+
| a.key  | a.value  | b.key  | b.value  |
+--------+----------+--------+----------+
| 1      | 1        | NULL   | NULL     |
| NULL   | NULL     | 1      | 1        |
| NULL   | NULL     | 1      | 2        |
| 1      | 2        | NULL   | NULL     |
| NULL   | NULL     | 1      | 1        |
| NULL   | NULL     | 1      | 2        |
| 2      | 1        | NULL   | NULL     |
| NULL   | NULL     | 2      | 1        |
+--------+----------+--------+----------+ {code}
{code:java}
-- Query1 Result, Hive with HIVE-27138 patch, disable 
FullOuterMapJoinOptimization
-- Std MapJoin
+--------+----------+--------+----------+
| a.key  | a.value  | b.key  | b.value  |
+--------+----------+--------+----------+
| 2      | 1        | NULL   | NULL     |
| NULL   | NULL     | 2      | 1        |
| 1      | 2        | NULL   | NULL     |
| NULL   | NULL     | 1      | 2        |
| NULL   | NULL     | 1      | 1        |
| 1      | 1        | NULL   | NULL     |
| NULL   | NULL     | 1      | 2        |
| NULL   | NULL     | 1      | 1        |
+--------+----------+--------+----------+
-- Vec MapJoin
+--------+----------+--------+----------+
| a.key  | a.value  | b.key  | b.value  |
+--------+----------+--------+----------+
| 1      | 2        | NULL   | NULL     |
| 1      | 1        | NULL   | NULL     |
| NULL   | NULL     | 1      | 2        |
| NULL   | NULL     | 1      | 1        |
| 2      | 1        | NULL   | NULL     |
| NULL   | NULL     | 2      | 1        |
+--------+----------+--------+----------+
-- MergeJoin
+--------+----------+--------+----------+
| a.key  | a.value  | b.key  | b.value  |
+--------+----------+--------+----------+
| 1      | 1        | NULL   | NULL     |
| NULL   | NULL     | 1      | 1        |
| NULL   | NULL     | 1      | 2        |
| 1      | 2        | NULL   | NULL     |
| NULL   | NULL     | 1      | 1        |
| NULL   | NULL     | 1      | 2        |
| 2      | 1        | NULL   | NULL     |
| NULL   | NULL     | 2      | 1        |
+--------+----------+--------+----------+ {code}
{code:java}
-- Query2 Result, current Hive
-- Std MapJoin
+--------+----------+--------+----------+
| b.key  | b.value  | a.key  | a.value  |
+--------+----------+--------+----------+
| 1      | 0        | NULL   | NULL     |
| NULL   | NULL     | 1      | 0        |
| NULL   | NULL     | 1      | 1        |
| 1      | 1        | 1      | 0        |
| 1      | 1        | 1      | 1        |
+--------+----------+--------+----------+
-- Vec MapJoin
+--------+----------+--------+----------+
| b.key  | b.value  | a.key  | a.value  |
+--------+----------+--------+----------+
| 1      | 0        | NULL   | NULL     |
| 1      | 1        | 1      | 1        |
| 1      | 1        | 1      | 0        |
+--------+----------+--------+----------+
-- MergeJoin
+--------+----------+--------+----------+
| b.key  | b.value  | a.key  | a.value  |
+--------+----------+--------+----------+
| 1      | 0        | NULL   | NULL     |
| NULL   | NULL     | 1      | 0        |
| 1      | 1        | 1      | 1        |
+--------+----------+--------+----------+ {code}
{code:java}
-- Query2 Result, Hive with HIVE-27138 patch, disable 
FullOuterMapJoinOptimization
-- Std MapJoin
+--------+----------+--------+----------+
| b.key  | b.value  | a.key  | a.value  |
+--------+----------+--------+----------+
| 1      | 1        | 1      | 1        |
| 1      | 1        | 1      | 0        |
| 1      | 0        | 1      | 1        |
| 1      | 0        | 1      | 0        |
+--------+----------+--------+----------+
-- Vec MapJoin
+--------+----------+--------+----------+
| b.key  | b.value  | a.key  | a.value  |
+--------+----------+--------+----------+
| 1      | 0        | NULL   | NULL     |
| 1      | 1        | 1      | 0        |
| 1      | 1        | 1      | 1        |
+--------+----------+--------+----------+
-- MergeJoin
+--------+----------+--------+----------+
| b.key  | b.value  | a.key  | a.value  |
+--------+----------+--------+----------+
| 1      | 0        | NULL   | NULL     |
| NULL   | NULL     | 1      | 0        |
| 1      | 1        | 1      | 1        |
+--------+----------+--------+----------+ {code}
 

  was:
I tested many OuterJoin queries as an extension of HIVE-27138, and I found that 
Hive returns incorrect result for a query containing FullOuterJoin with filter 
expressions. In a nutshell, all JoinOperators that run on Tez engine return 
incorrect result for OuterJoin queries, and one of the reason for incorrect 
computation comes from CommonJoinOperator, which is the base of all 
JoinOperators. I attached the queries and configuration that I used at the 
bottom of the document. I am still inspecting this problems, and I will share 
an update once when I find out another reason. Also any comments and opinions 
would be appreciated.


First of all, I observed that current Hive ignores filter expressions contained 
in MapJoinOperator. For example, the attached result of query1 shows that 
MapJoinOperator performs inner join, not full outer join. This problem stems 
from removal of filterMap. When converting JoinOperator to MapJoinOperator, 
ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() removes filterMap of 
MapJoinOperator. Because MapJoinOperator does not evaluate filter expressions 
if filterMap is null, this change makes MapJoinOperator ignore filter 
expressions and it always joins tables regardless whether they satisfy filter 
expressions or not. To solve this problem, I disable 
FullOuterMapJoinOptimization and apply path for HIVE-27138, which prevents NPE. 
(The patch is available at the following link: LINK.) The rest of this document 
uses this modified Hive, but most of problems happen to current Hive, too.


The second problem I found is that Hive returns the same left-null or 
right-null rows multiple time when it uses MapJoinOperator or 
CommonMergeJoinOperator. This is caused by the logic of current 
CommonJoinOperator. Both of the two JoinOperators joins tables in 2 steps. 
First, they create RowContainers, each of which is a group of rows from one 
table and has the same key. Second, they call 
CommonJoinOperator#checkAndGenObject() with created RowContainers. This method 
checks filterTag of each row in RowContainers and forwards joined row if they 
meet all filter conditions. For OuterJoin, checkAndGenObject() forwards 
non-matching rows if there is no matching row in RowContainer. The problem 
happens when there are multiple RowContainer for the same key and table. For 
example, suppose that there are two left RowContainers and one right 
RowContainer. If none of the row in two left RowContainers satisfies filter 
condition, then checkAndGenObject() will forward Left-Null row for each right 
row. Because checkAndGenObject() is called with each left RowContainer, there 
will be two duplicated Left-Null rows for every right row.


In the case of MapJoinOperator, it always creates singleton RowContainer for 
big table. Therefore, it always produces duplicated non-matching rows. 
CommonMergeJoinOperator also creates multiple RowContainer for big table, whose 
size is hive.join.emit.interval. In the below experiment, I also set 
hive.join.shortcut.unmatched.rows=false, and hive.exec.reducers.max=1 to 
disable specialized algorithm for OuterJoin of 2 tables and force calling 
checkAndGenObject() before all rows with the same keys are gathered. I didn't 
observe this problem when using VectorMapJoinOperator, and I will inspect 
VectorMapJoinOperator whether we can reproduce the problem with it.


I think the second problem is not limited to FullOuterJoin, but I couldn't find 
such query as of now. This will also be added to this issue if I can write a 
query that reproduces the second problem without FullOuterJoin.


I also found that Hive returns wrong result for query2 even when I used 
VectorMapJoinOperator. I am still inspecting this problem and I will add an 
update on it when I find out the reason.

 

Experiment:

 
{code:java}
---- Configuration
set hive.optimize.shared.work=false;

-- Std MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=false;

-- Vec MapJoin
set hive.auto.convert.join=true;
set hive.vectorized.execution.enabled=true;

-- MergeJoin
set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled=false;
set hive.join.shortcut.unmatched.rows=false;
set hive.join.emit.interval=1;
set hive.exec.reducers.max=1;
 
---- Queries
-- Query 1
DROP TABLE IF EXISTS a;
CREATE TABLE a (key string, value string);
INSERT INTO a VALUES (1, 1), (1, 2), (2, 1);
SELECT * FROM a FULL OUTER JOIN a b ON a.key = b.key AND a.key < 0;

-- Query 2
DROP TABLE IF EXISTS b;
CREATE TABLE b (key string, value string);
INSERT INTO b VALUES (1, 0), (1, 1);
SELECT * FROM b FULL OUTER JOIN b a ON a.key = b.key AND a.value > 0 AND 
b.value > 0;{code}
 

 

Experiment result:

 
{code:java}
-- PostgresSQL
-- Query1
key | value | key | value
-----+-------+-----+-------
   1 |     1 |     |
   1 |     2 |     |
   2 |     1 |     |
     |       |   1 |     2
     |       |   1 |     1
     |       |   2 |     1
(6 rows)

-- Query2
 key | value | key | value
-----+-------+-----+-------
   1 |     0 |     |      
   1 |     1 |   1 |     1
     |       |   1 |     0
(3 rows){code}
 


> FullOuterJoin with filter expressions is not computed correctly
> ---------------------------------------------------------------
>
>                 Key: HIVE-27226
>                 URL: https://issues.apache.org/jira/browse/HIVE-27226
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Seonggon Namgung
>            Priority: Major
>
> I tested many OuterJoin queries as an extension of HIVE-27138, and I found 
> that Hive returns incorrect result for a query containing FullOuterJoin with 
> filter expressions. In a nutshell, all JoinOperators that run on Tez engine 
> return incorrect result for OuterJoin queries, and one of the reason for 
> incorrect computation comes from CommonJoinOperator, which is the base of all 
> JoinOperators. I attached the queries and configuration that I used at the 
> bottom of the document. I am still inspecting this problems, and I will share 
> an update once when I find out another reason. Also any comments and opinions 
> would be appreciated.
> First of all, I observed that current Hive ignores filter expressions 
> contained in MapJoinOperator. For example, the attached result of query1 
> shows that MapJoinOperator performs inner join, not full outer join. This 
> problem stems from removal of filterMap. When converting JoinOperator to 
> MapJoinOperator, ConvertJoinMapJoin#convertJoinDynamicPartitionedHashJoin() 
> removes filterMap of MapJoinOperator. Because MapJoinOperator does not 
> evaluate filter expressions if filterMap is null, this change makes 
> MapJoinOperator ignore filter expressions and it always joins tables 
> regardless whether they satisfy filter expressions or not. To solve this 
> problem, I disable FullOuterMapJoinOptimization and apply path for 
> HIVE-27138, which prevents NPE. (The patch is available at the following 
> link: LINK.) The rest of this document uses this modified Hive, but most of 
> problems happen to current Hive, too.
> The second problem I found is that Hive returns the same left-null or 
> right-null rows multiple time when it uses MapJoinOperator or 
> CommonMergeJoinOperator. This is caused by the logic of current 
> CommonJoinOperator. Both of the two JoinOperators joins tables in 2 steps. 
> First, they create RowContainers, each of which is a group of rows from one 
> table and has the same key. Second, they call 
> CommonJoinOperator#checkAndGenObject() with created RowContainers. This 
> method checks filterTag of each row in RowContainers and forwards joined row 
> if they meet all filter conditions. For OuterJoin, checkAndGenObject() 
> forwards non-matching rows if there is no matching row in RowContainer. The 
> problem happens when there are multiple RowContainer for the same key and 
> table. For example, suppose that there are two left RowContainers and one 
> right RowContainer. If none of the row in two left RowContainers satisfies 
> filter condition, then checkAndGenObject() will forward Left-Null row for 
> each right row. Because checkAndGenObject() is called with each left 
> RowContainer, there will be two duplicated Left-Null rows for every right row.
> In the case of MapJoinOperator, it always creates singleton RowContainer for 
> big table. Therefore, it always produces duplicated non-matching rows. 
> CommonMergeJoinOperator also creates multiple RowContainer for big table, 
> whose size is hive.join.emit.interval. In the below experiment, I also set 
> hive.join.shortcut.unmatched.rows=false, and hive.exec.reducers.max=1 to 
> disable specialized algorithm for OuterJoin of 2 tables and force calling 
> checkAndGenObject() before all rows with the same keys are gathered. I didn't 
> observe this problem when using VectorMapJoinOperator, and I will inspect 
> VectorMapJoinOperator whether we can reproduce the problem with it.
> I think the second problem is not limited to FullOuterJoin, but I couldn't 
> find such query as of now. This will also be added to this issue if I can 
> write a query that reproduces the second problem without FullOuterJoin.
> I also found that Hive returns wrong result for query2 even when I used 
> VectorMapJoinOperator. I am still inspecting this problem and I will add an 
> update on it when I find out the reason.
>  
> Experiment:
>  
> {code:java}
> ---- Configuration
> set hive.optimize.shared.work=false;
> -- Std MapJoin
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=false;
> -- Vec MapJoin
> set hive.auto.convert.join=true;
> set hive.vectorized.execution.enabled=true;
> -- MergeJoin
> set hive.auto.convert.join=false;
> set hive.vectorized.execution.enabled=false;
> set hive.join.shortcut.unmatched.rows=false;
> set hive.join.emit.interval=1;
> set hive.exec.reducers.max=1;
>  
> ---- Queries
> -- Query 1
> DROP TABLE IF EXISTS a;
> CREATE TABLE a (key string, value string);
> INSERT INTO a VALUES (1, 1), (1, 2), (2, 1);
> SELECT * FROM a FULL OUTER JOIN a b ON a.key = b.key AND a.key < 0;
> -- Query 2
> DROP TABLE IF EXISTS b;
> CREATE TABLE b (key string, value string);
> INSERT INTO b VALUES (1, 0), (1, 1);
> SELECT * FROM b FULL OUTER JOIN b a ON a.key = b.key AND a.value > 0 AND 
> b.value > 0;{code}
>  
>  
> Experiment result:
>  
> {code:java}
> -- PostgresSQL
> -- Query1
> key | value | key | value
> -----+-------+-----+-------
>    1 |     1 |     |
>    1 |     2 |     |
>    2 |     1 |     |
>      |       |   1 |     2
>      |       |   1 |     1
>      |       |   2 |     1
> (6 rows)
> -- Query2
>  key | value | key | value
> -----+-------+-----+-------
>    1 |     0 |     |      
>    1 |     1 |   1 |     1
>      |       |   1 |     0
> (3 rows){code}
> {code:java}
> -- Query1 Result, current Hive
> -- Std MapJoin
> +--------+----------+--------+----------+
> | a.key  | a.value  | b.key  | b.value  |
> +--------+----------+--------+----------+
> | 2      | 1        | 2      | 1        |
> | 1      | 2        | 1      | 2        |
> | 1      | 2        | 1      | 1        |
> | 1      | 1        | 1      | 2        |
> | 1      | 1        | 1      | 1        |
> +--------+----------+--------+----------+
> -- Vec MapJoin
> +--------+----------+--------+----------+
> | a.key  | a.value  | b.key  | b.value  |
> +--------+----------+--------+----------+
> | 1      | 2        | NULL   | NULL     |
> | 1      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 2        |
> | NULL   | NULL     | 1      | 1        |
> | 2      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 2      | 1        |
> +--------+----------+--------+----------+
> -- MergeJoin
> +--------+----------+--------+----------+
> | a.key  | a.value  | b.key  | b.value  |
> +--------+----------+--------+----------+
> | 1      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 1        |
> | NULL   | NULL     | 1      | 2        |
> | 1      | 2        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 1        |
> | NULL   | NULL     | 1      | 2        |
> | 2      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 2      | 1        |
> +--------+----------+--------+----------+ {code}
> {code:java}
> -- Query1 Result, Hive with HIVE-27138 patch, disable 
> FullOuterMapJoinOptimization
> -- Std MapJoin
> +--------+----------+--------+----------+
> | a.key  | a.value  | b.key  | b.value  |
> +--------+----------+--------+----------+
> | 2      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 2      | 1        |
> | 1      | 2        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 2        |
> | NULL   | NULL     | 1      | 1        |
> | 1      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 2        |
> | NULL   | NULL     | 1      | 1        |
> +--------+----------+--------+----------+
> -- Vec MapJoin
> +--------+----------+--------+----------+
> | a.key  | a.value  | b.key  | b.value  |
> +--------+----------+--------+----------+
> | 1      | 2        | NULL   | NULL     |
> | 1      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 2        |
> | NULL   | NULL     | 1      | 1        |
> | 2      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 2      | 1        |
> +--------+----------+--------+----------+
> -- MergeJoin
> +--------+----------+--------+----------+
> | a.key  | a.value  | b.key  | b.value  |
> +--------+----------+--------+----------+
> | 1      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 1        |
> | NULL   | NULL     | 1      | 2        |
> | 1      | 2        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 1        |
> | NULL   | NULL     | 1      | 2        |
> | 2      | 1        | NULL   | NULL     |
> | NULL   | NULL     | 2      | 1        |
> +--------+----------+--------+----------+ {code}
> {code:java}
> -- Query2 Result, current Hive
> -- Std MapJoin
> +--------+----------+--------+----------+
> | b.key  | b.value  | a.key  | a.value  |
> +--------+----------+--------+----------+
> | 1      | 0        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 0        |
> | NULL   | NULL     | 1      | 1        |
> | 1      | 1        | 1      | 0        |
> | 1      | 1        | 1      | 1        |
> +--------+----------+--------+----------+
> -- Vec MapJoin
> +--------+----------+--------+----------+
> | b.key  | b.value  | a.key  | a.value  |
> +--------+----------+--------+----------+
> | 1      | 0        | NULL   | NULL     |
> | 1      | 1        | 1      | 1        |
> | 1      | 1        | 1      | 0        |
> +--------+----------+--------+----------+
> -- MergeJoin
> +--------+----------+--------+----------+
> | b.key  | b.value  | a.key  | a.value  |
> +--------+----------+--------+----------+
> | 1      | 0        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 0        |
> | 1      | 1        | 1      | 1        |
> +--------+----------+--------+----------+ {code}
> {code:java}
> -- Query2 Result, Hive with HIVE-27138 patch, disable 
> FullOuterMapJoinOptimization
> -- Std MapJoin
> +--------+----------+--------+----------+
> | b.key  | b.value  | a.key  | a.value  |
> +--------+----------+--------+----------+
> | 1      | 1        | 1      | 1        |
> | 1      | 1        | 1      | 0        |
> | 1      | 0        | 1      | 1        |
> | 1      | 0        | 1      | 0        |
> +--------+----------+--------+----------+
> -- Vec MapJoin
> +--------+----------+--------+----------+
> | b.key  | b.value  | a.key  | a.value  |
> +--------+----------+--------+----------+
> | 1      | 0        | NULL   | NULL     |
> | 1      | 1        | 1      | 0        |
> | 1      | 1        | 1      | 1        |
> +--------+----------+--------+----------+
> -- MergeJoin
> +--------+----------+--------+----------+
> | b.key  | b.value  | a.key  | a.value  |
> +--------+----------+--------+----------+
> | 1      | 0        | NULL   | NULL     |
> | NULL   | NULL     | 1      | 0        |
> | 1      | 1        | 1      | 1        |
> +--------+----------+--------+----------+ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to