[ https://issues.apache.org/jira/browse/HIVE-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711771#comment-15711771 ]
Jesus Camacho Rodriguez edited comment on HIVE-15327 at 12/1/16 11:51 AM: -------------------------------------------------------------------------- [~ashutoshc], could you review the patch? Issue is not reproducible with Tez join operators implementation, but I thought it was worth to add the additional test cases so we do not regress in the future. was (Author: jcamachorodriguez): [~ashutoshc], could you review the patch? Issue is not reproducible with Tez join operators implementation, but I thought it was worth to add the tests so we do not regress in the future. > Outerjoin might produce wrong result depending on joinEmitInterval value > ------------------------------------------------------------------------ > > Key: HIVE-15327 > URL: https://issues.apache.org/jira/browse/HIVE-15327 > Project: Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 1.3.0, 2.2.0 > Reporter: Jesus Camacho Rodriguez > Assignee: Jesus Camacho Rodriguez > Priority: Critical > Attachments: HIVE-15327.patch > > > If joinEmitInterval is smaller than the group size, outerjoins might produce > records with NULL appended values multiple times (once per group). > HIVE-4689 targeted the same problem. However, the fix does not seem to cover > all cases (in particular, it will not apply to left outer joins with filter > conditions on the left input). The solution in HIVE-4689 was to disable > (override) joinEmitInterval value for those cases. This fix follows the same > approach. > To reproduce the problem: > {code} > set hive.strict.checks.cartesian.product=false; > set hive.join.emit.interval=1; > CREATE TABLE test1 (key INT, value INT, col_1 STRING); > INSERT INTO test1 VALUES (99, 0, 'Alice'); > INSERT INTO test1 VALUES (99, 2, 'Mat'); > INSERT INTO test1 VALUES (100, 1, 'Bob'); > INSERT INTO test1 VALUES (101, 2, 'Car'); > CREATE TABLE test2 (key INT, value INT, col_2 STRING); > INSERT INTO test2 VALUES (102, 2, 'Del'); > INSERT INTO test2 VALUES (103, 2, 'Ema'); > INSERT INTO test2 VALUES (104, 3, 'Fli'); > -- Equi-condition and condition on one input (left outer join) > SELECT * > FROM test1 LEFT OUTER JOIN test2 > ON (test1.value=test2.value AND test1.key between 100 and 102) > LIMIT 10; > -- Condition on one input (left outer join) > SELECT * > FROM test1 LEFT OUTER JOIN test2 > ON (test1.key between 100 and 102) > LIMIT 10; > {code} > For the *first* query, current (incorrect) result is: > {noformat} > 99 0 Alice NULL NULL NULL > 100 1 Bob NULL NULL NULL > 101 2 Car 103 2 Ema > 99 2 Mat NULL NULL NULL > 101 2 Car 102 2 Del > 99 2 Mat NULL NULL NULL > {noformat} > Expected (correct) result is: > {noformat} > 99 0 Alice NULL NULL NULL > 100 1 Bob NULL NULL NULL > 101 2 Car 103 2 Ema > 101 2 Car 102 2 Del > 99 2 Mat NULL NULL NULL > {noformat} > For the *second* query, current (incorrect) result is: > {noformat} > 101 2 Car 104 3 Fli > 100 1 Bob 104 3 Fli > 99 2 Mat NULL NULL NULL > 99 0 Alice NULL NULL NULL > 101 2 Car 103 2 Ema > 100 1 Bob 103 2 Ema > 99 2 Mat NULL NULL NULL > 99 0 Alice NULL NULL NULL > 101 2 Car 102 2 Del > 100 1 Bob 102 2 Del > {noformat} > Expected (correct) result is: > {noformat} > 101 2 Car 104 3 Fli > 101 2 Car 103 2 Ema > 101 2 Car 102 2 Del > 100 1 Bob 104 3 Fli > 100 1 Bob 103 2 Ema > 100 1 Bob 102 2 Del > 99 2 Mat NULL NULL NULL > 99 0 Alice NULL NULL NULL > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)