[ https://issues.apache.org/jira/browse/HIVE-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711903#comment-15711903 ]
Hive QA commented on HIVE-15327:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12841276/HIVE-15327.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10756 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_emit_interval] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=134)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=134)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2358/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2358/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2358/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12841276 - PreCommit-HIVE-Build

> Outerjoin might produce wrong result depending on joinEmitInterval value
> ------------------------------------------------------------------------
>
>                 Key: HIVE-15327
>                 URL: https://issues.apache.org/jira/browse/HIVE-15327
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.3.0, 2.2.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Critical
>         Attachments: HIVE-15327.patch
>
>
> If joinEmitInterval is smaller than the group size, outerjoins might produce
> records with NULL appended values multiple times (once per group).
> HIVE-4689 targeted the same problem. However, the fix does not seem to cover
> all cases (in particular, it will not apply to left outer joins with filter
> conditions on the left input). The solution in HIVE-4689 was to disable
> (override) joinEmitInterval value for those cases. This fix follows the same
> approach.
> To reproduce the problem:
> {code}
> set hive.strict.checks.cartesian.product=false;
> set hive.join.emit.interval=1;
> CREATE TABLE test1 (key INT, value INT, col_1 STRING);
> INSERT INTO test1 VALUES (99, 0, 'Alice');
> INSERT INTO test1 VALUES (99, 2, 'Mat');
> INSERT INTO test1 VALUES (100, 1, 'Bob');
> INSERT INTO test1 VALUES (101, 2, 'Car');
> CREATE TABLE test2 (key INT, value INT, col_2 STRING);
> INSERT INTO test2 VALUES (102, 2, 'Del');
> INSERT INTO test2 VALUES (103, 2, 'Ema');
> INSERT INTO test2 VALUES (104, 3, 'Fli');
> -- Equi-condition and condition on one input (left outer join)
> SELECT *
> FROM test1 LEFT OUTER JOIN test2
> ON (test1.value=test2.value AND test1.key between 100 and 102)
> LIMIT 10;
> -- Condition on one input (left outer join)
> SELECT *
> FROM test1 LEFT OUTER JOIN test2
> ON (test1.key between 100 and 102)
> LIMIT 10;
> {code}
> For the *first* query, current (incorrect) result is:
> {noformat}
> 99 0 Alice NULL NULL NULL
> 100 1 Bob NULL NULL NULL
> 101 2 Car 103 2 Ema
> 99 2 Mat NULL NULL NULL
> 101 2 Car 102 2 Del
> 99 2 Mat NULL NULL NULL
> {noformat}
> Expected (correct) result is:
> {noformat}
> 99 0 Alice NULL NULL NULL
> 100 1 Bob NULL NULL NULL
> 101 2 Car 103 2 Ema
> 101 2 Car 102 2 Del
> 99 2 Mat NULL NULL NULL
> {noformat}
> For the *second* query, current (incorrect) result is:
> {noformat}
> 101 2 Car 104 3 Fli
> 100 1 Bob 104 3 Fli
> 99 2 Mat NULL NULL NULL
> 99 0 Alice NULL NULL NULL
> 101 2 Car 103 2 Ema
> 100 1 Bob 103 2 Ema
> 99 2 Mat NULL NULL NULL
> 99 0 Alice NULL NULL NULL
> 101 2 Car 102 2 Del
> 100 1 Bob 102 2 Del
> {noformat}
> Expected (correct) result is:
> {noformat}
> 101 2 Car 104 3 Fli
> 101 2 Car 103 2 Ema
> 101 2 Car 102 2 Del
> 100 1 Bob 104 3 Fli
> 100 1 Bob 103 2 Ema
> 100 1 Bob 102 2 Del
> 99 2 Mat NULL NULL NULL
> 99 0 Alice NULL NULL NULL
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
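For cross-checking the expected results quoted in the issue, the following is a minimal reference sketch in Python. It is not Hive code and does not model joinEmitInterval; it is a plain nested-loop left outer join over the sample rows from the reproduction script. The function name `left_outer_join` and the lambda predicates are illustrative assumptions, not part of the patch. The property it demonstrates is that a left row with no match is padded with NULLs exactly once, only after all right rows have been considered.

```python
# Sample rows copied from the reproduction script in the issue description.
TEST1 = [(99, 0, "Alice"), (99, 2, "Mat"), (100, 1, "Bob"), (101, 2, "Car")]
TEST2 = [(102, 2, "Del"), (103, 2, "Ema"), (104, 3, "Fli")]

# test2 has three columns, so an unmatched left row gets three NULLs (None).
NULL_PAD = (None, None, None)


def left_outer_join(left, right, on):
    """Reference LEFT OUTER JOIN (illustrative helper, not a Hive API).

    `on` is a predicate taking (left_row, right_row). Every left row appears
    at least once; a left row without any match is emitted exactly once.
    """
    out = []
    for l in left:
        matches = [l + r for r in right if on(l, r)]
        # Pad with NULLs only after the whole right side has been scanned.
        out.extend(matches if matches else [l + NULL_PAD])
    return out


# Query 1: ON (test1.value = test2.value AND test1.key BETWEEN 100 AND 102)
q1 = left_outer_join(
    TEST1, TEST2, lambda l, r: l[1] == r[1] and 100 <= l[0] <= 102
)

# Query 2: ON (test1.key BETWEEN 100 AND 102)
q2 = left_outer_join(TEST1, TEST2, lambda l, r: 100 <= l[0] <= 102)

for name, rows in (("query 1", q1), ("query 2", q2)):
    print(name)
    for row in rows:
        print("\t".join("NULL" if v is None else str(v) for v in row))
```

Row order differs from the Hive output because the queries have no ORDER BY; only the set of rows and their multiplicity are meaningful. In this sketch the `99 2 Mat NULL NULL NULL` row appears once per query, matching the expected results and not the duplicated rows in the incorrect output.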