[ https://issues.apache.org/jira/browse/HIVE-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
binlijin updated HIVE-2520:
---------------------------

    Status: Patch Available  (was: Open)

> left semi join will duplicate data
> ----------------------------------
>
>                 Key: HIVE-2520
>                 URL: https://issues.apache.org/jira/browse/HIVE-2520
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: binlijin
>            Assignee: binlijin
>            Priority: Critical
>              Labels: patch
>         Attachments: hive-2520.2.patch, hive-2520.patch
>
>
> CREATE TABLE sales (name STRING, id INT)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> CREATE TABLE things (id INT, name STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
>
> The 'sales' table has its data in one file, sales.txt, which contains:
> Joe 2
> Hank 2
>
> The 'things' table has its data in two files, things.txt and things2.txt.
> The content of things.txt is:
> 2 Tie
> The content of things2.txt is:
> 2 Tie
>
> SELECT * FROM sales LEFT SEMI JOIN things ON (sales.id = things.id);
> will output:
> Joe 2
> Joe 2
> Hank 2
> Hank 2
> so the result is wrong: a left semi join should return each matching
> row of 'sales' only once, no matter how many rows of 'things' match it.
>
> In CommonJoinOperator, a left semi join should use
> " genObject(null, 0, new IntermediateObject(new ArrayList[numAliases], 0), true); "
> to generate data, but it currently uses " genUniqueJoinObject(0, 0); ".
> This patch solves the problem.
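
For readers who want to see the difference between the reported and the expected output without a Hive installation, here is a minimal sketch in Python. It models only the join semantics, not Hive's CommonJoinOperator; the function names `inner_join_left_columns` and `left_semi_join` are hypothetical and chosen for illustration, and the `things` list is assumed to hold the combined rows of things.txt and things2.txt.

```python
# Rows mirror the example in the report. `things` is assumed to be the
# union of things.txt and things2.txt, so the same row appears twice.
sales = [("Joe", 2), ("Hank", 2)]
things = [(2, "Tie"), (2, "Tie")]


def inner_join_left_columns(left, right):
    """Emit a left row once per matching right row (duplicates possible)."""
    return [l for l in left for r in right if l[1] == r[0]]


def left_semi_join(left, right):
    """Emit each left row at most once if any right row shares its id."""
    right_ids = {r[0] for r in right}
    return [l for l in left if l[1] in right_ids]


# Matches the output shown in the report: every sales row is doubled.
buggy = inner_join_left_columns(sales, things)
# What LEFT SEMI JOIN is expected to return: one row per matching sales row.
expected = left_semi_join(sales, things)

print(buggy)     # [('Joe', 2), ('Joe', 2), ('Hank', 2), ('Hank', 2)]
print(expected)  # [('Joe', 2), ('Hank', 2)]
```

The first function reproduces the duplicated rows from the report, because it emits a left row for every match on the right side. The second only checks whether a match exists, which is the behaviour the patch is meant to restore.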