Rich Haase created HIVE-11132:
---------------------------------

             Summary: Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true
                 Key: HIVE-11132
                 URL: https://issues.apache.org/jira/browse/HIVE-11132
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: Rich Haase

Queries using join and group by produce multiple output rows with the same key when hive.auto.convert.join=false and hive.optimize.reducededuplication=true. This interaction between configuration parameters is unexpected. It should at the very least be well documented, and it should likely be considered a bug.

e.g.

hive> set hive.auto.convert.join = false;
hive> set hive.optimize.reducededuplication = true;
hive> SELECT foo.id, count(*) as factor
    > FROM foo
    > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
    > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
    > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
    > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
    > GROUP BY foo.id;
XYZ 79
XYZ 74
XYZ 297
XYZ 66

hive> set hive.auto.convert.join = true;
hive> set hive.optimize.reducededuplication = true;
hive> SELECT foo.id, count(*) as factor
    > FROM foo
    > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
    > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
    > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
    > WHERE foo.order != 'blah' AND foo.id = 'XYZ'
    > GROUP BY foo.id;
XYZ 516

Note that the four counts returned by the first query (79 + 74 + 297 + 66) add up to 516, the single count returned by the second query.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)