[jira] [Created] (HIVE-11132) Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true

Rich Haase (JIRA) Fri, 26 Jun 2015 15:24:33 -0700

Rich Haase created HIVE-11132:
---------------------------------

             Summary: Queries using join and group by produce incorrect output 
when hive.auto.convert.join=false and hive.optimize.reducededuplication=true
                 Key: HIVE-11132
                 URL: https://issues.apache.org/jira/browse/HIVE-11132
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.14.0
            Reporter: Rich Haase



Queries using join and group by produce multiple output rows with the same key 
when hive.auto.convert.join=false and hive.optimize.reducededuplication=true.  
This interaction between configuration parameters is unexpected and should be 
well documented at the very least and should likely be considered a bug.

e.g. 
hive> set hive.auto.convert.join = false;
hive> set hive.optimize.reducededuplication = true;
hive> SELECT foo.id, count(*) as factor
    > FROM foo
    > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
    > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
    > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
    > WHERE foo.order != ‘blah’ AND foo.id = ‘XYZ'
    > GROUP BY foo.id;
XYZ         79
XYZ             74
XYZ             297
XYZ             66

hive> set hive.auto.convert.join = true;
hive> set hive.optimize.reducededuplication = true;
hive> SELECT foo.id, count(*) as factor
    > FROM foo
    > JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
    > JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
    > JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
    > WHERE foo.order != ‘blah’ AND foo.id = ‘XYZ'
    > GROUP BY foo.id;
XYZ         516






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (HIVE-11132) Queries using join and group by produce incorrect output when hive.auto.convert.join=false and hive.optimize.reducededuplication=true

Reply via email to