FYI, I created a bug report and it has already been fixed on master: https://issues.apache.org/jira/browse/HIVE-9347. Thanks for your help.

On Wed, Oct 22, 2014 at 9:39 AM, Michal Krawczyk <michal.krawc...@u2i.com> wrote:

> Not sure. The issue you mentioned requires specifying additional columns,
> whereas the one I mentioned returns obviously incorrect results, which seems
> to be a much more severe issue.
>
> Can anybody try to replicate this? If it's really the case on non-Amazon
> Hive, I'll send a bug report on Jira.
>
> On Tue, Oct 21, 2014 at 4:01 PM, j.barrett Strausser <
> j.barrett.straus...@gmail.com> wrote:
>
>> Perhaps related to https://issues.apache.org/jira/browse/HIVE-4663
>>
>> I ran across similar issues in .11; not sure if the above ticket affects
>> .13.
>>
>> On Tue, Oct 21, 2014 at 8:21 AM, Michal Krawczyk <michal.krawc...@u2i.com
>> > wrote:
>>
>>> Hi all,
>>>
>>> Recently I've run into a problem with incorrect results in one of the
>>> queries on our system after upgrading from Hive 0.8.1.4 to 0.13.1. We use
>>> the Amazon Elastic Map Reduce service. I tried to simplify the
>>> original query and replicate this issue on a small dataset. Please take a
>>> look at the queries below and let me know your thoughts.
>>>
>>> I have the following table:
>>> CREATE TABLE `t`(
>>> `category` int,
>>> `live` int,
>>> `comments` int)
>>>
>>> with the following data:
>>> hive> select * from t;
>>> OK
>>> 3 0 2
>>> 2 0 2
>>> 8 0 2
>>>
>>> The query:
>>> hive> select category, max(live) live, max(comments) comments, rank()
>>> OVER (PARTITION BY category ORDER BY comments) rank1
>>> FROM t
>>> GROUP BY category
>>> GROUPING SETS ((), (category))
>>> HAVING max(comments) > 0;
>>>
>>> returns the following results:
>>>
>>> NULL 1 48 1
>>> 2 1 49 1
>>> 3 1 49 1
>>> 8 1 49 1
>>>
>>> Long story short, when using grouping sets with the rank() function, the
>>> max() function returns incorrect results. Everything works fine if I remove
>>> the grouping sets clause and split the query into two independent queries, or
>>> remove the rank() function.
>>>
>>> This looks like a bug to me, but please review. That said, I'm not sure
>>> if it's just an Amazon issue or a general Hive issue.
>>>
>>> Thanks,
>>> Michal
>>>
>>> --
>>> Michal Krawczyk
>>> Project Manager / Tech Lead
>>> Union Square Internet Development
>>> http://www.u2i.com/
>>>
>>
>> --
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
> --
> Michal Krawczyk
> Project Manager / Tech Lead
> Union Square Internet Development
> http://www.u2i.com/
>

--
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/
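
As a cross-check on the numbers quoted above, here is a minimal sketch (not taken from the thread) of what the aggregates should be for the three sample rows. It assumes Python with the built-in sqlite3 module rather than Hive. SQLite has no GROUPING SETS, so the two grouping sets are emulated with UNION ALL, which mirrors the "two independent queries" workaround mentioned in the original message. The rank() column is left out, and the function name is hypothetical.

```python
import sqlite3

# The three sample rows from the original message: (category, live, comments).
ROWS = [(3, 0, 2), (2, 0, 2), (8, 0, 2)]


def expected_grouping_sets_result(rows):
    """Return the aggregates that GROUPING SETS ((), (category)) should yield.

    Assumption: SQLite stands in for Hive here. The grand-total set () and
    the per-category set (category) are computed as two separate aggregate
    queries and combined with UNION ALL.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE t (category INTEGER, live INTEGER, comments INTEGER)"
        )
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        query = """
            SELECT category, live, comments FROM (
                -- grouping set (): one grand-total row, category is NULL
                SELECT NULL AS category,
                       max(live) AS live,
                       max(comments) AS comments
                FROM t
                UNION ALL
                -- grouping set (category): one row per category
                SELECT category, max(live), max(comments)
                FROM t
                GROUP BY category
            )
            WHERE comments > 0  -- stands in for HAVING max(comments) > 0
        """
        result = conn.execute(query).fetchall()
    finally:
        conn.close()
    # Sort with the NULL (grand-total) row first, then by category.
    return sorted(result, key=lambda r: (r[0] is not None, r[0] or 0))


if __name__ == "__main__":
    for row in expected_grouping_sets_result(ROWS):
        print(row)
```

With the sample data, every row comes back with live = 0 and comments = 2, whereas the reported Hive output shows 1 and 48 or 49 in those columns.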