Re: Possible bug with max() together with rank() and grouping sets

Michal Krawczyk Mon, 19 Jan 2015 23:59:27 -0800

FYI, I created a bug report and it has already been fixed on master:
https://issues.apache.org/jira/browse/HIVE-9347. Thanks for your help.


On Wed, Oct 22, 2014 at 9:39 AM, Michal Krawczyk <michal.krawc...@u2i.com>
wrote:

> Not sure. The issue you mentioned requires specifying additional columns,
> whereas the one I described returns obviously incorrect results, which
> seems to be a much more severe issue.
>
> Can anybody try to replicate this? If it also happens on non-Amazon
> Hive, I'll file a bug report in Jira.
>
> On Tue, Oct 21, 2014 at 4:01 PM, j.barrett Strausser <
> j.barrett.straus...@gmail.com> wrote:
>
>> Perhaps related to https://issues.apache.org/jira/browse/HIVE-4663
>>
>> I ran across similar issues in 0.11; not sure whether the above ticket
>> affects 0.13.
>>
>> On Tue, Oct 21, 2014 at 8:21 AM, Michal Krawczyk <michal.krawc...@u2i.com
>> > wrote:
>>
>>> Hi all,
>>>
>>> Recently I ran into a problem with incorrect results in one of the
>>> queries on our system after upgrading from Hive 0.8.1.4 to 0.13.1. We use
>>> the Amazon Elastic MapReduce service. I tried to simplify the
>>> original query and replicate the issue on a small dataset. Please take a
>>> look at the queries below and let me know your thoughts.
>>>
>>> I have the following table:
>>> CREATE  TABLE `t`(
>>>   `category` int,
>>>   `live` int,
>>>   `comments` int)
>>>
>>> with the following data:
>>> hive> select * from t;
>>> OK
>>> 3       0       2
>>> 2       0       2
>>> 8       0       2
>>>
>>> The query:
>>> hive> select category, max(live) live, max(comments) comments,
>>>     rank() OVER (PARTITION BY category ORDER BY comments) rank1
>>> FROM t
>>> GROUP BY category
>>> GROUPING SETS ((), (category))
>>> HAVING max(comments) > 0;
>>>
>>> returns the following results:
>>>
>>> NULL    1       48      1
>>> 2       1       49      1
>>> 3       1       49      1
>>> 8       1       49      1
>>>
>>> Long story short: when grouping sets are used together with the rank()
>>> function, the max() function returns incorrect results. Everything works
>>> fine if I remove the grouping sets clause and split the query into two
>>> independent queries, or if I remove the rank() function.
>>>
>>> This looks like a bug to me, but please review. That said, I'm not sure
>>> whether it's an Amazon-specific issue or a general Hive issue.
>>>
>>> Thanks,
>>> Michal
>>>
>>> --
>>> Michal Krawczyk
>>> Project Manager / Tech Lead
>>> Union Square Internet Development
>>> http://www.u2i.com/
>>>
>>
>>
>>
>> --
>>
>>
>> https://github.com/bearrito
>> @deepbearrito
>>
>
>
>
> --
> Michal Krawczyk
> Project Manager / Tech Lead
> Union Square Internet Development
> http://www.u2i.com/
>



-- 
Michal Krawczyk
Project Manager / Tech Lead
Union Square Internet Development
http://www.u2i.com/
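
For anyone who wants to check what the query quoted above should return, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. This is an assumption-laden stand-in, not Hive: SQLite has no GROUPING SETS, so the two grouping sets are aggregated separately and combined with UNION ALL (roughly the "two independent queries" workaround from the original report), and HAVING is replaced by a WHERE on the aggregated rows. The names `ROWS`, `QUERY`, and `expected_rows` are made up for this illustration, and SQLite 3.25 or newer is assumed for window function support.

```python
import sqlite3

# The three rows from the thread: (category, live, comments).
ROWS = [(3, 0, 2), (2, 0, 2), (8, 0, 2)]

# Assumption: SQLite stands in for Hive. It lacks GROUPING SETS, so each
# grouping set is aggregated on its own and the results are combined.
QUERY = """
WITH agg AS (
    -- grouping set (): a single total row, category is NULL
    SELECT NULL AS category, max(live) AS live, max(comments) AS comments
    FROM t
    UNION ALL
    -- grouping set (category): one row per category
    SELECT category, max(live), max(comments)
    FROM t
    GROUP BY category
)
SELECT category, live, comments,
       rank() OVER (PARTITION BY category ORDER BY comments) AS rank1
FROM agg
WHERE comments > 0      -- plays the role of HAVING max(comments) > 0
ORDER BY category       -- SQLite sorts NULL first in ascending order
"""


def expected_rows(rows=ROWS):
    """Load the sample data into SQLite and return the aggregated rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (category INTEGER, live INTEGER, comments INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in expected_rows():
        print(row)
```

With the sample data, every row comes back with live = 0, comments = 2, and rank1 = 1, whereas the Hive 0.13.1 output in the report shows 1 and 48/49 in those two columns.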
