[jira] [Updated] (HIVE-9347) Bug with max() together with rank() and grouping sets

Michal Krawczyk (JIRA) Mon, 12 Jan 2015 13:41:17 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-9347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Michal Krawczyk updated HIVE-9347:
----------------------------------
    Description: 
It looks like the query below returns incorrect results on Hive 0.13.1, but it 
was working fine on Hive 0.11. 

I have the following table:
CREATE TABLE `t` (
  `category` int,
  `live` int,
  `comments` int);

with the following data:
hive> select * from t;
OK
3       0       2
2       0       2
8       0       2

The query:
hive> select category, max(live) live, max(comments) comments, rank() OVER 
(PARTITION BY category ORDER BY comments) rank1
FROM t
GROUP BY category
GROUPING SETS ((), (category))
HAVING max(comments) > 0;

returns the following results:

NULL    1       48      1
2       1       49      1
3       1       49      1
8       1       49      1

When grouping sets are used together with the rank() function, the max() 
function returns incorrect results. Everything works fine if I remove the 
GROUPING SETS clause and split the query into two independent queries, or if 
I remove the rank() function.

This looks like a bug to me, but please review. That said, I'm not sure if it's 
just an Amazon issue or a general Hive issue.

  was:
It looks like the query below returns incorrect results on Hive 13.1. 

I have the following table:
CREATE  TABLE `t`(
  `category` int, 
  `live` int, 
  `comments` int)

with the following data:
hive> select * from t;
OK
3       0       2
2       0       2
8       0       2

The query:
hive> select category, max(live) live, max(comments) comments, rank() OVER 
(PARTITION BY category ORDER BY comments) rank1
FROM t
GROUP BY category
GROUPING SETS ((), (category))
HAVING max(comments) > 0;

returns the following results:

NULL    1       48      1
2       1       49      1
3       1       49      1
8       1       49      1

When grouping sets are used together with the rank() function, the max() 
function returns incorrect results. Everything works fine if I remove the 
GROUPING SETS clause and split the query into two independent queries, or if 
I remove the rank() function.

This looks like a bug to me, but please review. That said, I'm not sure if it's 
just an Amazon issue or a general Hive issue.


> Bug with max() together with rank() and grouping sets
> -----------------------------------------------------
>
>                 Key: HIVE-9347
>                 URL: https://issues.apache.org/jira/browse/HIVE-9347
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.1
>         Environment: Amazon Elastic Map Reduce, AMI 3.3.1, Hadoop Amazon 
> 2.4.0, Hive 0.13.1
>            Reporter: Michal Krawczyk
>
> It looks like the query below returns incorrect results on Hive 0.13.1, but 
> it was working fine on Hive 0.11. 
> I have the following table:
> CREATE TABLE `t` (
>   `category` int,
>   `live` int,
>   `comments` int);
> with the following data:
> hive> select * from t;
> OK
> 3       0       2
> 2       0       2
> 8       0       2
> The query:
> hive> select category, max(live) live, max(comments) comments, rank() OVER 
> (PARTITION BY category ORDER BY comments) rank1
> FROM t
> GROUP BY category
> GROUPING SETS ((), (category))
> HAVING max(comments) > 0;
> returns the following results:
> NULL    1       48      1
> 2       1       49      1
> 3       1       49      1
> 8       1       49      1
> When grouping sets are used together with the rank() function, the max() 
> function returns incorrect results. Everything works fine if I remove the 
> GROUPING SETS clause and split the query into two independent queries, or if 
> I remove the rank() function.
> This looks like a bug to me, but please review. That said, I'm not sure if 
> it's just an Amazon issue or a general Hive issue.
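
As a cross-check on what the query quoted above should return, here is a small
Python sketch that emulates the aggregation over the three sample rows. It is
not Hive code and was not part of the original report. The function name
expected_result is made up for illustration, and the sketch assumes standard
SQL semantics for GROUPING SETS, HAVING and rank(), reading ORDER BY comments
as ordering by the aggregated max(comments) value.

```python
# Rows of table t as listed in the report: (category, live, comments).
ROWS = [(3, 0, 2), (2, 0, 2), (8, 0, 2)]


def expected_result(rows):
    """Emulate the reported query on an in-memory list of rows."""
    # GROUPING SETS ((), (category)): one grand-total group whose category
    # is NULL (None here), plus one group per distinct category.
    groups = [(None, list(rows))]
    for cat in sorted({r[0] for r in rows}):
        groups.append((cat, [r for r in rows if r[0] == cat]))

    # max(live) and max(comments) per group; skip empty groups.
    aggregated = [
        (cat, max(r[1] for r in grp), max(r[2] for r in grp))
        for cat, grp in groups
        if grp
    ]

    # HAVING max(comments) > 0
    kept = [row for row in aggregated if row[2] > 0]

    # rank() OVER (PARTITION BY category ORDER BY comments):
    # 1 + number of rows in the same partition with a smaller comments value.
    return [
        (cat, live, comments,
         1 + sum(1 for o in kept if o[0] == cat and o[2] < comments))
        for cat, live, comments in kept
    ]


for row in expected_result(ROWS):
    print(row)
```

Under these assumptions every row comes out with live = 0, comments = 2 and
rank1 = 1, whereas the reported output shows live = 1 and comments = 48 or 49.
Row order is not guaranteed by the query itself; the sketch lists the
grand-total row first only to match the layout of the report.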



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
