Thanks Jesus, sorry for delayed response. Looking forward to the fix ;). On Wed, Oct 21, 2015 at 6:03 PM, Jesus Camacho Rodriguez < [email protected]> wrote:
> I created HIVE-12223 to track this issue. > > Thanks, > Jesús > > > From: Jesus Camachorodriguez > Reply-To: "[email protected]" > Date: Friday, October 16, 2015 at 8:00 AM > > To: "[email protected]" > Subject: Re: How to use grouping__id in a query > > Hi Michal, > > Sorry I didn't catch your message before. The change of behavior might be > due to a bug; certainly we should filter or at least produce a proper error. > > Could you file a JIRA case and assign it to me? I'll check further. > > Thanks, > Jesús > > > > From: Michal Krawczyk > Reply-To: "[email protected]" > Date: Friday, October 16, 2015 at 8:15 AM > To: "[email protected]" > Subject: Re: How to use grouping__id in a query > > Hi all, > > Unfortunately I didn't get any answer on this one, perhaps I asked the > question incorrectly. I'll try another one then ;). > > Should it be possible to use grouping__id function in having clause to > filter our null values in the same query. It used to work in Hive 0.11 and > 0.13, but doesn't work in Hive 1.0. > > Thanks, > Michal > > On Fri, Sep 25, 2015 at 1:14 PM, Michal Krawczyk <[email protected]> > wrote: > >> Hi all, >> >> During the migration from Hive 0.11 to 1.0 on Amazon EMR I run to an >> issue with grouping__id function. I'd like to use it to filter out NULL >> values that didn't come from grouping sets. Here's an example: >> >> We have a simple table with some data: >> >> hive> create table grouping_test (col1 string, col2 string); >> hive> insert into grouping_test values (1, 2), (1, 3), (1, null), (null, >> 2); >> hive> select * from grouping_test; >> OK >> 1 2 >> 1 3 >> 1 NULL >> NULL 2 >> >> hive> select col1, col2, GROUPING__ID, count(*) >> from grouping_test >> group by col1, col2 >> grouping sets ((), (col1)) >> having !(col1 IS NULL AND ((CAST(GROUPING__ID as int) & 1) > 0)) >> >> I expect the query above to filter out NULL col1 for the col1 grouping >> set, it used to work on Hive 0.11. But on Hive 1.0 it doesn't filter any >> values and still returns NULL col1: >> >> NULL NULL 0 4 >> NULL NULL 1 1 <=== this row is expected to be removed >> by the having clause >> 1 NULL 1 3 >> >> I tried also a few other conditions on grouping__id in having clause and >> none of them seem to work correctly: >> >> select col1, col2, GROUPING__ID, count(*) >> from grouping_test >> group by col1, col2 >> grouping sets ((), (col1)) >> having GROUPING__ID = '1' >> >> This query doesn't return any data. >> >> >> I also tried to embed it into a subquery, but still no luck. It finally >> worked when I saved the output of the main query to a temp table and >> filtered out the data using where clause, but this looks like an overkill. >> >> So my question is: How to filter out values using grouping__id in Hive >> 1.0? >> >> Thanks for your help, >> Michal >> >> >> -- >> Michal Krawczyk >> Project Manager / Tech Lead >> Union Square Internet Development >> http://www.u2i.com/ >> > > > > -- > Michal Krawczyk > Project Manager / Tech Lead > Union Square Internet Development > http://www.u2i.com/ > -- Michal Krawczyk Project Manager / Tech Lead Union Square Internet Development http://www.u2i.com/
