Re: Duplicate rows when using group by in subquery

2013-09-19 Thread Yin Huai
e, Sep 17, 2013 at 2:24 AM, Mikael Öhman wrote: > > Thank you for the information. Just to be clear, it is not that I have > manually restricted the job to run using only a single mapreduce job, but > it incorrectly assumes one job is enough? > > I will get back with results fr

Re: Duplicate rows when using group by in subquery

2013-09-19 Thread Mikael Öhman
Just built from source this morning, so it seems strange that the bug would still persist :(. From: Yin Huai To: user@hive.apache.org; Mikael Öhman Sent: Tuesday, 17 September 2013 15:30 Subject: Re: Duplicate rows when using group by in sub

Re: Duplicate rows when using group by in subquery

2013-09-17 Thread Yin Huai
Monday, 16 September 2013 19:52 > *Subject:* Re: Duplicate rows when using group by in subquery > > Hello Mikael, > > Seems your case is related to the bug reported in > https://issues.apache.org/jira/browse/HIVE-5149. Basically, when hive > uses a single MapReduce job to evalu

Re: Duplicate rows when using group by in subquery

2013-09-16 Thread Mikael Öhman
lable until Thursday. / Sincerely, Mikael From: Yin Huai To: user@hive.apache.org; Mikael Öhman Sent: Monday, 16 September 2013 19:52 Subject: Re: Duplicate rows when using group by in subquery Hello Mikael, Seems your case is related to the bug rep

Re: Duplicate rows when using group by in subquery

2013-09-16 Thread Yin Huai
Hello Mikael, It seems your case is related to the bug reported in https://issues.apache.org/jira/browse/HIVE-5149. Basically, when Hive uses a single MapReduce job to evaluate your query, "c.Symbol" and "c.catid" are used to partition the data, and thus, rows with the same value of "c.Symbol" are not
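The snippet above is cut off, but one way to picture the partitioning problem it describes is a toy shuffle in Python. This is only a sketch under assumptions: `toy_hash`, `run_group_by_symbol`, and the sample rows are hypothetical stand-ins, not Hive internals or data from this thread. It shows that if rows are routed to reducers by both the symbol and the category id, while each reducer groups by symbol alone, the same symbol can be emitted by more than one reducer.

```python
from collections import defaultdict


def toy_hash(key):
    # Deterministic stand-in for a partitioner hash (hypothetical, not Hive's).
    return sum(ord(ch) for ch in "|".join(str(part) for part in key))


def run_group_by_symbol(rows, partition_key, num_reducers=2):
    # Shuffle phase: route every row to a reducer based on its partition key.
    reducers = defaultdict(list)
    for row in rows:
        reducers[toy_hash(partition_key(row)) % num_reducers].append(row)

    # Reduce phase: each reducer groups by symbol, but only sees its own rows.
    output = []
    for reducer_id in sorted(reducers):
        symbols_seen = {symbol for symbol, _catid in reducers[reducer_id]}
        output.extend(sorted(symbols_seen))
    return sorted(output)


# Hypothetical (symbol, catid) rows.
rows = [("AAA", 1), ("AAA", 2), ("BBB", 1)]

# Partitioning on (symbol, catid) splits "AAA" across two reducers.
wrong = run_group_by_symbol(rows, lambda r: (r[0], r[1]))

# Partitioning on symbol alone keeps each symbol on a single reducer.
right = run_group_by_symbol(rows, lambda r: (r[0],))

print(wrong)
print(right)
```

With these sample rows, the first call reports "AAA" twice and the second reports each symbol once, which matches the kind of duplicate rows the original question describes.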

Duplicate rows when using group by in subquery

2013-09-16 Thread Mikael Öhman
Hello. This is basically the same question I posted on Stack Overflow: http://stackoverflow.com/questions/18812390/hive-subquery-and-group-by/18818115?noredirect=1#18818115 I know the query is a bit noisy. But this query also demonstrates the error: select a.symbol from (select symbol, ordertype