
Re: Duplicate rows when using group by in subquery

2013-09-19 Thread Yin Huai

e, Sep 17, 2013 at 2:24 AM, Mikael Öhman wrote:
>
> Thank you for the information. Just to be clear, it is not that I have manually restricted the job to run using only a single mapreduce job, but it incorrectly assumes one job is enough?
>
> I will get back with results fr

Re: Duplicate rows when using group by in subquery

2013-09-17 Thread Yin Huai

Monday, 16 September 2013 19:52
> *Subject:* Re: Duplicate rows when using group by in subquery
>
> Hello Mikael,
>
> Seems your case is related to the bug reported in https://issues.apache.org/jira/browse/HIVE-5149. Basically, when Hive uses a single MapReduce job to evalu

Re: Duplicate rows when using group by in subquery

2013-09-16 Thread Yin Huai

Hello Mikael,

It seems your case is related to the bug reported in https://issues.apache.org/jira/browse/HIVE-5149. Basically, when Hive uses a single MapReduce job to evaluate your query, "c.Symbol" and "c.catid" are used to partition the data, and thus, rows with the same value of "c.Symbol" are not
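
To make the partitioning point concrete, here is a minimal sketch in Python. It is not Hive's code, and it is only one plausible reading of the explanation above: if rows are routed to reducers by both "c.Symbol" and "c.catid", rows sharing a symbol can land on different reducers, and a group by on the symbol alone then emits that symbol more than once. The toy partitioner, the sample rows, and the function names are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def partition(key_fields, num_reducers):
    """Toy deterministic partitioner (hypothetical, not Hive's hash):
    sum of character codes of the joined key, modulo the reducer count."""
    text = "|".join(str(field) for field in key_fields)
    return sum(ord(ch) for ch in text) % num_reducers


def group_by_symbol(rows, key_of, num_reducers=2):
    """Route rows to reducers using key_of(row), then let each reducer
    independently count rows per symbol, as a single-job plan would."""
    reducers = defaultdict(list)
    for row in rows:
        reducers[partition(key_of(row), num_reducers)].append(row)

    output = []
    for reducer_id in sorted(reducers):
        counts = Counter(symbol for symbol, _catid in reducers[reducer_id])
        output.extend(sorted(counts.items()))
    return output


# Made-up sample data: (symbol, catid)
rows = [("AAPL", 1), ("AAPL", 2), ("IBM", 1)]

# Partitioning on (symbol, catid): the two AAPL rows are split across reducers.
by_both = group_by_symbol(rows, key_of=lambda r: (r[0], r[1]))

# Partitioning on symbol only: all AAPL rows meet on the same reducer.
by_symbol = group_by_symbol(rows, key_of=lambda r: (r[0],))

print(by_both)
print(by_symbol)
```

With the composite key, "AAPL" shows up once per reducer that received one of its rows, which looks like a duplicate row in the final result; with the symbol-only key, each symbol appears exactly once.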