Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-26 Thread Pala M Muthaia
Thanks for following up, Yin. We realized later that this was due to the reduce deduplication optimization, and found that turning off the flag avoids the issue. -pala On Mon, Aug 26, 2013 at 4:40 AM, Yin Huai wrote: > Forgot to add in my last reply: to generate correct results, you can > set hive.op

Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-26 Thread Yin Huai
Forgot to add in my last reply: to generate correct results, you can set hive.optimize.reducededuplication to false to turn off ReduceSinkDeDuplication. On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai wrote: > Created a JIRA: https://issues.apache.org/jira/browse/HIVE-5149 > > > On Sun, Aug 25, 2013
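A minimal sketch of the workaround described above, assuming it is issued in the same Hive session before the affected query runs. The property name is the one given in the message; the affected query itself is not reproduced here.

```sql
-- Turn off the ReduceSinkDeDuplication optimization for this session,
-- as suggested in the thread, until HIVE-5149 is resolved.
SET hive.optimize.reducededuplication=false;
```

Disabling the optimization may add an extra reduce stage to some plans, so it is probably best scoped to the queries that mix GROUP BY and DISTRIBUTE BY rather than set globally.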

Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-25 Thread Yin Huai
Created a JIRA: https://issues.apache.org/jira/browse/HIVE-5149 On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai wrote: > Seems ReduceSinkDeDuplication picked the wrong partitioning columns. > > > On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP wrote: > >> I think the problem lies within the group by o

Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-25 Thread Yin Huai
It seems ReduceSinkDeDuplication picked the wrong partitioning columns. On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP wrote: > I think the problem lies within the GROUP BY operation. For this > optimization to work, the GROUP BY's partitioning should be on column 1 > only. > > It won't affect th

Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-23 Thread Shahansad KP
I think the problem lies within the GROUP BY operation. For this optimization to work, the GROUP BY's partitioning should be on column 1 only. That won't affect the correctness of the GROUP BY; it can make the GROUP BY itself slower, but in this case it will speed up the overall query. On Fri, Aug 23, 2013 at 5:55

Re: DISTRIBUTE BY works incorrectly in Hive 0.11 in some cases

2013-08-23 Thread Pala M Muthaia
I have attached the Hive 0.10 and 0.11 query plans for the sample query below, for illustration. On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia wrote: > Hi, > > We are using DISTRIBUTE BY with custom reducer scripts in our query > workload. > > After upgrade to Hive 0.11, queries with GROUP BY/DI