Thanks for following up, Yin.
We realized later this was due to the reduce deduplication optimization,
and found turning off the flag avoids the issue.
-pala
On Mon, Aug 26, 2013 at 4:40 AM, Yin Huai wrote:
Forgot to add in my last reply: to generate correct results, you can
set hive.optimize.reducededuplication to false to turn off
ReduceSinkDeDuplication.
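For reference, a minimal sketch of that workaround in a Hive session. Only the property name comes from this thread; the table (events), the columns (k1, k2), and the reducer script (my_reducer.py) are hypothetical stand-ins for a query that combines GROUP BY with DISTRIBUTE BY and a custom reducer.

```sql
-- Turn off ReduceSinkDeDuplication for the current session only.
SET hive.optimize.reducededuplication=false;

-- Issuing SET with just the property name prints its current value.
SET hive.optimize.reducededuplication;

-- Hypothetical query shape: aggregate first, then route all rows
-- sharing the same k1 to one invocation of a custom reducer script.
ADD FILE my_reducer.py;

SELECT TRANSFORM (t.k1, t.k2, t.cnt)
USING 'python my_reducer.py'
AS (k1, k2, result)
FROM (
  SELECT k1, k2, COUNT(*) AS cnt
  FROM events
  GROUP BY k1, k2
  DISTRIBUTE BY k1
  SORT BY k1, k2
) t;
```

Setting the property this way affects only the current session, so other queries keep the optimization.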
On Sun, Aug 25, 2013 at 9:35 PM, Yin Huai wrote:
Created a JIRA: https://issues.apache.org/jira/browse/HIVE-5149
On Sun, Aug 25, 2013 at 9:11 PM, Yin Huai wrote:
Seems ReduceSinkDeDuplication picked the wrong partitioning columns.
On Fri, Aug 23, 2013 at 9:15 PM, Shahansad KP wrote:
I think the problem lies within the group by operation. For this
optimization to work, the group by's partitioning should be on column 1
only.
It won't affect the correctness of the group by. It can make the group by
slower, but in this case it will speed up the overall query.
On Fri, Aug 23, 2013 at 5:55
I have attached the Hive 0.10 and 0.11 query plans for the sample query
below, for illustration.
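For anyone who wants to reproduce such a comparison, a rough sketch: run EXPLAIN on the same query with the optimization on and off, then compare the "Map-reduce partition columns" listed under each Reduce Output Operator. The table, columns, and script name here are hypothetical (this is not the attached sample query), and the exact EXPLAIN layout differs between Hive versions.

```sql
-- Plan with the optimization enabled.
SET hive.optimize.reducededuplication=true;

EXPLAIN
SELECT TRANSFORM (t.k1, t.k2, t.cnt)
USING 'python my_reducer.py'
AS (k1, k2, result)
FROM (
  SELECT k1, k2, COUNT(*) AS cnt
  FROM events
  GROUP BY k1, k2
  DISTRIBUTE BY k1
  SORT BY k1, k2
) t;

-- Plan with the optimization disabled, for comparison.
SET hive.optimize.reducededuplication=false;

EXPLAIN
SELECT TRANSFORM (t.k1, t.k2, t.cnt)
USING 'python my_reducer.py'
AS (k1, k2, result)
FROM (
  SELECT k1, k2, COUNT(*) AS cnt
  FROM events
  GROUP BY k1, k2
  DISTRIBUTE BY k1
  SORT BY k1, k2
) t;
```

If the stage feeding the script partitions on anything other than the DISTRIBUTE BY column (k1 in this sketch), rows with the same k1 can reach different reducer invocations, which the custom script does not expect.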
On Fri, Aug 23, 2013 at 5:35 PM, Pala M Muthaia wrote:
> Hi,
>
> We are using DISTRIBUTE BY with custom reducer scripts in our query
> workload.
>
> After upgrade to Hive 0.11, queries with GROUP BY/DI