Hi Jan,
Glad you found something workable.

What version of Hive are you using? Could you also check what value the 
hive.optimize.ppd property has in your setup?
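
In case it helps, a minimal sketch of how to look the property up from the Hive CLI. Issuing SET with only the property name (no "=value") prints the current setting instead of changing it:

```sql
-- Prints the current value of the predicate pushdown flag; changes nothing.
SET hive.optimize.ppd;
```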

Thanks,
Mark

----- Original Message -----
From: "Jan Dolinár" <dolik....@gmail.com>
To: user@hive.apache.org
Sent: Tuesday, May 29, 2012 1:57:25 AM
Subject: Re: Multi-group-by select always scans entire table


On Fri, May 25, 2012 at 12:03 PM, Jan Dolinár < dolik....@gmail.com > wrote: 


-- see what happens when you try to perform a multi-group-by query on one of the 
-- partitions 
EXPLAIN EXTENDED 
FROM partition_test 
LATERAL VIEW explode(col1) tmp AS exp_col1 
INSERT OVERWRITE DIRECTORY '/test/1' 
SELECT exp_col1 
WHERE (part_col=2) 
INSERT OVERWRITE DIRECTORY '/test/2' 
SELECT exp_col1 
WHERE (part_col=2); 
-- result: it wants to scan all partitions :-( 


Since nobody else did, let me answer myself... In the end I found out that 
correct partition pruning can be achieved by using a subquery. Continuing the 
example from my last post, the query would be: 


FROM ( 
SELECT * FROM partition_test 
LATERAL VIEW explode(col1) tmp AS exp_col1 
WHERE part_col=2 
) t 
INSERT OVERWRITE DIRECTORY '/test/1' 
SELECT exp_col1 
INSERT OVERWRITE DIRECTORY '/test/2' 
SELECT exp_col1; 


I still think the pruning should work correctly no matter how the query is 
written, but for now I'm happy with this solution. 
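
One way to check that the rewritten query really prunes is to run it through the same EXPLAIN EXTENDED check used in the earlier post. This is only a sketch that reuses the table, column, and directory names from the example above. If pruning works as described, the plan should mention only the part_col=2 partition:

```sql
-- Sketch: the subquery form from above, prefixed with EXPLAIN EXTENDED
-- so that Hive prints the plan instead of running the job.
EXPLAIN EXTENDED
FROM (
  SELECT * FROM partition_test
  LATERAL VIEW explode(col1) tmp AS exp_col1
  WHERE part_col=2
) t
INSERT OVERWRITE DIRECTORY '/test/1'
SELECT exp_col1
INSERT OVERWRITE DIRECTORY '/test/2'
SELECT exp_col1;
```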


J. Dolinar
