Re: Question on bucketed map join

bejoy_ks Thu, 19 Jan 2012 09:35:17 -0800

Corrected a few typos in previous mail

Hi Avrilia
       AFAIK the bucketed map join is not the default in Hive; it happens only 
when the configuration parameter hive.optimize.bucketmapjoin is set to true. 
You may be getting the same execution plan because hive.optimize.bucketmapjoin 
is already set to true in the Hive configuration XML file. To confirm this, 
could you explicitly set it to false
(set hive.optimize.bucketmapjoin = false;)
in your Hive session and get the query execution plan from the EXPLAIN command?
Please find some pointers inline.
1. Should I see sth different in the explain extended output if I set and unset 
the hive.optimize.bucketmapjoin option?
[Bejoy] Yes, you should see different plans for the two settings.
Try running EXPLAIN on your join query after setting this:
set hive.optimize.bucketmapjoin = false;
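
A rough sketch of that comparison follows. The table names t1 and t2 and the 
columns key and value are hypothetical stand-ins for your two 8-bucket tables, 
and the MAPJOIN hint is only my assumption about how your query asks for the 
map-side join.

```sql
-- Plan with the bucketed map join optimization switched off
set hive.optimize.bucketmapjoin = false;
EXPLAIN EXTENDED
SELECT /*+ MAPJOIN(t2) */ t1.key, t2.value
FROM t1 JOIN t2 ON (t1.key = t2.key);

-- Plan with the optimization switched on, same query
set hive.optimize.bucketmapjoin = true;
EXPLAIN EXTENDED
SELECT /*+ MAPJOIN(t2) */ t1.key, t2.value
FROM t1 JOIN t2 ON (t1.key = t2.key);
```

AFAIK, when the optimization is active the extended plan carries an extra 
bucket map join context section that maps the bucket files of the small table 
to those of the big table. If the two plans are still identical, the bucketed 
map join is probably not being picked up.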


2. Should I see something different in the output of hive while running the 
query if again I set and unset the hive.optimize.bucketmapjoin?
[Bejoy] No, the Hive output should be the same. Whatever the execution plan for 
a join, the end result should be the same.

3. Is it possible that even though I set bucketmapjoin to true, Hive will still 
perform a normal map-side join for some reason? How can I check if this has 
actually happened?
[Bejoy] Hive would perform a plain map-side join only if the following 
parameter is enabled (it is disabled by default):
set hive.auto.convert.join = true;
You need to check this value in your configuration.
If it is enabled, Hive would always try a map join irrespective of the table 
size, and would fall back to a normal join only after the map join attempt 
fails.
AFAIK, if the number of buckets is the same in both tables, or one is a 
multiple of the other, and the join is on the same columns the tables are 
bucketed on, then with bucketmapjoin enabled Hive shouldn't execute a plain 
map-side join; a bucketed map-side join would be triggered instead.
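
To check those preconditions from your session, something along these lines 
should do. Again, t1 and t2 are hypothetical names standing in for your two 
tables.

```sql
-- "set" with a parameter name and no value prints the current setting
set hive.optimize.bucketmapjoin;
set hive.auto.convert.join;

-- Show the table metadata, including the bucket columns and bucket count,
-- so you can confirm both tables are bucketed on the join key
DESCRIBE FORMATTED t1;
DESCRIBE FORMATTED t2;
```

If the bucket columns differ from the join columns, or the bucket counts are 
not equal or multiples of each other, I would not expect the bucketed map join 
to apply.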

Hope it helps!


Regards
Bejoy K S

-----Original Message-----
From: Bejoy Ks <bejoy...@yahoo.com>
Date: Thu, 19 Jan 2012 09:22:08 
To: user@hive.apache.org<user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: Question on bucketed map join

Hi Avrila
       AFAIK the bucketed map join is not default in hive and it happens only 
when the values is set to true. It could be because the same value is already 
set in the hive configuration xml file. To cross confirm the same could you 
explicitly set this to false 

(set hive.optimize.bucketmapjoin = false;)and get the query execution plan from 
explain command. 


Please some pointers in line

1. Should I see sth different in the explain extended output if I set and unset 
the hive.optimize.bucketmapjoin option?
[Bejoy] you should be seeing the same
Try EXPLAIN your join query after setting this
set hive.optimize.bucketmapjoin = false;


2. Should I see something different in the output of hive while running 
the query if again I set and unset the hive.optimize.bucketmapjoin?
[Bejoy] No,Hive output should be the same. What ever is the execution plan for 
an join, optimally the end result should be same. 


3.
 Is it possible that even though I set bucketmapjoin to true, Hive will 
still perform a normal map-side join for some reason? How can I check if
 this has actually happened?
[Bejoy] Hive would perform a plain map side join only if the following 
parameter is enabled. (default it is disabled)

set hive.auto.convert.join = true; you need to check this value in your 
configurations.
If it is enabled irrespective of the table size hive would always try a map 
join, it would come to a normal join only after the map join attempt fails.
AFAIK, if the number of buckets are same or multiples between the two tables 
involved in a join and if the join is on the same columns that are bucketed, 
with bucketmapjoin enabled it shouldn't execute a plain mapside join a bucketed 
map side join would be triggered.

Hope it helps!..

Regards
Bejoy.K.S



________________________________
 From: Avrilia Floratou <flora...@cs.wisc.edu>
To: user@hive.apache.org 
Sent: Thursday, January 19, 2012 9:23 PM
Subject: Question on bucketed map join
 
Hi,

I have two tables with 8 buckets each on the same key and want to join them.
I ran "explain extended" and get the plan produced by HIVE which shows that a 
map-side join is a possible plan.

I then set in my script the hive.optimize.bucketmapjoin option to true and 
reran the "explain extended" query. I get the exact same plans as output.

I ran the query with and without the bucketmapjoin optimization and saw no 
difference in the running time.

I have the following questions:

1. Should I see sth different in the explain extended output if I set and unset 
the hive.optimize.bucketmapjoin option?

2. Should I see something different in the output of hive while running the 
query if again I set and unset the hive.optimize.bucketmapjoin?

3. Is it possible that even though I set bucketmapjoin to true, Hive will still 
perform a normal map-side join for some reason? How can I check if this has 
actually happened?

Thanks,
Avrilia
