Corrected a few typos in the previous mail.

Hi Avrilia,

AFAIK the bucketed map join is not the default in Hive; it happens only when the configuration parameter hive.optimize.bucketmapjoin is set to true. You may be getting the same execution plan because hive.optimize.bucketmapjoin is already set to true in the Hive configuration XML file. To cross-check, could you explicitly set it to false (set hive.optimize.bucketmapjoin = false;) in your Hive session and get the query execution plan from the EXPLAIN command?

Please find some pointers inline.

1. Should I see sth different in the explain extended output if I set and unset the hive.optimize.bucketmapjoin option?

[Bejoy] Yes, you should see different plans for the two settings. Try EXPLAIN on your join query after setting: set hive.optimize.bucketmapjoin = false;
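The comparison suggested in the answer to question 1 could look roughly like the session below. This is only a sketch: the table names table_a and table_b, the join column key, and the value columns are placeholders that do not come from this thread, so substitute your own bucketed tables.

```sql
-- Hypothetical tables: table_a and table_b, both bucketed on `key`.

-- 1. Plan with the bucketed map join optimization switched off.
set hive.optimize.bucketmapjoin = false;
EXPLAIN EXTENDED
SELECT a.key, a.value, b.value AS b_value
FROM table_a a JOIN table_b b ON (a.key = b.key);

-- 2. Plan for the same query with the optimization switched on.
set hive.optimize.bucketmapjoin = true;
EXPLAIN EXTENDED
SELECT a.key, a.value, b.value AS b_value
FROM table_a a JOIN table_b b ON (a.key = b.key);
```

Saving both outputs and diffing them is the simplest check. If the two plans turn out identical, the explanations offered in this reply are the first things to rule out: the value may already be fixed in the configuration file, or the join may not meet the bucketing conditions.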
2. Should I see something different in the output of hive while running the query if again I set and unset the hive.optimize.bucketmapjoin?

[Bejoy] No, the Hive output should be the same. Whatever the execution plan for a join, the end result should be the same.

3. Is it possible that even though I set bucketmapjoin to true, Hive will still perform a normal map-side join for some reason? How can I check if this has actually happened?

[Bejoy] Hive would perform a plain map-side join only if the following parameter is enabled (it is disabled by default):

set hive.auto.convert.join = true;

You need to check this value in your configuration. If it is enabled, Hive always tries a map join first, irrespective of the table size, and falls back to a normal join only after the map join attempt fails. AFAIK, if the number of buckets in the two tables is the same, or one is a multiple of the other, and the join is on the bucketed columns, then with bucketmapjoin enabled Hive shouldn't execute a plain map-side join; a bucketed map-side join would be triggered instead.

Hope it helps!

Regards
Bejoy K S

-----Original Message-----
From: Bejoy Ks <bejoy...@yahoo.com>
Date: Thu, 19 Jan 2012 09:22:08
To: user@hive.apache.org<user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: Question on bucketed map join

Hi Avrila

AFAIK the bucketed map join is not default in hive and it happens only when the values is set to true. It could be because the same value is already set in the hive configuration xml file. To cross confirm the same could you explicitly set this to false (set hive.optimize.bucketmapjoin = false;) and get the query execution plan from explain command.

Please some pointers in line

1. Should I see sth different in the explain extended output if I set and unset the hive.optimize.bucketmapjoin option?

[Bejoy] you should be seeing the same. Try EXPLAIN your join query after setting this: set hive.optimize.bucketmapjoin = false;

2. Should I see something different in the output of hive while running the query if again I set and unset the hive.optimize.bucketmapjoin?

[Bejoy] No, Hive output should be the same. What ever is the execution plan for an join, optimally the end result should be same.

3. Is it possible that even though I set bucketmapjoin to true, Hive will still perform a normal map-side join for some reason? How can I check if this has actually happened?

[Bejoy] Hive would perform a plain map side join only if the following parameter is enabled. (default it is disabled)

set hive.auto.convert.join = true;

you need to check this value in your configurations. If it is enabled irrespective of the table size hive would always try a map join, it would come to a normal join only after the map join attempt fails. AFAIK, if the number of buckets are same or multiples between the two tables involved in a join and if the join is on the same columns that are bucketed, with bucketmapjoin enabled it shouldn't execute a plain mapside join a bucketed map side join would be triggered.

Hope it helps!..

Regards
Bejoy.K.S

________________________________
From: Avrilia Floratou <flora...@cs.wisc.edu>
To: user@hive.apache.org
Sent: Thursday, January 19, 2012 9:23 PM
Subject: Question on bucketed map join

Hi,

I have two tables with 8 buckets each on the same key and want to join them. I ran "explain extended" and got the plan produced by Hive, which shows that a map-side join is a possible plan. I then set the hive.optimize.bucketmapjoin option to true in my script and reran the "explain extended" query. I get the exact same plans as output. I ran the query with and without the bucketmapjoin optimization and saw no difference in the running time.

I have the following questions:

1. Should I see sth different in the explain extended output if I set and unset the hive.optimize.bucketmapjoin option?

2. Should I see something different in the output of hive while running the query if again I set and unset the hive.optimize.bucketmapjoin?

3. Is it possible that even though I set bucketmapjoin to true, Hive will still perform a normal map-side join for some reason? How can I check if this has actually happened?

Thanks,
Avrilia
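On question 3 in this thread, the two settings and the bucket layout the reply depends on can be inspected directly from a Hive session. The sketch below assumes two placeholder tables, table_a and table_b, which are not names from the thread, and it assumes a Hive version that supports DESCRIBE FORMATTED.

```sql
-- Print the current value of each setting.
-- "set <name>;" without "= value" shows the value instead of changing it.
set hive.optimize.bucketmapjoin;
set hive.auto.convert.join;

-- Inspect the bucketing metadata of both (hypothetical) tables.
-- The output should list the bucket columns and the number of buckets.
DESCRIBE FORMATTED table_a;
DESCRIBE FORMATTED table_b;
```

If the bucket columns match the join key and the bucket counts are equal or multiples of each other, the conditions described in the reply are met; whether a bucketed map join was actually chosen still has to be confirmed from the EXPLAIN EXTENDED output.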