I am on 0.9. If I have a selective filter condition on the small table, does Hive try to estimate the filtered data size before deciding on the join algorithm? If so, it would make sense to use a map join even when the small table (before filtering) is larger than the hive.mapjoin.smalltable.filesize parameter. Any ideas?
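
To make the question concrete, here is a minimal sketch of the workaround I would fall back on if the size check only looks at the unfiltered table: materialize the filtered rows first, so the table being joined is already small. The table and column names (big_t, small_t, small_filtered, k, v, category) are hypothetical, and the threshold shown is just the documented default in bytes; I have not verified how 0.9 sizes a filtered input.

```sql
-- Let Hive convert eligible common joins to map joins without a hint.
set hive.auto.convert.join=true;
-- Size threshold in bytes for the small table (25000000 is the default).
set hive.mapjoin.smalltable.filesize=25000000;

-- Hypothetical names: apply the selective filter up front.
CREATE TABLE small_filtered AS
SELECT k, v
FROM small_t
WHERE category = 'x';

-- Join against the pre-filtered table.
SELECT b.k, s.v
FROM big_t b
JOIN small_filtered s ON (b.k = s.k);
```

The cost is an extra job and an intermediate table, so this would only pay off if the filter removes most of the small table.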

~Mayuresh

On Fri, Feb 15, 2013 at 4:05 PM, Aniket Mokashi <aniket...@gmail.com> wrote:

> I have tested that the parameter hive.mapjoin.smalltable.filesize works
> well with 0.8. What version of hive are you on?
>
> On Fri, Feb 15, 2013 at 8:57 AM, <bejoy...@yahoo.com> wrote:
>
>> Hi
>>
>> In later versions of hive you actually don't need a map join hint in
>> your query. Just the following would suffice:
>>
>> Set hive.auto.convert.join=true
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> From: Mayuresh Kunjir <mayuresh.kun...@gmail.com>
>> Date: Fri, 15 Feb 2013 10:37:52 -0500
>> To: user <user@hive.apache.org>
>> ReplyTo: user@hive.apache.org
>> Subject: Re: Map join optimization issue
>>
>> Thanks Aniket. I actually had not specified the map-join hint though.
>> Sorry for providing the wrong information earlier. I had only
>> set hive.auto.convert.join=true before firing my join query.
>>
>> ~Mayuresh
>>
>> On Thu, Feb 14, 2013 at 10:44 PM, Aniket Mokashi <aniket...@gmail.com> wrote:
>>
>>> I think hive.mapjoin.smalltable.filesize parameter will be disregarded
>>> in that case.
>>>
>>> On Thu, Feb 14, 2013 at 7:25 AM, Mayuresh Kunjir <mayuresh.kun...@gmail.com> wrote:
>>>
>>>> Yes, the hint was specified.
>>>>
>>>> On Feb 14, 2013 3:11 AM, "Aniket Mokashi" <aniket...@gmail.com> wrote:
>>>>
>>>>> have you specified map-join hint in your query?
>>>>>
>>>>> On Thu, Feb 7, 2013 at 11:39 AM, Mayuresh Kunjir <mayuresh.kun...@gmail.com> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I am trying to join two tables, the smaller being of size 4GB. When I
>>>>>> set hive.mapjoin.smalltable.filesize parameter above 500MB, Hive tries to
>>>>>> perform a local task to read the smaller file. This of course fails since
>>>>>> the file size is greater and the backup common join is then run. What I do
>>>>>> not understand is why did Hive attempt a map join when small file size was
>>>>>> greater than the smalltable.filesize parameter.
>>>>>>
>>>>>> ~Mayuresh
>>>>>
>>>>> --
>>>>> "...:::Aniket:::... Quetzalco@tl"
>>>
>>> --
>>> "...:::Aniket:::... Quetzalco@tl"
>
> --
> "...:::Aniket:::... Quetzalco@tl"