thanks. changing mapred.child.java.opts from -Xmx512m to -Xmx1024m did the trick
allocating more memory to the

On Tue, Jul 19, 2011 at 6:49 PM, yongqiang he <heyongqiang...@gmail.com> wrote:

> >> i thought only one table needed to be small?
> Yes.
>
> >> hive.mapjoin.maxsize also apply to big table?
> No.
>
> >> i made sure hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize
> >> are set large enough to accommodate the small table. yet hive does not
> >> attempt to do a mapjoin.
>
> There are physical limitations. If the local machine cannot hold all
> records in memory locally, the local hashmap has to fail. So check
> your machine's memory or the memory allocated for hive.
>
> Thanks
> Yongqiang
>
> On Tue, Jul 19, 2011 at 1:55 PM, Koert Kuipers <ko...@tresata.com> wrote:
> > thanks!
> > i only see hive create the hashmap dump and perform mapjoin if both
> > tables are small. i thought only one table needed to be small?
> >
> > i try to merge a very large table with a small table. i made sure
> > hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize are set large
> > enough to accommodate the small table. yet hive does not attempt to do a
> > mapjoin. does hive.mapjoin.maxsize also apply to big table? or do i need
> > to look at other parameters as well?
> >
> > On Tue, Jul 19, 2011 at 4:15 PM, yongqiang he <heyongqiang...@gmail.com>
> > wrote:
> >>
> >> in most cases, the mapjoin falls back to normal join because of one of
> >> these three reasons:
> >> 1) the input table size is very big, so no mapjoin is attempted
> >> 2) if one of the input tables is small (let's say less than 25MB, which
> >> is configurable), hive will try a local hashmap dump. If it causes OOM
> >> on the client side when doing the local hashmap dump, it will go back
> >> to normal join. The reason here is mostly due to very good compression
> >> on the input data.
> >> 3) the mapjoin actually got started, and fails. it will fall back to
> >> normal join. This is very unlikely to happen
> >>
> >> Thanks
> >> Yongqiang
> >>
> >> On Tue, Jul 19, 2011 at 11:16 AM, Koert Kuipers <ko...@tresata.com>
> >> wrote:
> >> > note: this is somewhat a repost of something i posted on the CDH3 user
> >> > group. apologies if that is not appropriate.
> >> >
> >> > i am exploring map-joins in hive. with hive.auto.convert.join=true hive
> >> > tries to do a map-join and then falls back on a mapreduce-join if
> >> > certain conditions are not met. this sounds great. but when i do a
> >> > query and i notice it falls back on a mapreduce-join, how can i see
> >> > which condition triggered the fallback (smalltable.filesize or
> >> > mapjoin.maxsize or something else, perhaps memory related)?
> >> >
> >> > i tried reading the default log that a hive session produces, but it
> >> > seems more like a massive json file than a log to me, so it is very
> >> > hard for me to interpret that. i also turned on logging to console with
> >> > debugging, looking for any clues there but without luck so far. is the
> >> > info there and am i just overlooking it? any ideas?
> >> >
> >> > thanks! koert
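
for anyone who finds this thread later, here is a rough sketch of how the settings discussed in this thread might look in a hive session. the property names are the ones mentioned in the thread; the values are only illustrative (25000000 bytes stands in for the "25MB" figure, and -Xmx1024m is the heap size reported to work here), and the table and column names (big_table, small_table, id, label) are made up for the example. treat it as a starting point, not a recommended configuration.

```sql
-- let hive try to convert a regular join into a map-join automatically
SET hive.auto.convert.join=true;

-- size threshold for the small table, in bytes (illustrative value, roughly 25MB)
SET hive.mapjoin.smalltable.filesize=25000000;

-- heap size for the child JVMs; raising this from -Xmx512m is what
-- resolved the fallback reported in this thread
SET mapred.child.java.opts=-Xmx1024m;

-- hypothetical tables: big_table is large, small_table should fit in memory
SELECT b.id, s.label
FROM big_table b
JOIN small_table s ON (b.id = s.id);
```

if hive still falls back to a normal join with settings like these, the three reasons listed in the thread (big input, OOM during the local hashmap dump, or a failed mapjoin) are the places to look.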