Re: hive mapjoin decision process

2011-07-19 Koert Kuipers
Thanks. Changing mapred.child.java.opts from -Xmx512m to -Xmx1024m did the trick, allocating more memory to the …

On Tue, Jul 19, 2011 at 6:49 PM, yongqiang he wrote:
> >> I thought only one table needed to be small?
> Yes.
>
> >> Does hive.mapjoin.maxsize also apply to the big table?
> No.
>
> >> I made s…
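Because the fix here was a heap-size change, a small Python sketch of reading the -Xmx value out of an options string such as mapred.child.java.opts may help when comparing settings. The helper name `xmx_bytes` is hypothetical, it is not a Hadoop API, and only the k/m/g suffixes are handled.

```python
import re
from typing import Optional

# Unit suffixes handled by this sketch (case-insensitive); no suffix means bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_bytes(java_opts: str) -> Optional[int]:
    """Hypothetical helper: max heap in bytes from a JVM options string, or None.

    If -Xmx appears more than once, this sketch uses the last occurrence,
    which is what HotSpot generally honors.
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)", java_opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNITS[unit.lower()]


before = xmx_bytes("-Xmx512m")
after = xmx_bytes("-Xmx1024m")
print(before // 1024 ** 2, after // 1024 ** 2, after / before)  # 512 1024 2.0
```

The change in this thread simply doubles the heap available to each child JVM.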

Re: hive mapjoin decision process

2011-07-19 yongqiang he
>> I thought only one table needed to be small?
Yes.

>> Does hive.mapjoin.maxsize also apply to the big table?
No.

>> I made sure hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize are
>> set large enough to accommodate the small table, yet Hive does not attempt to
>> do a mapjoin.
There are phys…
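To make these answers concrete, here is a minimal Python sketch of the rule that only the small (in-memory) side has to fit within the limits while the big table is never checked. It assumes, and the thread does not say this, that hive.mapjoin.maxsize acts as a row-count limit on the small table; the `Table` type, the helper name, and all sizes are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Table:
    # Hypothetical description of one join input.
    name: str
    file_size_bytes: int
    row_count: int


def small_table_candidates(tables: List[Table],
                           smalltable_filesize: int,
                           maxsize_rows: int) -> List[str]:
    """Names of inputs that could be held in memory as the small side.

    Assumption: maxsize_rows mirrors hive.mapjoin.maxsize as a row limit on
    the small table. Neither limit is applied to the big table, so for a
    two-table join one entry in this list is enough.
    """
    return [t.name for t in tables
            if t.file_size_bytes <= smalltable_filesize
            and t.row_count <= maxsize_rows]


big = Table("big_table", 500 * 1024 ** 3, 10 ** 10)
small = Table("small_table", 10 * 1024 ** 2, 50_000)
print(small_table_candidates([big, small], 25_000_000, 100_000))  # ['small_table']
```

With these made-up sizes only small_table qualifies, which is sufficient even though big_table exceeds both limits.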

Re: hive mapjoin decision process

2011-07-19 Koert Kuipers
Thanks! I only see Hive create the hashmap dump and perform a mapjoin if both tables are small. I thought only one table needed to be small? I am trying to join a very large table with a small table. I made sure hive.mapjoin.smalltable.filesize and hive.mapjoin.maxsize are set large enough to accommodate …

Re: hive mapjoin decision process

2011-07-19 yongqiang he
In most cases, a mapjoin falls back to a normal join for one of these three reasons:
1) The input table sizes are very big, so no mapjoin is attempted.
2) If one of the input tables is small (say, less than 25MB; the threshold is configurable through hive.mapjoin.smalltable.filesize), Hive will try a local hashmap dump. If it cause…
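Reasons 1 and 2 can be sketched in Python as a size check that decides whether a map join is attempted at all and, if so, which input is dumped into the local hashmap. This is a simplification, not Hive's code: the function name and the choice of the smallest qualifying input are assumptions, and the threshold defaults to 25,000,000 bytes to match the 25MB figure above. The third reason is cut off in this excerpt and is not modeled.

```python
from typing import Dict, Optional

# Mirrors the 25MB figure above (hive.mapjoin.smalltable.filesize = 25000000).
DEFAULT_THRESHOLD_BYTES = 25_000_000


def pick_hashtable_side(sizes: Dict[str, int],
                        threshold_bytes: int = DEFAULT_THRESHOLD_BYTES) -> Optional[str]:
    """Hypothetical: input to dump into a local hashmap, or None if no map join is tried."""
    small = {name: size for name, size in sizes.items() if size < threshold_bytes}
    if not small:
        # Reason 1: every input is too big, so go straight to a normal join.
        return None
    # Reason 2: at least one small input exists; dump the smallest one locally.
    return min(small, key=small.get)


print(pick_hashtable_side({"big_table": 10 ** 12, "small_table": 5 * 10 ** 6}))  # small_table
print(pick_hashtable_side({"a": 10 ** 9, "b": 10 ** 10}))  # None
```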

hive mapjoin decision process

2011-07-19 Koert Kuipers
Note: this is somewhat of a repost of something I posted on the CDH3 user group; apologies if that is not appropriate. I am exploring map-joins in Hive. With hive.auto.convert.join=true, Hive tries to do a map-join and then falls back on a mapreduce-join if certain conditions are not met. This sounds…
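The behaviour described for hive.auto.convert.join=true (attempt a map-join, fall back on a mapreduce-join when conditions are not met) is a try-then-fallback pattern. Below is a minimal Python sketch of only that control flow. The two callables are hypothetical stand-ins for the two plans, and treating a memory error as the failed condition is an assumption; this is not how Hive implements the fallback internally.

```python
from typing import Callable


def run_join(auto_convert_join: bool,
             try_map_join: Callable[[], str],
             run_common_join: Callable[[], str]) -> str:
    """Try a map join when auto-conversion is on, otherwise run the common join.

    A MemoryError or RuntimeError from the hypothetical map-join attempt
    triggers the fallback to the common (reduce-side) join.
    """
    if not auto_convert_join:
        return run_common_join()
    try:
        return try_map_join()
    except (MemoryError, RuntimeError):
        # For example, the local hashmap dump ran out of heap.
        return run_common_join()


def dump_runs_out_of_heap() -> str:
    # Hypothetical failing attempt used only for illustration.
    raise MemoryError("hashmap dump exceeded the child JVM heap")


print(run_join(True, lambda: "map join", lambda: "common join"))     # map join
print(run_join(True, dump_runs_out_of_heap, lambda: "common join"))  # common join
print(run_join(False, lambda: "map join", lambda: "common join"))    # common join
```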