does anyone have any idea? this seems like very strange behavior to me, and it blows up the job.
On Fri, Jul 22, 2011 at 5:51 PM, Koert Kuipers <ko...@tresata.com> wrote:
> hello,
> we have 2 tables x and y. table x is 11GB on disk and has 23M rows. table y
> is 3GB on disk and has 28M rows. Both tables are stored as LZO compressed
> sequencefiles without bucketing.
>
> a normal join of x and y gets executed as a map-reduce-join in hive and
> works very well. an outer join also gets executed as a map-reduce-join and
> again works well.
> but a left outer join gets converted into a map-join which results in an
> OutOfMemoryError (GC overhead limit exceeded).
>
> the mapjoin related parameters in my hive-settings.xml are:
> hive.auto.convert.join=true
> hive.mapjoin.maxsize=100000
> hive.mapjoin.smalltable.filesize=25000000
>
> why does the left outer join get converted into a map-join? it seems like my
> table sizes are way beyond where a map-join should be attempted, no?
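
as a stopgap, one thing that might be worth trying is to switch off the automatic conversion for the session right before the left outer join, so that hive should plan it as a regular map-reduce-join. this is only a sketch and does not explain why the conversion happens: the join column `id` is a placeholder (the real key is not shown here), and i have not confirmed that it avoids the OutOfMemoryError.

```sql
-- Assumption: override the setting for this session only.
-- hive.auto.convert.join is the same parameter listed in the settings above.
set hive.auto.convert.join=false;

-- Hypothetical query shape: `id` stands in for the actual join column.
SELECT x.*, y.*
FROM x
LEFT OUTER JOIN y ON (x.id = y.id);
```

if the job then runs as a map-reduce-join, that would at least narrow the problem down to the auto-conversion step rather than the join itself.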