Hello, we have two tables, x and y. Table x is 11 GB on disk and has 23M rows. Table y is 3 GB on disk and has 28M rows. Both tables are stored as LZO-compressed SequenceFiles without bucketing.
A normal join of x and y gets executed as a map-reduce join in Hive and works very well. An outer join also gets executed as a map-reduce join and again works well. But a left outer join gets converted into a map join, which results in an OutOfMemoryError (GC overhead limit exceeded).

The map-join-related parameters in my hive-settings.xml are:

hive.auto.convert.join=true
hive.mapjoin.maxsize=100000
hive.mapjoin.smalltable.filesize=25000000

Why does the left outer join get converted into a map join? It seems like my table sizes are way beyond where a map join should be attempted, no?
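To put numbers on that, here is a quick sanity check of the on-disk sizes against the hive.mapjoin.smalltable.filesize value above. It is only a rough sketch: it assumes the threshold is a byte count compared against the compressed on-disk size, and that 1 GB means 1024**3 bytes. The helper name exceeds_threshold is made up for illustration and is not part of Hive.

```python
# Rough sanity check: compare on-disk table sizes with the small-table threshold.
# Assumptions (not confirmed by Hive docs here): the threshold is in bytes and is
# compared against the compressed on-disk size; 1 GB is taken as 1024**3 bytes.

SMALLTABLE_FILESIZE = 25_000_000  # hive.mapjoin.smalltable.filesize from the post

GIB = 1024 ** 3
table_sizes = {
    "x": 11 * GIB,  # table x: 11 GB on disk
    "y": 3 * GIB,   # table y: 3 GB on disk
}


def exceeds_threshold(size_bytes, threshold=SMALLTABLE_FILESIZE):
    """Hypothetical helper: True if a table is too large to count as 'small'."""
    return size_bytes > threshold


# How many times larger than the threshold each table is.
ratios = {name: size / SMALLTABLE_FILESIZE for name, size in table_sizes.items()}

for name, size in table_sizes.items():
    print(
        f"table {name}: {size} bytes, "
        f"{ratios[name]:.0f}x the threshold, "
        f"exceeds={exceeds_threshold(size)}"
    )
```

Under these assumptions even the smaller table, y, is more than 100 times larger than the configured threshold, so neither table looks like a candidate for the small side of a map join.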