Hello,
We have two tables, x and y. Table x is 11 GB on disk and has 23M rows; table y
is 3 GB on disk and has 28M rows. Both tables are stored as LZO-compressed
SequenceFiles without bucketing.

A normal join of x and y is executed by Hive as a map-reduce join and works
very well. An outer join is also executed as a map-reduce join and again
works well.
But a left outer join gets converted into a map-join, which fails with an
OutOfMemoryError (GC overhead limit exceeded).

The map-join-related parameters in my hive-settings.xml are:
hive.auto.convert.join=true
hive.mapjoin.maxsize=100000
hive.mapjoin.smalltable.filesize=25000000

Why does the left outer join get converted into a map-join? My table sizes
seem to be far beyond the point where a map-join should be attempted, no?
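
To put the numbers side by side, here is a rough Python sketch of the size
comparison one would expect hive.auto.convert.join to make. It is a
hypothetical model, not Hive's actual code: fits_in_memory and table_sizes are
made-up names, the sizes are the approximate on-disk figures above (taking
1 GB = 1024**3 bytes), and it assumes the threshold is compared against the
on-disk (compressed) file size. In a LEFT OUTER JOIN only the right-hand table
(y) can be the in-memory side, so y's size is the one that matters.

```python
# Hypothetical model of the expected auto-conversion size check; not Hive code.
# Assumption: the threshold is compared against on-disk (compressed) bytes.
GB = 1024 ** 3
SMALLTABLE_FILESIZE = 25_000_000  # hive.mapjoin.smalltable.filesize, in bytes

# Approximate on-disk sizes from this post.
table_sizes = {
    "x": 11 * GB,
    "y": 3 * GB,
}


def fits_in_memory(size_bytes: int, threshold: int = SMALLTABLE_FILESIZE) -> bool:
    """Return True if a table is small enough to be the hashed (small) side."""
    return size_bytes <= threshold


# For x LEFT OUTER JOIN y, only y could be loaded into the hash table.
for name, size in table_sizes.items():
    ratio = size / SMALLTABLE_FILESIZE
    print(f"{name}: {size} bytes, about {ratio:.0f}x the threshold, "
          f"small-table candidate: {fits_in_memory(size)}")
```

Under these assumptions neither table comes close: y alone is more than a
hundred times the 25,000,000-byte limit.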
