does anyone have any idea? this seems like very strange behavior to me, and
it blows up the job.

On Fri, Jul 22, 2011 at 5:51 PM, Koert Kuipers <ko...@tresata.com> wrote:

> hello,
> we have 2 tables x and y. table x is 11GB on disk and has 23M rows. table y
> is 3GB on disk and has 28M rows. Both tables are stored as LZO compressed
> sequencefiles without bucketing.
>
> a normal join of x and y gets executed as a map-reduce-join in hive and
> works very well. an outer join also gets executed as a map-reduce-join and
> again works well.
> but a left outer join gets converted into a map-join, which results in an
> OutOfMemoryError (GC overhead limit exceeded).
>
> the mapjoin related parameters in my hive-settings.xml are:
> hive.auto.convert.join=true
> hive.mapjoin.maxsize=100000
> hive.mapjoin.smalltable.filesize=25000000
>
> why does the left outer join get converted into map-join? it seems like my
> table sizes are way beyond where a map-join should be attempted, no?
>
>
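
to put numbers on "way beyond", here is a rough sanity check of the table sizes quoted above against hive.mapjoin.smalltable.filesize. it is only a sketch, not hive's actual decision logic. it assumes "GB" means 2**30 bytes and that the threshold is compared against the on-disk (compressed) size in bytes. the helper name exceeds_threshold is made up for illustration.

```python
# Sketch only: compares the quoted table sizes with the configured threshold.
# Assumptions (not from Hive's source): "GB" = 2**30 bytes, and the threshold
# is checked against the on-disk, compressed file size in bytes.
GIB = 2**30

table_bytes = {"x": 11 * GIB, "y": 3 * GIB}  # sizes from the message above
smalltable_filesize = 25_000_000             # hive.mapjoin.smalltable.filesize


def exceeds_threshold(size_bytes, threshold=smalltable_filesize):
    """Return True if a table is larger than the small-table threshold."""
    return size_bytes > threshold


for name, size in table_bytes.items():
    ratio = size / smalltable_filesize
    print(f"table {name}: {size} bytes, about {ratio:.0f}x the threshold, "
          f"exceeds={exceeds_threshold(size)}")
```

under these assumptions both tables are more than a hundred times over the 25000000-byte threshold, and that holds whether GB is read as 2**30 or 10**9 bytes. the uncompressed in-memory size of an LZO-compressed table would be larger still.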
