Could you post hive version and execution plan for the query?
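In the meantime, one thing jumps out from the settings listed below: the heaps that were raised to 4096mb are daemon heaps (NameNode/DataNode), which don't affect query execution. The map-side hash aggregation that is running out of memory lives in the MapReduce task JVMs, which still get only 768mb. A sketch of things worth trying from the Hive CLI (values here are illustrative, not prescriptive):

```sql
-- Execution plan for the query quoted below, for the list:
EXPLAIN EXTENDED
SELECT tbl1.col_pk, tbl2.col1, tbl2.col2,
       SUM(tbl1.col4), SUM(tbl1.col5), SUM(tbl1.col4+col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011','2012','2013')
GROUP BY tbl1.col_pk, tbl2.col1, tbl2.col2;

-- Experiments for "GC overhead limit exceeded" during map-side aggregation:
SET hive.map.aggr = false;               -- skip the map-side hash table
                                         -- entirely; aggregate in reducers
SET mapred.child.java.opts = -Xmx1024m;  -- raise the per-task JVM heap,
                                         -- separate from the daemon heaps
```

If lowering hive.map.aggr.hash.percentmemory all the way to ~0.0165 made no difference, the map-side hash table may not be the only memory pressure, so disabling map-side aggregation or giving the task JVMs more headroom seems like a reasonable next experiment.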

2013/12/21 Martin, Nick <nimar...@pssd.com>

>  Hi all,
>
> I have two tables:
>
> tbl1: 81m rows
> tbl2: 4m rows
>
> tbl1 is partitioned on one column and tbl2 has none.
>
> I’m attempting the following query:
>
> SELECT
>   tbl1.col_pk,
>   tbl2.col1,
>   tbl2.col2,
>   SUM(tbl1.col4),
>   SUM(tbl1.col5),
>   SUM(tbl1.col4+col5)
> FROM tbl2
> JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
> WHERE tbl1.partitioned_col IN ('2011', '2012', '2013')
> GROUP BY
>   tbl1.col_pk,
>   tbl2.col1,
>   tbl2.col2;
>
> I get this error:
>
> OutOfMemoryError: GC overhead limit exceeded
>
> So, I followed the suggestion at the end of the error output ("Currently
> hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower
> value. i.e 'set hive.map.aggr.hash.percentmemory = 0.25;'") through several
> iterations, eventually getting my hive.map.aggr.hash.percentmemory setting
> down to something like 0.0165, and it still failed.
>
> I did some searching and found some convoluted recommendations for what to
> try next. Some mentioned upping my heap size, some mentioned rewriting my
> query, etc. I upped my Hadoop maximum Java heap size to 4096mb, re-ran, and
> got the same results.
>
> Currently, some relevant settings are:
>
> NameNode Heap Size: 4096mb
> DataNode maximum Java heap size: 4096mb
> Hadoop maximum Java heap size: 4096mb
> Java Options for MapReduce tasks: 768mb
>
> I have 16 map slots and 8 reduce slots available (5-node cluster: 4 data
> nodes and one name node).
>
> Thanks in advance for the help,
> Nick
>