Could you post the Hive version and the execution plan (EXPLAIN output) for the query?
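For reference, Hive prints the plan when the query is prefixed with EXPLAIN (EXPLAIN EXTENDED adds partition-pruning and path details). A minimal sketch against the query quoted below, table and column names taken from the original message:

```sql
-- Dump the execution plan Hive will use; no data is actually read.
EXPLAIN
SELECT tbl1.col_pk, tbl2.col1, tbl2.col2,
       SUM(tbl1.col4), SUM(tbl1.col5), SUM(tbl1.col4 + tbl1.col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011', '2012', '2013')
GROUP BY tbl1.col_pk, tbl2.col1, tbl2.col2;
```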
2013/12/21 Martin, Nick <nimar...@pssd.com>
> Hi all,
>
> I have two tables:
>
> tbl1: 81m rows
> tbl2: 4m rows
>
> tbl1 is partitioned on one column and tbl2 has none.
>
> I’m attempting the following query:
>
> SELECT
>   tbl1.col_pk,
>   tbl2.col1,
>   tbl2.col2,
>   SUM(tbl1.col4),
>   SUM(tbl1.col5),
>   SUM(tbl1.col4+col5)
> FROM tbl2
> JOIN tbl1 ON (tbl1.col_pk=tbl2.col_pk)
> WHERE tbl1.partitioned_col IN ('2011','2012','2013')
> GROUP BY
>   tbl1.col_pk,
>   tbl2.col1,
>   tbl2.col2;
>
> I get this error:
>
> OutOfMemoryError: GC overhead limit exceeded
>
> So, I followed the suggestion at the end of the error output (Currently
> hive.map.aggr.hash.percentmemory is set to 0.5. Try setting it to a lower
> value, i.e. 'set hive.map.aggr.hash.percentmemory = 0.25;') through several
> iterations, eventually getting my hive.map.aggr.hash.percentmemory setting
> down to something like 0.0165, and it still failed.
>
> I did some searching and found some convoluted recommendations of what to
> try next. Some mentioned upping my heap size, some mentioned rewriting my
> query, etc. I upped my Hadoop maximum Java heap size to 4096 MB, re-ran, and
> got the same results.
>
> Currently, some relevant settings are:
>
> NameNode heap size: 4096 MB
> DataNode maximum Java heap size: 4096 MB
> Hadoop maximum Java heap size: 4096 MB
> Java options for MapReduce tasks: 768 MB
>
> I have 16 map slots and 8 reduce slots available (5-node cluster: 4 data
> nodes and one NameNode).
>
> Thanks in advance for the help,
> Nick
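For what it's worth, two knobs that often come up in threads like this, offered as directions to try rather than a known fix: the map-side aggregation hash can be disabled outright instead of shrunk, and if the 4m-row table fits in task memory a map join avoids the shuffle entirely. The parameters below are real Hive settings; whether tbl2 actually qualifies for auto-conversion depends on its on-disk size versus hive.mapjoin.smalltable.filesize:

```sql
-- Option 1: skip the in-memory hash aggregation entirely and
-- aggregate on the reduce side (slower, but avoids the map-side OOM):
SET hive.map.aggr = false;

-- Option 2: let Hive convert the join to a map join when the small
-- table is under hive.mapjoin.smalltable.filesize (bytes on disk):
SET hive.auto.convert.join = true;

-- Older Hive versions need the explicit hint instead of auto-conversion:
SELECT /*+ MAPJOIN(tbl2) */
       tbl1.col_pk, tbl2.col1, tbl2.col2,
       SUM(tbl1.col4), SUM(tbl1.col5), SUM(tbl1.col4 + tbl1.col5)
FROM tbl2
JOIN tbl1 ON (tbl1.col_pk = tbl2.col_pk)
WHERE tbl1.partitioned_col IN ('2011', '2012', '2013')
GROUP BY tbl1.col_pk, tbl2.col1, tbl2.col2;
```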