The one setting that's still unknown is mapred.tasktracker.reduce.tasks.maximum.
Have you tried setting that to 1 as an experiment and increasing the Java heap?
I assume the mappers are not eating all the memory.
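
Something along these lines, roughly (the 2 GB heap is just an example;
mapred.tasktracker.reduce.tasks.maximum itself is a TaskTracker setting in
mapred-site.xml and needs a TaskTracker restart, so only the heap part can be
changed from the Hive session):

-- cluster side, in mapred-site.xml (then restart the TaskTrackers):
--   mapred.tasktracker.reduce.tasks.maximum = 1
-- per-job override from the Hive CLI; note this replaces the whole opts
-- string, so re-add any GC flags you still want:
set mapred.child.java.opts=-Xmx2048m;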



On Wed, Feb 23, 2011 at 9:57 PM, hadoop n00b <new2h...@gmail.com> wrote:

> I have suddenly begun to get this error (Hadoop error code 2) even for
> not-so-big queries. I am running a 6-node cluster. I tried to run the
> queries with 6 and 10 reducers but got the same result.
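>
> In case it matters, the reducer count is set just before the query, along
> these lines (the query itself is omitted here):
>
> set mapred.reduce.tasks=10;   -- also tried 6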
>
> On Wed, Feb 23, 2011 at 8:25 PM, Bennie Schut <bsc...@ebuddy.com> wrote:
>
>> We already filter out nulls before the tables are loaded, so this is
>> probably a skew in the keys, like Paul was saying. I'm running some queries
>> on the keys to see if that's the case; I do expect the distribution of some
>> of the keys to be very uneven.
>> I'm also looking at "set hive.optimize.skewjoin=true" before the query to
>> see if that helps. Will try that later.
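>>
>> Roughly the kind of check I'm running (the table and column names are just
>> placeholders), plus the skew-join settings I want to try:
>>
>> select join_key, count(*) as cnt
>> from fact_table
>> group by join_key
>> order by cnt desc
>> limit 20;
>>
>> set hive.optimize.skewjoin=true;
>> set hive.skewjoin.key=100000; -- rows per key before Hive treats it as skewed (the default)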
>>
>>
>> On 02/23/2011 05:25 AM, Mapred Learn wrote:
>>
>>> Oops I meant nulls.
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 22, 2011, at 8:22 PM, Mapred Learn<mapred.le...@gmail.com>
>>>  wrote:
>>>
>>>> Check if you can filter non-nulls. That might help.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 22, 2011, at 12:46 AM, Bennie Schut<bsc...@ebuddy.com>  wrote:
>>>>
>>>>> I've just set "hive.exec.reducers.bytes.per.reducer" to as low as 100k,
>>>>> which caused this job to run with 999 reducers. I still have 5 tasks
>>>>> failing with an OutOfMemory error.
>>>>>
>>>>> We have JVM reuse set to 8, but dropping it to 1 seems to greatly reduce
>>>>> this problem:
>>>>> set mapred.job.reuse.jvm.num.tasks = 1;
>>>>>
>>>>> It's still puzzling me how it can run out of memory. It seems like some
>>>>> of the reducers get a disproportionately large share of the work.
>>>>>
>>>>>
>>>>> On 02/18/2011 10:53 AM, Bennie Schut wrote:
>>>>>
>>>>>> When we try to join two large tables, some of the reducers stop with an
>>>>>> OutOfMemoryError.
>>>>>>
>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>
>>>>>>
>>>>>> Looking at garbage collection for these reduce tasks, they are continuously
>>>>>> doing full collections, like this:
>>>>>> 2011-02-17T14:36:08.295+0100: 1250.547: [Full GC [PSYoungGen: 111055K->53659K(233024K)] [ParOldGen: 698410K->698410K(699072K)] 809466K->752070K(932096K) [PSPermGen: 14450K->14450K(21248K)], 0.1496600 secs] [Times: user=1.08 sys=0.00, real=0.15 secs]
>>>>>> 2011-02-17T14:36:08.600+0100: 1250.851: [Full GC [PSYoungGen: 111057K->53660K(233024K)] [ParOldGen: 698410K->698410K(699072K)] 809468K->752070K(932096K) [PSPermGen: 14450K->14450K(21248K)], 0.1360010 secs] [Times: user=1.00 sys=0.01, real=0.13 secs]
>>>>>> 2011-02-17T14:36:08.915+0100: 1251.167: [Full GC [PSYoungGen: 111058K->53659K(233024K)] [ParOldGen: 698410K->698410K(699072K)] 809468K->752070K(932096K) [PSPermGen: 14450K->14450K(21248K)], 0.1325960 secs] [Times: user=0.94 sys=0.00, real=0.14 secs]
>>>>>> 2011-02-17T14:36:09.205+0100: 1251.457: [Full GC [PSYoungGen: 111055K->53659K(233024K)] [ParOldGen: 698410K->698410K(699072K)] 809466K->752070K(932096K) [PSPermGen: 14450K->14450K(21248K)], 0.1301610 secs] [Times: user=0.99 sys=0.00, real=0.13 secs]
>>>>>>
>>>>>>
>>>>>> "mapred.child.java.opts" is set to "-Xmx1024M -XX:+UseCompressedOops
>>>>>> -XX:+UseParallelOldGC -XX:+UseNUMA -Djava.net.preferIPv4Stack=true
>>>>>> -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
>>>>>> -Xloggc:/opt/hadoop/logs/task_@tas...@.gc.log".
>>>>>>
>>>>>> I've been reducing the parameter "hive.exec.reducers.bytes.per.reducer" to
>>>>>> as low as 200M, but I still get the OutOfMemory errors. I would have expected
>>>>>> this to reduce the amount of data sent to each reducer and thus prevent the
>>>>>> OutOfMemory errors.
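>>>>>>
>>>>>> For reference, this is how I understand the reducer count is derived from
>>>>>> that setting (200M is just the lowest value I tried):
>>>>>>
>>>>>> -- reducers is roughly total input bytes / bytes.per.reducer,
>>>>>> -- capped by hive.exec.reducers.max (default 999)
>>>>>> set hive.exec.reducers.bytes.per.reducer=200000000;
>>>>>> set hive.exec.reducers.max=999;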
>>>>>>
>>>>>> Any ideas on why this happens?
>>>>>>
>>>>>> I'm using a trunk build from around 2011-02-03.
>>>>>>
>>>>>
>>
>
