Hi,
After moving to the Cloudera 0.20.1 release and upgrading to 64GB machines, I 
started seeing occasional OOMs with higher numbers of reducers, at the point 
where the reducers start copying map outputs.

java.lang.OutOfMemoryError: Java heap space
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
 at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)

It turned out the problem was related to the use of a Java int in the check in 
the reserve method of ReduceTask's ShuffleRamManager:

    // Wait till the request can be fulfilled...
    while ((size + requestedSize) > maxSize) {

The check fails if (size + requestedSize) exceeds Integer.MAX_VALUE: the int 
sum "wraps around" into a negative value, so the condition evaluates to false 
and the loop does not wait. This lets all subsequent requests keep on reserving 
RAM until the JVM finally crashes.

I checked whether it was related to HADOOP-3446 or would be resolved by HADOOP-318.

Looks like the problem would not occur after HADOOP-318 as Arun uses "long" for 
size rather than the current buggy "int".

Should a JIRA be raised to fix this for pre-0.21.0 releases?
My fix was simple:

    while (((long)size + (long)requestedSize) > maxSize) {
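
To make the wraparound concrete, here is a minimal standalone sketch of the two 
forms of the check. It is not the Hadoop code: the class name, method names and 
the byte counts are made up for illustration, and only the two comparison 
expressions are taken from the snippets in this mail.

```java
public class ShuffleOverflowDemo {

    // Same shape as the existing check: the sum is computed in int,
    // so it can wrap around to a negative value.
    static boolean mustWaitInt(int size, int requestedSize, int maxSize) {
        return (size + requestedSize) > maxSize;
    }

    // Same shape as the proposed fix: widen to long before adding,
    // so the sum cannot wrap.
    static boolean mustWaitLong(int size, int requestedSize, int maxSize) {
        return ((long) size + (long) requestedSize) > maxSize;
    }

    public static void main(String[] args) {
        // Hypothetical values chosen so that size + requestedSize
        // exceeds Integer.MAX_VALUE (2147483647).
        int maxSize = 1500000000;
        int size = 1400000000;
        int requestedSize = 800000000;

        System.out.println("int sum:  " + (size + requestedSize));
        System.out.println("long sum: " + ((long) size + (long) requestedSize));
        System.out.println("int check says wait:  "
                + mustWaitInt(size, requestedSize, maxSize));
        System.out.println("long check says wait: "
                + mustWaitLong(size, requestedSize, maxSize));
    }
}
```

With these values the int form reports that no wait is needed even though the 
request is well over the limit, while the long form reports that the caller 
must wait.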

I would be willing to create a JIRA and patch.

-Sanjay
