Hi,

After moving to the Cloudera 0.20.1 release and upgrading to 64GB machines, I started seeing occasional OOMs in jobs with a higher number of reducers, at the point where the reducers start copying map outputs.
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)

It turned out the problem is related to the use of a Java int in the check in the reserve method of ReduceTask's ShuffleRamManager:

    // Wait till the request can be fulfilled...
    while ((size + requestedSize) > maxSize) {

If (size + requestedSize) exceeds Integer.MAX_VALUE, the sum "wraps around" into a negative value, so the condition evaluates to false and the request is not made to wait. All subsequent requests then keep on reserving RAM, and the JVM finally crashes.

I checked whether this was related to HADOOP-3446 or is being resolved by HADOOP-318. It looks like the problem would not occur after HADOOP-318, as Arun uses "long" for size rather than the current buggy "int".

Should a JIRA be raised to fix this for pre-0.21.0 releases? My fix was simple:

    while (((long)size + (long)requestedSize) > maxSize) {

I would be willing to create a JIRA and a patch.

-Sanjay
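
The wraparound described above is easy to reproduce outside Hadoop. The sketch below is not the ReduceTask code: the class name, the method names and the sizes are made up for illustration, and it assumes that size, requestedSize and maxSize are all ints, as the quoted check suggests.

```java
// Standalone illustration of the int wraparound; not Hadoop code.
class ReserveOverflowDemo {

    // Same shape as the original check: int + int is evaluated in 32-bit
    // arithmetic, so a sum above Integer.MAX_VALUE wraps to a negative value.
    static boolean mustWaitInt(int size, int requestedSize, int maxSize) {
        return (size + requestedSize) > maxSize;
    }

    // Same shape as the proposed fix: the operands are widened to long
    // before the addition, so the sum cannot wrap.
    static boolean mustWaitLong(int size, int requestedSize, int maxSize) {
        return ((long) size + (long) requestedSize) > maxSize;
    }

    public static void main(String[] args) {
        // Hypothetical values, chosen only so that the sum exceeds Integer.MAX_VALUE.
        int maxSize = 1_500_000_000;
        int size = 1_400_000_000;
        int requestedSize = 800_000_000;

        System.out.println("int sum:  " + (size + requestedSize));
        System.out.println("long sum: " + ((long) size + (long) requestedSize));
        System.out.println("int check says wait:  " + mustWaitInt(size, requestedSize, maxSize));
        System.out.println("long check says wait: " + mustWaitLong(size, requestedSize, maxSize));
    }
}
```

With these values the int version reports that no wait is needed even though the reservation would go well past maxSize, while the long version correctly reports that the request has to wait.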