Hi,
After moving to the Cloudera 0.20.1 release and upgrading to 64GB machines, I
started seeing occasional OOMs in jobs with a higher number of reducers, at the
point where the reducers start copying map outputs:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1539)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)
It turned out that the problem is caused by the use of a Java int in the check
in the reserve method of ReduceTask's ShuffleRamManager:
    // Wait till the request can be fulfilled...
    while ((size + requestedSize) > maxSize) {
When (size + requestedSize) exceeds Integer.MAX_VALUE, the sum "wraps around"
to a negative value, so the condition evaluates to false and the request is
not made to wait. All subsequent requests then keep reserving RAM until the
JVM finally crashes.
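The wrap-around can be reproduced in isolation. The following is only a sketch: the class name, the helper mustWaitInt and the numeric values are made up for illustration and are not taken from ReduceTask; only the shape of the comparison mirrors the check quoted above.

```java
public class ShuffleOverflowDemo {
    // Same shape as the quoted check: both operands are ints, so the
    // addition is done in 32-bit arithmetic and can wrap around.
    static boolean mustWaitInt(int size, int requestedSize, int maxSize) {
        return (size + requestedSize) > maxSize;
    }

    public static void main(String[] args) {
        // Hypothetical values chosen so that the true sum exceeds Integer.MAX_VALUE.
        int maxSize = Integer.MAX_VALUE - 100;
        int size = Integer.MAX_VALUE - 1000;
        int requestedSize = 2000;

        int sum = size + requestedSize; // wraps around to a negative value
        System.out.println("int sum is negative: " + (sum < 0));

        // The request should have to wait, but the wrapped sum is below maxSize,
        // so the check reports that no wait is needed.
        System.out.println("must wait (int check): "
                + mustWaitInt(size, requestedSize, maxSize));
    }
}
```

With these values the true sum is above maxSize, yet the int check lets the reservation through.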
I checked whether this is related to HADOOP-3446 or already resolved by
HADOOP-318. It looks like the problem would not occur after HADOOP-318, as
Arun uses a "long" for size rather than the current buggy "int".
Should a JIRA be raised to fix this for pre-0.21.0 releases?
My fix was simple:
    while (((long)size + (long)requestedSize) > maxSize) {
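As a standalone illustration of why the cast helps, here is a minimal sketch. The class and method names are hypothetical and the values are invented; only the widened comparison is taken from the fix above.

```java
public class ShuffleReserveCheck {
    // Widening to long before adding means the sum of two ints can never
    // overflow; maxSize is promoted to long for the comparison.
    static boolean mustWait(int size, int requestedSize, int maxSize) {
        return ((long) size + (long) requestedSize) > maxSize;
    }

    public static void main(String[] args) {
        // Hypothetical values whose true sum exceeds Integer.MAX_VALUE.
        int maxSize = Integer.MAX_VALUE - 100;
        int size = Integer.MAX_VALUE - 1000;
        int requestedSize = 2000;

        // The long sum stays positive and above maxSize, so the caller waits.
        System.out.println("must wait (long check): "
                + mustWait(size, requestedSize, maxSize));
    }
}
```

Casting only one operand would be enough, since Java promotes the other to long as well; casting both just makes the intent explicit.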
I would be willing to create the JIRA and a patch.
-Sanjay