On 22-Dec-08, at 10:24 AM, Nathan Marz wrote:

I have been experiencing some unusual behavior from Hadoop recently. When trying to run a job, some of the tasks fail with:

java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.runChild(TaskRunner.java:462)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:403)


Not all of the tasks fail, but enough do that the job fails. Unfortunately, there are no further logs for these tasks. Trying to retrieve the logs produces:

HTTP ERROR: 410

Failed to retrieve stdout log for task: attempt_200811101232_0218_m_000001_0

RequestURI=/tasklog


It seems like the tasktracker isn't even able to start the tasks on those machines. Has anyone seen anything like this before?

I see this on jobs that also get the "too many open files" task errors, or on subsequent jobs. I've always assumed it's another manifestation of the same problem. Once I start getting these errors, I keep getting them until I shut down the cluster, although I don't always get enough failures to make the job fail. I haven't bothered restarting individual boxes or services.
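In case it helps with debugging: when the "too many open files" errors show up, you can at least confirm descriptor exhaustion on a tasktracker node with a sketch like the one below. This is a rough diagnostic, not anything from the original report; it assumes Linux with a /proc filesystem, and the PID argument is whatever process you want to inspect (e.g. the tasktracker's).

    import os
    import resource
    import sys

    # Per-process file descriptor limit (what "too many open files" runs into).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("fd limit: soft=%d hard=%d" % (soft, hard))

    # Count how many descriptors a given process currently holds.
    if len(sys.argv) > 1:
        pid = sys.argv[1]
        fds = os.listdir("/proc/%s/fd" % pid)
        print("pid %s has %d open descriptors" % (pid, len(fds)))

If the count is near the soft limit, raising the limit for the user running the tasktracker (ulimit -n, or /etc/security/limits.conf) may be the fix.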

I haven't been able to reproduce it consistently, but it seems to happen when I have many small input files; a job that ran fine with one large input file broke after I split the input into many smaller files. I'm using Streaming.
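If many small inputs are the trigger, one workaround would be to coalesce them into a few larger chunks before submitting the job, roughly like this (the paths and chunk size here are made up for illustration):

    import glob
    import os

    CHUNK_BYTES = 64 * 1024 * 1024  # aim for roughly HDFS-block-sized files

    os.makedirs("merged", exist_ok=True)
    out, out_idx, written = None, 0, 0
    for path in sorted(glob.glob("input_parts/*")):
        # Start a new output chunk once the current one is big enough.
        if out is None or written >= CHUNK_BYTES:
            if out:
                out.close()
            out = open("merged/part-%05d" % out_idx, "wb")
            out_idx, written = out_idx + 1, 0
        with open(path, "rb") as f:
            data = f.read()
        out.write(data)
        written += len(data)
    if out:
        out.close()

Fewer input files means fewer map tasks and fewer streams open at once on each node, which would be consistent with the single-large-file job working.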

Karl Anderson
[email protected]
http://monkey.org/~kra


