Hi Amareshwari,
The "stream.non.zero.exit.status.is.failure" property is at its
default (which the docs indicate is 'true').
We don't think the problem is the reducer script per se: in one
circumstance we are still investigating, it arises when the reducer
script does nothing but copy stdin to stdout, and when we include the
directive "-jobconf mapred.reduce.tasks=0" in the command to the
streaming facility.
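For concreteness, the reducer in that failing case is essentially an
identity filter; a minimal sketch of what we mean (the function name
is ours, not from the actual script) is:

```python
import sys

def identity_reduce(instream, outstream):
    # Copy every line through unchanged. With mapred.reduce.tasks=0
    # the reduce phase should normally be skipped altogether, so a
    # failure here is surprising.
    for line in instream:
        outstream.write(line)

if __name__ == "__main__":
    identity_reduce(sys.stdin, sys.stdout)
```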
Our streaming mapper and reducer scripts are in Python and we don't
quite understand the directions in the streaming docs:
"To set a status, reporter:status:<message> should be sent to stderr".
Could you elaborate a little further? In this case, the Python
scripts are not throwing any exceptions or otherwise indicating an
error, so we also don't have any internal error messages to send to
stderr.
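If it helps to make the question concrete: as we read the docs,
emitting such a status line from Python would not require an error at
all. A sketch of the kind of helper we imagine calling periodically
(the helper names are ours) would be:

```python
import sys

def report_status(message):
    # Hadoop streaming scans stderr for lines of exactly this form
    # and turns them into task status updates, which also resets the
    # task's progress timeout.
    sys.stderr.write("reporter:status:%s\n" % message)
    sys.stderr.flush()

def report_counter(group, counter, amount=1):
    # Counter updates travel over the same stderr channel.
    sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))
    sys.stderr.flush()
```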
The other problem is that when we run the map-reduce with the same
input multiple times it doesn't fail every time. That's why our first
suspicion was that it has something to do with timing between
components of the M-R system because of the modest performance of our
cluster.
Thanks,
RDH
On Dec 23, 2008, at 1:00 AM, Amareshwari Sriramadasu wrote:
You can report status from a streaming job by emitting
reporter:status:<message> on stderr.
See documentation @
http://hadoop.apache.org/core/docs/r0.18.2/streaming.html#How+do+I+update+status+in+streaming+applications%3F
But from the exception trace, it doesn't look like a lack of status
reports (a timeout). The trace says the reducer JVM process exited
with exit code 1.
That is most likely a bug in the reducer code. What is the configured
value of the property "stream.non.zero.exit.status.is.failure"?
Thanks
Amareshwari
Rick Hangartner wrote:
Hi,
We seem to be seeing a runtime exception in the Reduce phase of a
streaming Map-Reduce that has been mentioned before on this list.
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/%[email protected]%3e
When I Google the exception, the only result is this one short thread
on the mailing list. Unfortunately, we don't quite understand the
exception message in our current situation, or the eventual
explanation and resolution of that previous case.
We have tested that the Python script run in the Reduce phase runs
without problems. It returns the correct results when run from the
command line fed from stdin by a file that is the output of the map
phase for a small map-reduce job that fails in this way.
Here's the exception we are seeing from the jobtracker log:
2008-12-22 18:13:36,415 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200812221742_0004_m_000009_0' has completed task_200812221742_0004_m_000009 successfully.
2008-12-22 18:13:50,607 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200812221742_0004_r_000000_0: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:301)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:518)
	at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:102)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
2008-12-22 18:13:52,045 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200812221742_0004_r_000000_0' from 'tracker_hnode3.cor.mystrands.in:localhost/127.0.0.1:37777'
2008-12-22 18:13:52,175 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200812221742_0004_r_000000_1' to tip task_200812221742_0004_r_000000, for tracker 'tracker_hnode5.cor.mystrands.in:localhost/127.0.0.1:55254'
We typically see four repetitions of this exception in the log, and
sometimes one or two sets of those repetitions.
If someone could explain what this exception actually means, and
perhaps what we might need to change in our configuration to fix it,
we would be most appreciative. Naively, it almost seems as if a task
is just taking slightly too long to complete and report that fact,
perhaps because of other Hadoop or MR processes running at the same
time. If we re-run this map-reduce, it does sometimes run to a
successful completion without an exception.
We are just testing map-reduce as a candidate for a number of data
reduction tasks right now. We are running Hadoop 0.18.1 on a cluster
of 9 retired desktop machines that have only 100Mb networking and
about 2GB of RAM each, which is why we suspect this could just be a
matter of tasks taking slightly too long to report back that they
have completed, rather than an actual bug. (We will be upgrading this
test cluster to Hadoop 0.19.x and 1Gb networking very shortly.)
Thanks,
RDH
Begin forwarded message:
From: "Rick Cox" <[email protected]>
Date: May 14, 2008 9:01:31 AM PDT
To: [email protected], [email protected]
Subject: Re: Streaming and subprocess error code
Reply-To: [email protected]
Does the syslog output from a should-have-failed task contain
something like this?
java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code 1
(In particular, I'm curious if it mentions the RuntimeException.)
Tasks that consume all their input and then exit non-zero are
definitely supposed to be counted as failed, so there's either a
problem with the setup or a bug somewhere.
rick
On Wed, May 14, 2008 at 8:49 PM, Andrey Pankov
<[email protected]> wrote:
Hi,
I've tested this new option "-jobconf
stream.non.zero.exit.status.is.failure=true". It seems to work, but it
is still not good for me: when the mapper/reducer program has read all
its input successfully and fails after that, streaming still finishes
successfully, so there is no way to learn about data post-processing
errors in the subprocesses :(
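[Editor's note: a minimal sketch of the scenario Andrey describes
(our illustration, not his actual code) is a reducer that drains all
of stdin and only then signals failure through its exit status:]

```python
import sys

def consume_then_fail(instream, outstream):
    # Drain the whole input, echoing it through, then report a
    # post-processing error via a non-zero return value.
    for line in instream:
        outstream.write(line)
    return 1  # intended to become the process exit status

# As a streaming reducer this would be invoked as:
#   sys.exit(consume_then_fail(sys.stdin, sys.stdout))
# With stream.non.zero.exit.status.is.failure=true the framework is
# expected to count such a task as failed.
```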
Andrey Pankov wrote: