Hi Johnny,

Thank you very much for your email. I am very new here. Please tell me where to check the health of the data nodes (which log file should I look at?) and how to set the log4j logging level to DEBUG.
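(For anyone following along: on a Hadoop 1.x-era cluster, datanode health can be checked with `hadoop dfsadmin -report`, which lists live/dead datanodes, and in each node's `hadoop-*-datanode-*.log` under the Hadoop logs directory. The log4j level is set in `conf/log4j.properties`; a sketch of what Johnny may mean — the logger name below is assumed from the DataNode class's package, and the daemons must be restarted for it to take effect:)

```properties
# Assumed example, not from this thread: raise DataNode logging to DEBUG
# in conf/log4j.properties (logger name taken from the Java package).
log4j.logger.org.apache.hadoop.hdfs.server.datanode=DEBUG

# Alternatively, the Hadoop start scripts honor the root-logger setting:
# hadoop.root.logger=DEBUG,console
```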
Thanks,
-Mua

On Fri, Apr 12, 2013 at 5:01 PM, Johnny Zhang <[email protected]> wrote:

> Seems like an HDFS issue: as you said, certain blocks cannot be retrieved from certain DNs. Can you check the health of all the DNs? And probably also bump the log4j level to DEBUG.
>
> Johnny
>
> On Fri, Apr 12, 2013 at 12:06 PM, Mua Ban <[email protected]> wrote:
>
> > Thank you very much Cheolsoo,
> >
> > I am running the script once more right now, and I see 7 failed reducers at the moment on the job tracker GUI. I browsed these failed reducers and found the task logs. Of these 7 failed reducers, some have the type 1 task log and the rest have the type 2 task log, as shown below.
> >
> > They seem related to some connection issue among nodes in the cluster. Do you know any parameters I should configure to figure out the actual problem?
> >
> > Thank you,
> > -Mua
> >
> > ---------------------------------------
> > *Type 1 task log*
> >
> > 2013-04-12 13:42:24,960 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 5 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:42:25,259 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:42:25,271 INFO org.apache.hadoop.mapred.ReduceTask: Initiating in-memory merge with 610 segments...
> > 2013-04-12 13:42:25,273 INFO org.apache.hadoop.mapred.Merger: Merging 610 sorted segments
> > 2013-04-12 13:42:25,275 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 610 segments left of total size: 96922927 bytes
> > 2013-04-12 13:42:27,348 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Merge of the 610 files in-memory complete. Local file is /hdfs/sp/filesystem/mapred/local/taskTracker/vul/jobcache/job_201304081613_0049/attempt_201304081613_0049_r_000009_0/output/map_6.out of size 96921713
> > 2013-04-12 13:42:27,349 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Thread waiting: Thread for merging on-disk files
> > 2013-04-12 13:42:30,263 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:42:35,267 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 2 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:42:38,145 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000584_0'
> > 2013-04-12 13:42:44,150 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000557_0'
> > 2013-04-12 13:42:55,283 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:43:05,164 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000604_0'
> > 2013-04-12 13:43:06,036 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:43:11,169 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000597_1'
> > 2013-04-12 13:43:21,040 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Need another 5 map output(s) where 0 is already in progress
> > 2013-04-12 13:43:21,040 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:44:21,042 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Need another 5 map output(s) where 0 is already in progress
> > 2013-04-12 13:44:21,043 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:44:29,222 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000576_0'
> > 2013-04-12 13:45:21,333 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Need another 4 map output(s) where 0 is already in progress
> > 2013-04-12 13:45:21,333 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:46:01,334 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:46:06,341 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:46:21,350 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Need another 2 map output(s) where 0 is already in progress
> > 2013-04-12 13:46:21,350 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 0 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:46:41,301 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0049_m_000616_1'
> > 2013-04-12 13:46:41,351 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0049_r_000009_0 Scheduled 2 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-12 13:46:42,301 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
> > 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
> > 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
> > 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 1 files left.
> > 2013-04-12 13:46:42,302 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 11 files left.
> > 2013-04-12 13:46:42,303 INFO org.apache.hadoop.mapred.Merger: Merging 11 sorted segments
> > 2013-04-12 13:46:42,303 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 11 segments left of total size: 3152550 bytes
> > 2013-04-12 13:46:42,393 INFO org.apache.hadoop.mapred.ReduceTask: Merged 11 segments, 3152550 bytes to disk to satisfy reduce memory limit
> > 2013-04-12 13:46:42,394 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2 files, 100074247 bytes from disk
> > 2013-04-12 13:46:42,395 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> > 2013-04-12 13:46:42,395 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
> > 2013-04-12 13:46:42,398 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 100074239 bytes
> > 2013-04-12 13:57:45,872 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-199210310173610155_28360 java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.6.25.33:47987 remote=/10.6.25.33:49197]
> >     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> >     at java.io.DataInputStream.readFully(DataInputStream.java:189)
> >     at java.io.DataInputStream.readLong(DataInputStream.java:410)
> >     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2967)
> >
> > 2013-04-12 14:00:00,777 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-199210310173610155_28360 bad datanode[0] 10.6.25.33:49197
> > 2013-04-12 14:00:00,866 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_-199210310173610155_28360 in pipeline 10.6.25.33:49197, 10.6.25.141:39369, 10.6.25.31:54563: bad datanode 10.6.25.33:49197
> > 2013-04-12 14:04:55,904 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.6.25.33:49197 java.io.IOException: Bad connect ack with firstBadLink as 10.6.25.32:53741
> > 2013-04-12 14:04:55,904 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-8936348770928346278_28365
> > 2013-04-12 14:04:55,907 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.6.25.32:53741
> > 2013-04-12 14:06:07,789 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_8322203915584568195_28367 java.io.IOException: Bad response 1 for block blk_8322203915584568195_28367 from datanode 10.6.25.31:54563
> >     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2977)
> >
> > 2013-04-12 14:06:25,735 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8322203915584568195_28367 bad datanode[2] 10.6.25.31:54563
> > 2013-04-12 14:06:25,735 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_8322203915584568195_28367 in pipeline 10.6.25.33:49197, 10.6.25.145:48897, 10.6.25.31:54563: bad datanode 10.6.25.31:54563
> > 2013-04-12 14:06:45,112 WARN org.apache.hadoop.mapred.Task: Parent died. Exiting attempt_201304081613_0049_r_000009_0
> >
> > ----------------------------------------------------------
> >
> > *Type 2 task log*
> >
> > 14_1'
> > 2013-04-11 15:24:33,168 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000609_1'
> > 2013-04-11 15:24:39,172 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000599_0'
> > 2013-04-11 15:25:01,179 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0046_r_000010_0 Need another 5 map output(s) where 0 is already in progress
> > 2013-04-11 15:25:01,179 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0046_r_000010_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-11 15:25:24,203 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000577_0'
> > 2013-04-11 15:25:51,529 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0046_r_000010_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-11 15:26:00,227 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000556_1'
> > 2013-04-11 15:26:01,558 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0046_r_000010_0 Need another 3 map output(s) where 0 is already in progress
> > 2013-04-11 15:26:01,558 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0046_r_000010_0 Scheduled 2 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-11 15:26:06,235 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000576_0'
> > 2013-04-11 15:26:06,235 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring obsolete output of KILLED map-task: 'attempt_201304081613_0046_m_000560_1'
> > 2013-04-11 15:26:06,603 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201304081613_0046_r_000010_0 Scheduled 1 outputs (0 slow hosts and 0 dup hosts)
> > 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
> > 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
> > 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
> > 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 1 files left.
> > 2013-04-11 15:26:07,236 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 109 files left.
> > 2013-04-11 15:26:07,238 INFO org.apache.hadoop.mapred.Merger: Merging 109 sorted segments
> > 2013-04-11 15:26:07,238 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 109 segments left of total size: 23323822 bytes
> > 2013-04-11 15:26:07,528 INFO org.apache.hadoop.mapred.ReduceTask: Merged 109 segments, 23323822 bytes to disk to satisfy reduce memory limit
> > 2013-04-11 15:26:07,528 INFO org.apache.hadoop.mapred.ReduceTask: Merging 2 files, 120503030 bytes from disk
> > 2013-04-11 15:26:07,529 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> > 2013-04-11 15:26:07,529 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
> > 2013-04-11 15:26:07,531 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 120503022 bytes
> > 2013-04-11 15:28:34,121 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.6.25.144:40971 java.io.IOException: Bad connect ack with firstBadLink as 10.6.25.32:53741
> > 2013-04-11 15:28:34,121 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_423162934996020555_28131
> > 2013-04-11 15:28:34,123 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.6.25.32:53741
> > 2013-04-11 15:58:06,150 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.6.25.144:40971 java.io.IOException: Bad connect ack with firstBadLink as 10.6.25.31:54563
> > 2013-04-11 15:58:06,150 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7269648817395135125_28148
> > 2013-04-11 15:58:06,152 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.6.25.31:54563
> > 2013-04-11 16:06:10,175 WARN org.apache.hadoop.mapred.Task: Parent died. Exiting attempt_201304081613_0046_r_000010_0
> >
> > On Fri, Apr 12, 2013 at 2:25 PM, Cheolsoo Park <[email protected]> wrote:
> >
> > > Hi Mua,
> > >
> > > I guess you misunderstood me. The pig_****.log file is not a task log.
> > >
> > > You should look for task logs on the data nodes where your task trackers ran. Here is some explanation of the various log files in Hadoop and where to find them:
> > >
> > > http://blog.cloudera.com/blog/2009/09/apache-hadoop-log-files-where-to-find-them-in-cdh-and-what-info-they-contain/
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > > On Fri, Apr 12, 2013 at 10:27 AM, Mua Ban <[email protected]> wrote:
> > >
> > > > Thank you very much for your reply.
> > > >
> > > > Below is the stack trace from the pig_****.log file.
> > > >
> > > > Can you please give me some suggestions?
> > > >
> > > > -Mua
> > > > ------------------
> > > > Backend error message
> > > > ---------------------
> > > > Task attempt_201304081613_0048_r_000001_0 failed to report status for 601 seconds. Killing!
> > > >
> > > > Pig Stack Trace
> > > > ---------------
> > > > ERROR 2997: Unable to recreate exception from backed error: Task attempt_201304081613_0048_r_000001_0 failed to report status for 601 seconds. Killing!
> > > >
> > > > org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backed error: Task attempt_201304081613_0048_r_000001_0 failed to report status for 601 seconds. Killing!
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:217)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:152)
> > > >     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:383)
> > > >     at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
> > > >     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
> > > >     at org.apache.pig.PigServer.execute(PigServer.java:1245)
> > > >     at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
> > > >     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
> > > >     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
> > > >     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> > > >     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
> > > >     at org.apache.pig.Main.run(Main.java:555)
> > > >     at org.apache.pig.Main.main(Main.java:111)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
> > > >     at java.lang.reflect.Method.invoke(Method.java:611)
> > > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > > >
> > > > ================================================================================
> > > >
> > > > On Fri, Apr 12, 2013 at 11:29 AM, Cheolsoo Park <[email protected]> wrote:
> > > >
> > > > > Did you look at the task logs to see why those tasks failed? Since it's a back-end error, the console output doesn't tell you much. The task logs should have a stack trace that shows why it failed, and you can go from there.
> > > > >
> > > > > On Fri, Apr 12, 2013 at 8:18 AM, Mua Ban <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am very new to PIG/Hadoop; I just started writing my first PIG script a couple of days ago, and I ran into this problem.
> > > > > >
> > > > > > My cluster has 9 nodes. I have to join two data sets, big and small, each collected over 4 weeks. I first took two subsets of my data (the first week of data); let's call them B1 and S1 for the big and small data sets of the first week. The entire 4-week data sets are B4 and S4.
> > > > > >
> > > > > > I ran my script on my cluster to join B1 and S1 and everything was fine; I got my joined data. However, when I ran my script to join B4 and S4, the script failed. B4 is 39GB, S4 is 300MB. B4 is skewed: some ids appear more frequently than others. I tried both 'using skewed' and 'using replicated' modes for the join operation (by appending them to the end of the join clause below); they both fail.
> > > > > >
> > > > > > Here is my script, and I think it is very simple:
> > > > > >
> > > > > > big = load 'bigdir/' using PigStorage(',') as (id:chararray, data:chararray);
> > > > > > small = load 'smalldir/' using PigStorage(',') as (t1:double, t2:double, data:chararray, id:chararray);
> > > > > > J = JOIN big by id LEFT OUTER, small by id;
> > > > > > store J into 'outputdir' using PigStorage(',');
> > > > > >
> > > > > > On the web UI of the tracker, I see that the job has 40 reducers (I guess this is normal, since the total data is about 40GB and each 1GB needs one reducer under the default Pig and Hadoop settings). If I use 'parallel 80' in the join operation above, then I see 80 reducers, and the join operation still fails.
> > > > > >
> > > > > > I checked the file mapred-default.xml and found this:
> > > > > > <name>mapred.reduce.tasks</name>
> > > > > > <value>1</value>
> > > > > >
> > > > > > If I set the value of parallel in the join operation, it should override this, right?
> > > > > >
> > > > > > On the tracker GUI, I see that for different runs, the number of completed reducers varies from 4 to 10 (out of 40 total reducers). The tracker GUI shows the reason for the failed reducers: "Task attempt_201304081613_0046_r_000006_0 failed to report status for 600 seconds. Killing!"
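(For reference: the 'using skewed' variant the poster describes, with an explicit reducer count, would look like this in Pig Latin. Paths and schemas are taken from the quoted script; the PARALLEL value is illustrative. PARALLEL on an operator does override the mapred.reduce.tasks default for that job's reduce phase, so the poster's understanding is correct.)

```pig
big   = LOAD 'bigdir/'   USING PigStorage(',') AS (id:chararray, data:chararray);
small = LOAD 'smalldir/' USING PigStorage(',') AS (t1:double, t2:double, data:chararray, id:chararray);

-- 'skewed' samples the join key distribution and spreads hot keys
-- across reducers; PARALLEL 80 requests 80 reduce tasks for this job.
J = JOIN big BY id LEFT OUTER, small BY id USING 'skewed' PARALLEL 80;

STORE J INTO 'outputdir' USING PigStorage(',');
```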
> > > > > > *Could you please help?*
> > > > > > Thank you very much,
> > > > > > -Mua
> > > > > >
> > > > > > --------------------------------------------------------------------------------------------------------------
> > > > > > Here is the error report from the console screen where I ran this script:
> > > > > >
> > > > > > job_201304081613_0032  616  0  230  12  32  0  0  0  big  MAP_ONLY
> > > > > > job_201304081613_0033  705  1  21  6  6  234  2  34  234  SAMPLER
> > > > > >
> > > > > > Failed Jobs:
> > > > > > JobId  Alias  Feature  Message  Outputs
> > > > > > job_201304081613_0034  small  SKEWED_JOIN  Message: Job failed! Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201304081613_0034_r_000012
> > > > > >
> > > > > > Input(s):
> > > > > > Successfully read 364285458 records (39528533645 bytes) from: "hdfs://d0521b01:24990/user/abc/big/"
> > > > > > Failed to read data from "hdfs://d0521b01:24990/user/abc/small/"
> > > > > >
> > > > > > Output(s):
> > > > > >
> > > > > > Counters:
> > > > > > Total records written : 0
> > > > > > Total bytes written : 0
> > > > > > Spillable Memory Manager spill count : 0
> > > > > > Total bags proactively spilled: 0
> > > > > > Total records proactively spilled: 0
> > > > > >
> > > > > > Job DAG:
> > > > > > job_201304081613_0032 -> job_201304081613_0033,
> > > > > > job_201304081613_0033 -> job_201304081613_0034,
> > > > > > job_201304081613_0034 -> null,
> > > > > > null
> > > > > >
> > > > > > 2013-04-10 20:11:23,815 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning REDUCER_COUNT_LOW 1 time(s).
> > > > > > 2013-04-10 20:11:23,815 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
> > > > > > 2013-04-10 20:11:23,815 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Encountered IOException. java.io.IOException: Error Recovery for block blk_312487981794332936_26563 failed because recovery from primary datanode 10.6.25.31:54563 failed 6 times. Pipeline was 10.6.25.31:54563. Aborting...
> > > > > > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
> > > > > > 2013-04-10 20:11:23,818 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
> > > > > > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
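(A note on the repeated "failed to report status for 600 seconds. Killing!" messages throughout the thread: that 600-second limit is the mapred.task.timeout property, in milliseconds. Raising it only buys time and does not fix the underlying HDFS pipeline failures, but as a diagnostic step one could set it in mapred-site.xml; the value below is illustrative, not a recommendation from this thread:)

```xml
<!-- Illustrative only: allow tasks more time without a progress report
     before the TaskTracker kills them (default 600000 ms = 600 s). -->
<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
</property>
```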
