I'm not an expert in this, but I do see a "Broken pipe" IOException while the
TaskTracker is writing to the local file system (in TaskController.writeCommand).
Is it possible that you're out of disk space, or that your EBS volume is
failing? S3 doesn't appear anywhere in that stack trace.
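If it helps, a quick way to test the disk-space theory is to check free space on the partition the TaskTracker writes to. Running `df -h /mnt/app/hadoop-tmp` on the slave does the same job. Below is a minimal Java sketch of that check. It assumes the local directory is /mnt/app/hadoop-tmp, taken from the taskjvm.sh path in your log; LocalDiskCheck and usableBytes are just illustrative names. It only reports space and says nothing about whether the volume itself is healthy.

```java
import java.io.File;

// Illustrative check (hypothetical class name): report free space on the
// filesystem holding the TaskTracker's local directory. The default path is
// assumed from the log; pass another directory as the first argument.
public class LocalDiskCheck {

    // Returns usable bytes on the filesystem holding dir, or -1 if dir is missing.
    static long usableBytes(File dir) {
        if (!dir.exists()) {
            return -1L;
        }
        return dir.getUsableSpace();
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "/mnt/app/hadoop-tmp";
        File dir = new File(path);
        long usable = usableBytes(dir);
        if (usable < 0) {
            System.out.println(path + " does not exist on this host");
            return;
        }
        long total = dir.getTotalSpace();
        // Print usable and total space in megabytes.
        System.out.printf("%s: %d MB usable of %d MB total%n",
                path, usable / (1024 * 1024), total / (1024 * 1024));
    }
}
```

If the usable figure is near zero while jobs are running, that would point to disk space rather than a failing volume.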

On Wednesday, June 12, 2013, Ravi Shetye wrote:

> In the last 4-5 days, the TaskTracker on one of my slave machines has gone
> down a couple of times. It had been working fine for the past 4-5 months.
>
> The cluster configuration is:
> 4 machine cluster on AWS
> 1 m2.xlarge master
> 3 m2.xlarge slaves
>
> The cluster is dedicated to running Hive queries, with the data residing on S3.
>
> The slave on which the TaskTracker went down had the following log:
>
> *******************************************************************
> 2013-06-11 00:26:30,968 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 279198
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.191.**.***:37605, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 193135
> 2013-06-11 00:26:30,971 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60630, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 192011
> 2013-06-11 00:26:30,972 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 178209
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 186452
> 2013-06-11 00:26:30,973 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 157360
> 2013-06-11 00:26:30,974 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.8.***.**:45321, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 157555
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM Not
> killed jvm_201306071409_0151_m_-435659475 but just removed
> 2013-06-11 00:26:30,991 INFO org.apache.hadoop.mapred.JvmManager: JVM :
> jvm_201306071409_0151_m_-435659475 exited with exit code 0. Number of tasks
> it ran: 0
> 2013-06-11 00:26:30,991 ERROR org.apache.hadoop.mapred.JvmManager: Caught
> Throwable in JVMRunner. Aborting TaskTracker.
> org.apache.hadoop.fs.FSError: java.io.IOException: Broken pipe
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:200)
>     at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>     at java.io.DataOutputStream.write(DataOutputStream.java:107)
>     at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:220)
>     at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:315)
>     at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:148)
>     at java.io.OutputStreamWriter.close(OutputStreamWriter.java:233)
>     at java.io.BufferedWriter.close(BufferedWriter.java:265)
>     at java.io.PrintWriter.close(PrintWriter.java:312)
>     at org.apache.hadoop.mapred.TaskController.writeCommand(TaskController.java:231)
>     at org.apache.hadoop.mapred.DefaultTaskController.launchTask(DefaultTaskController.java:126)
>     at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:497)
>     at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:471)
> Caused by: java.io.IOException: Broken pipe
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:297)
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:198)
>     ... 13 more
> 2013-06-11 00:26:31,007 INFO org.apache.hadoop.mapred.JvmManager: In
> JvmRunner constructed JVM ID: jvm_201306071409_0151_m_-495709221
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005694_0, duration: 222430
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60653, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005693_0, duration: 154027
> 2013-06-11 00:26:31,008 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60659, bytes: 6, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 132067
> 2013-06-11 00:26:31,326 INFO org.apache.hadoop.mapred.JvmManager: JVM
> Runner jvm_201306071409_0151_m_-495709221 spawned.
> 2013-06-11 00:26:31,328 INFO org.apache.hadoop.mapred.TaskController:
> Writing commands to
> /mnt/app/hadoop-tmp/ttprivate/taskTracker/piyushv/jobcache/job_201306071409_0151/attempt_201306071409_0151_m_005717_0/taskjvm.sh
> 2013-06-11 00:26:31,331 INFO
> org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 10.191.**.***:50060,
> dest: 10.190.***.***:60656, bytes: 38, op: MAPRED_SHUFFLE, cliID:
> attempt_201306071409_0151_m_005700_0, duration: 437236
> 2013-06-11 00:26:31,332 INFO org.apache.hadoop.mapred.TaskTracker:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down TaskTracker at ip-10-191-**-***/10.191.**.***
> ************************************************************/
>
> --
> RAVI SHETYE
>
