The logs I pasted are from the worker logs only. Spark does have permission to write into /opt; it's not that the worker is unable to start. It runs perfectly for days, but then abruptly dies.
And it's not always this machine; sometimes it is some other machine. It happens only once in a while, but when it does, the problem persists for at least a day: no matter how many times I restart the worker, it dies again with the same exception.

On Sun, Jul 12, 2015 at 12:42 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Can you dig a bit more into the worker logs? Also make sure that Spark has
> permission to write to /opt/ on that machine, as it's one machine that is
> always throwing up.
>
> Thanks
> Best Regards
>
> On Sat, Jul 11, 2015 at 11:18 PM, gaurav sharma <sharmagaura...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I am facing this issue in my production environment.
>>
>> My worker dies by throwing the exception below, but I see that space is
>> available on all the partitions of my disk, and I did NOT see any abrupt
>> increase in disk I/O that might have choked the executor writing to the
>> stderr file.
>>
>> Still my worker dies. This is not happening on all my workers; it is one
>> machine that behaves this way. Could you please help me debug whether it
>> is happening because I am doing something wrong, or whether it is a
>> hardware/OS issue that I can track down and fix?
>>
>> 15/07/11 18:05:45 ERROR Worker: RECEIVED SIGNAL 1: SIGHUP
>> 15/07/11 18:05:45 INFO ExecutorRunner: Killing process!
>> 15/07/11 18:05:45 ERROR FileAppender: Error writing stream to file
>> /opt/spark-1.4.0-bin-hadoop2.6/work/app-20150710162005-0001/16517/stderr
>> java.io.IOException: Stream closed
>>         at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>>         at java.io.FilterInputStream.read(FilterInputStream.java:107)
>>         at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
>>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
>>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>>         at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
>>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
>>         at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
>> 15/07/11 18:05:46 INFO Utils: Shutdown hook called
>> 15/07/11 18:05:46 INFO Utils: Deleting directory
>> /tmp/spark-f269acd9-3ab0-4b3c-843c-bcf2e8c2669f
>> 15/07/11 18:05:46 INFO Worker: Executor app-20150710162005-0001/16517
>> finished with state EXITED message Command exited with code 129 exitStatus 129
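One detail worth noting in those last log lines: by the usual Unix convention, a process killed by signal N exits with code 128 + N, so "Command exited with code 129" is exactly the SIGHUP (signal 1) the worker logged a second earlier. On that reading, the FileAppender "Stream closed" IOException looks like a symptom rather than the cause: the logging thread simply saw the executor's stderr stream close after the process was killed. A minimal Scala sketch of the exit-code arithmetic (decodeSignal is an illustrative helper, not a Spark API):

    // Sketch only: maps an exit status back to a signal under the common
    // shell convention exitCode = 128 + signalNumber.
    object ExitStatus {
      private val names = Map(1 -> "SIGHUP", 2 -> "SIGINT", 9 -> "SIGKILL", 15 -> "SIGTERM")

      def decodeSignal(exitCode: Int): Option[String] =
        if (exitCode > 128) names.get(exitCode - 128) else None

      def main(args: Array[String]): Unit =
        println(decodeSignal(129)) // Some(SIGHUP) -- matches "RECEIVED SIGNAL 1: SIGHUP" above
    }

If that holds, the thing to chase on the affected box is whatever occasionally sends the worker a SIGHUP (for example, a controlling terminal or SSH session going away if the worker wasn't started detached), rather than disk space or the appender itself.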