The exception was thrown in the application master (the Spark Streaming driver), and the job shut down after this exception.
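For context, a rough sketch of the kind of streaming write that goes through this code path (the source, batch interval, and app name below are placeholders, not the actual job): saveAsTextFiles writes one directory per batch, named <prefix>-<batchTimeMs>.<suffix>, and commits it through FileOutputCommitter.commitJob, which is the frame that throws the FileNotFoundException in the stack trace below.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object SaveAsTextFilesSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("streaming-save-sketch")
        // Placeholder batch interval
        val ssc = new StreamingContext(conf, Seconds(300))

        // Placeholder source; the real job reads from its own input stream
        val lines = ssc.socketTextStream("localhost", 9999)

        // One output directory per batch, e.g. .../human_bot/-1407744300000.out,
        // committed via FileOutputCommitter.commitJob at the end of each batch
        lines.saveAsTextFiles("hdfs://hadoopc/user/csong/output/human_bot/", "out")

        ssc.start()
        ssc.awaitTermination()
      }
    }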
On Mon, Aug 11, 2014 at 10:29 AM, Chen Song <chen.song...@gmail.com> wrote:

> I got the same exception after the streaming job had been running for a while.
> The ERROR message complains about a temp file not being found in the output
> folder.
>
> 14/08/11 08:05:08 ERROR JobScheduler: Error running job streaming job 1407744300000 ms.0
> java.io.FileNotFoundException: File
> hdfs://hadoopc/user/csong/output/human_bot/-1407744300000.out/_temporary/0/task_201408110805_0000_m_000007
> does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
>         at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
>         at org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:126)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:841)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:724)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:643)
>         at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:773)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:771)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
>
>
> On Fri, Jul 25, 2014 at 7:04 PM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>
>> I just saw another error after my job had been running for 2 hours:
>>
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>> No lease on /apps/data/vddil/real-time/checkpoint/temp: File does not
>> exist. Holder DFSClient_NONMAPREDUCE_327993456_13 does not have any open
>> files.
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2946)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:2766)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2674)
>>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:584)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>>
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>         at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor146.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>>         at com.sun.proxy.$Proxy14.addBlock(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1439)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1261)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>> 14/07/25 14:45:12 WARN CheckpointWriter: Error in attempt 1 of writing checkpoint to
>> hdfs://gnosis-01-01-01.crl.samsung.com/apps/data/vddil/real-time/checkpoint/checkpoint-1406324700000
>>
>> All my jobs pass the same parameter to the checkpoint function. Could that be
>> the reason for the error?
>>
>> I will post the stack trace of the other error after it appears again.
>> Thanks!
>>
>> Bill
>>
>>
>> On Fri, Jul 25, 2014 at 2:57 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>
>>> Can you give a stack trace and logs of the exception? It is hard to say
>>> anything without the associated stack trace and logs.
>>>
>>> TD
>>>
>>>
>>> On Fri, Jul 25, 2014 at 1:32 PM, Bill Jay <bill.jaypeter...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running a Spark Streaming job that uses saveAsTextFiles to save
>>>> results into HDFS files. However, it hit an exception after 20 batches:
>>>>
>>>> result-1406312340000/_temporary/0/task_201407251119_0000_m_000003 does not
>>>> exist.
>>>>
>>>> While the job is running, I do not change any file in the folder. Does
>>>> anyone know why the file cannot be found?
>>>>
>>>> Thanks!
>>>>
>>>> Bill
>>>>
>>>
>>>
>>
>
>
> --
> Chen Song
>

--
Chen Song
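Regarding the question above about every job passing the same checkpoint parameter: whether or not that turns out to be the cause here, one way to rule it out is to give each StreamingContext its own checkpoint and output location, so no two jobs write to or clean up the same <checkpointDir>/temp file. A minimal sketch, with placeholder job names, paths, batch interval, and source (none of these are from the thread):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object PerJobCheckpointSketch {
      // Builds a context whose checkpoint and output directories are unique to jobName,
      // so one job's checkpoint writes never touch another job's files
      def makeContext(jobName: String): StreamingContext = {
        val conf = new SparkConf().setAppName(jobName)
        val ssc = new StreamingContext(conf, Seconds(60))  // placeholder batch interval

        // Per-job checkpoint directory instead of one shared by all jobs
        ssc.checkpoint(s"hdfs:///apps/data/vddil/real-time/checkpoint/$jobName")

        val lines = ssc.socketTextStream("localhost", 9999)  // placeholder source
        // Per-job output prefix as well, so batch directories never collide
        lines.saveAsTextFiles(s"hdfs:///apps/data/vddil/real-time/result/$jobName/result", "out")
        ssc
      }

      def main(args: Array[String]): Unit = {
        val ssc = makeContext(args.headOption.getOrElse("job-a"))
        ssc.start()
        ssc.awaitTermination()
      }
    }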