Hi,

If checkpoint data is already present in HDFS, the driver fails to start because recovery performs a lookup against the previous application's event log directory. Since that directory already exists, the new context cannot be created. The failed job's application id was application_1432284018452_0635, and it was looking up the application_1432284018452_0633 directory.
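For context, the application creates its streaming context through StreamingContext.getOrCreate, roughly like this (a minimal sketch; the checkpoint path, app name, and batch interval below are placeholders, not our exact code):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///user/myuser/spark/checkpoint"  // placeholder path

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("MyStreamingApp")   // placeholder name
  val ssc = new StreamingContext(conf, Seconds(10))         // placeholder interval
  ssc.checkpoint(checkpointDir)
  // DStream setup elided
  ssc
}

// On a clean start this calls createContext(); on restart it rebuilds the
// context (and a new SparkContext) from the checkpoint data, and that is
// where the existing event log directory makes startup fail.
val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
ssc.start()
ssc.awaitTermination()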
Here's a snippet of the exception stack trace:

15/06/10 05:28:36 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: Log directory hdfs://x.x.x.x:8020/user/myuser/spark/applicationHistory/application_1432284018452_0633 already exists!)
Exception in thread "Driver" java.io.IOException: Log directory hdfs://172.16.201.171:8020/user/shn/spark/applicationHistory/application_1432284018452_0633 already exists!
    at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:129)
    at org.apache.spark.util.FileLogger.start(FileLogger.scala:115)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:74)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:368)
    at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:118)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:561)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561)
    at ...

Any idea on how to fix this issue?

Thanks,
Ashish
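P.S. One workaround that comes to mind is to delete the stale event log directory by hand before resubmitting, along these lines (an untested sketch using the Hadoop FileSystem API; the path is the one from the trace):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Remove the previous application's event log directory so the recovered
// SparkContext can create it again. This throws away that run's history.
val staleDir = new Path("hdfs://x.x.x.x:8020/user/myuser/spark/applicationHistory/application_1432284018452_0633")
val fs: FileSystem = staleDir.getFileSystem(new Configuration())
if (fs.exists(staleDir)) {
  fs.delete(staleDir, true) // recursive delete
}

That obviously loses the old run's event logs, so I'm hoping there is a cleaner way to make recovery use a fresh log directory.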