I am using a Spark standalone cluster on Windows 2008. I keep getting the following error when trying to save an RDD to a Windows shared folder:

    rdd.saveAsObjectFile("file:///T:/lab4-win02/IndexRoot01/tobacco-07/myrdd.obj")

    15/05/22 16:49:05 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 12)
    java.io.IOException: Mkdirs failed to create file:/T:/lab4-win02/IndexRoot01/tobacco-07/tmp/docs-150522204904805.op/_temporary/0/_temporary/attempt_201505221649_0012_m_000000_12
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:438)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:424)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1071)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:270)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:527)
        at org.apache.hadoop.mapred.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:63)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

The T: drive is mapped to a Windows shared folder, e.g. T: -> \\10.196.119.230\myshare. The account running Spark does have write permission to this folder. The save works most of the time but fails occasionally. Can anybody tell me what the problem is? Please advise. Thanks.
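Because the failure is intermittent, one way to narrow it down is to repeat the same kind of directory creation the failing task attempted, outside Spark, on each worker host and under the same account the Spark worker runs as. The sketch below is a hypothetical diagnostic (`MkdirsProbe`, `probe`, and the `mkdirs-probe-*` directory name are made up for illustration; none of this is part of Spark or Hadoop). It builds a nested `_temporary/0/_temporary/<attempt>` path like the one in the trace, inside a throwaway directory, writes one small file, and cleans up afterwards. It assumes plain `java.io` behaviour is representative of what the local Hadoop file system sees, which may not hold exactly.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

/**
 * Hypothetical diagnostic helper (not part of Spark or Hadoop). It mimics the
 * directory layout from the trace, <probe>/_temporary/0/_temporary/<attempt>,
 * inside a throwaway probe directory, writes one small file, then cleans up.
 */
public class MkdirsProbe {

    /** Returns null on success, or a description of the first failure. */
    public static String probe(File root, String attemptName) {
        // Unique per call, so a real job's _temporary directory is never touched.
        File probeDir = new File(root, "mkdirs-probe-" + System.nanoTime());
        File attemptDir = new File(probeDir, String.join(File.separator,
                "_temporary", "0", "_temporary", attemptName));
        try {
            // File.mkdirs() reports failure only as "false", with no cause,
            // which is why the trace's "Mkdirs failed" message gives no reason.
            if (!attemptDir.mkdirs() && !attemptDir.isDirectory()) {
                return "mkdirs failed for " + attemptDir.getAbsolutePath()
                        + " (root is directory: " + root.isDirectory()
                        + ", root writable: " + root.canWrite() + ")";
            }
            try {
                Files.write(new File(attemptDir, "part-00000").toPath(), new byte[] {1, 2, 3});
            } catch (IOException e) {
                return "write failed in " + attemptDir.getAbsolutePath() + ": " + e;
            }
            return null;
        } finally {
            deleteRecursively(probeDir);
        }
    }

    private static void deleteRecursively(File f) {
        File[] children = f.listFiles();  // null for plain files or missing paths
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        f.delete();  // best effort; a failed delete is ignored
    }

    /** Usage: java MkdirsProbe <root> [attempts] */
    public static void main(String[] args) {
        File root = new File(args[0]);
        int attempts = args.length > 1 ? Integer.parseInt(args[1]) : 100;
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            String error = probe(root, "attempt_probe_" + i);
            if (error != null) {
                failures++;
                System.out.println(error);
            }
        }
        System.out.println(failures + " of " + attempts
                + " probes failed under " + root.getAbsolutePath());
    }
}
```

For example, running `java MkdirsProbe T:/lab4-win02/IndexRoot01/tobacco-07 500` and then the same command with the UNC form of that folder (`\\10.196.119.230\myshare\lab4-win02\IndexRoot01\tobacco-07`, assuming the mapping given above) would show whether failures depend on the drive-letter mapping, on a particular host, or on the share itself.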