Hello!

I'm using Apache Spark with Mesos, and I've launched a job with 
coarse-mode=true. In my job, I must download a file from the internet, so I'm 
using:

import org.apache.spark.SparkFiles
sc.addFile("http://samplecsvs.s3.amazonaws.com/Sacramentorealestatetransactions.csv";)
val path = SparkFiles.get("Sacramentorealestatetransactions.csv")
val textRDD = sc.textFile(path)
... some stuff

But the job failed with the following error:

Job aborted due to stage failure: Task 1 in stage 8.0 failed 4 times, most 
recent failure: Lost task 1.3 in stage 8.0 (TID 58, slave-1): 
java.io.FileNotFoundException: File 
file:/tmp/spark-5dde1847-b433-4282-a535-57ba5e2c9b81/userFiles-0885c136-9df1-44b9-a531-343268edfb6c/Sacramentorealestatetransactions.csv
 does not exist
        at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
        at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
        at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
        at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
        at 
org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
        at 
org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Indeed, the file is not downloaded inside the executor container. However it is 
downloaded in the driver container.

It seems spark doesn't copy this file on executor containers, can someone 
confirm this issue? Am I doing something wrong?

Kind regards,

Rémy

Ce message et toutes les pièces jointes (ci-après le "message") sont établis à 
l’intention exclusive des destinataires désignés. Il contient des informations 
confidentielles et pouvant être protégé par le secret professionnel. Si vous 
recevez ce message par erreur, merci d'en avertir immédiatement l'expéditeur et 
de détruire le message. Toute utilisation de ce message non conforme à sa 
destination, toute diffusion ou toute publication, totale ou partielle, est 
interdite, sauf autorisation expresse de l’émetteur. L'internet ne garantissant 
pas l'intégrité de ce message lors de son acheminement, Atos (et ses filiales) 
décline(nt) toute responsabilité au titre de son contenu. Bien que ce message 
ait fait l’objet d’un traitement anti-virus lors de son envoi, l’émetteur ne 
peut garantir l’absence totale de logiciels malveillants dans son contenu et ne 
pourrait être tenu pour responsable des dommages engendrés par la transmission 
de l’un d’eux.

This message and any attachments (the "message") are intended solely for the 
addressee(s). It contains confidential information, that may be privileged. If 
you receive this message in error, please notify the sender immediately and 
delete the message. Any use of the message in violation of its purpose, any 
dissemination or disclosure, either wholly or partially is strictly prohibited, 
unless it has been explicitly authorized by the sender. As its integrity cannot 
be secured on the internet, Atos and its subsidiaries decline any liability for 
the content of this message. Although the sender endeavors to maintain a 
computer virus-free network, the sender does not warrant that this transmission 
is virus-free and will not be liable for any damages resulting from any virus 
transmitted.

Reply via email to