Hi chinchu,

Where does the code that is trying to read the file run? Is it running on the driver or on an executor?
If it's running on the driver in yarn-cluster mode, the file should have been copied to the application's work directory before the driver is started, so simply doing "new FileInputStream(foo)" should hopefully work. That does assume the code is being run in yarn-cluster mode, though, and it may not work with a different master deployment.

I'm not sure, without looking further, what the expected semantics are for reading these files from code that is not running in the executors.

On Sat, Sep 20, 2014 at 1:14 AM, chinchu <chinchu....@gmail.com> wrote:
> Thanks Andrew.
>
> I understand the problem a little better now. There was a typo in my earlier
> mail & a bug in the code (causing the NPE in SparkFiles). I am using the
> --master yarn-cluster (not local). And in this mode, the
> com.test.batch.modeltrainer.ModelTrainerMain - my main-class will run on the
> application master in yarn (3-node cluster) & the serialized file is on my
> laptop:/tmp/myobject.ser. That is the reason I was using SparkFiles.get() to
> get this file (and not just doing a new File("/tmp/myobject.ser"))
>
> val serFile = SparkFiles.get("myobject.ser")
> val argsMap = deSerializeMapFromFile(serFile)
>
> But this gets me a FileNotFoundException:
> /tmp/spark-3292c9e3-db06-43b1-89f1-423f40e8e84b/myobject.ser in
> deSerializeMapFromFile(xxx). This runs in the spark "driver" and not the
> executor, correct? & that's why it's probably not finding the file.
>
> *
> Here's what I am trying to do:
> my-laptop (has the /tmp/myobject.ser & /opt/test/lib/spark-test.jar)
> launches spark-submit ---files .. ----> hadoop-yarn-cluster[3 nodes]
> *
> and on my laptop:$HADOOP_CONF_DIR, I have the configuration that points to
> this 3-node yarn cluster.
>
> *What is the right way to get to this file (myobject.ser) in my main-class
> (when running in spark-driver in yarn & not the executor) ?*
>
> Thanks again
> -C
>
> PS: java.io.FileNotFoundException:
> /tmp/spark-3292c9e3-db06-43b1-89f1-423f40e8e84b/myobject.ser (No such file
> or directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:146)
>     at java.io.FileInputStream.<init>(FileInputStream.java:101)
>     at com.test.batch.modeltrainer.ModelTrainerMain$.deSerializeMapFromFile(ModelTrainerMain.scala:96)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-submit-command-line-with-files-tp14645p14719.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

--
Marcelo
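
For illustration, here is a minimal sketch of the "open it by its bare name" suggestion from the reply. It assumes the driver runs in yarn-cluster mode and that the file passed with --files ends up in the process's current working directory; that has not been verified for other deployments. The object ShippedFileReader and its two helpers are hypothetical names, and the sketch assumes the file holds a single Java-serialized object.

```scala
import java.io.{File, FileInputStream, FileNotFoundException, ObjectInputStream}

// Hypothetical helper, not part of Spark.
object ShippedFileReader {

  // Resolve a shipped file by its bare name. A relative path is resolved
  // against the current working directory, which is ASSUMED to be the
  // application's work directory when the driver runs in yarn-cluster mode.
  def resolveInWorkDir(name: String): File = {
    val f = new File(name)
    if (!f.isFile) {
      throw new FileNotFoundException(
        s"$name not found in ${new File(".").getAbsolutePath}")
    }
    f
  }

  // Read one Java-serialized object from the file and cast it to T.
  // The cast is unchecked; the caller must know what was serialized.
  def deserialize[T](file: File): T = {
    val in = new ObjectInputStream(new FileInputStream(file))
    try in.readObject().asInstanceOf[T]
    finally in.close()
  }
}
```

In the main class this could be used as `ShippedFileReader.deserialize[java.util.HashMap[String, String]](ShippedFileReader.resolveInWorkDir("myobject.ser"))`, where the map type is only a guess at what myobject.ser contains. The explicit FileNotFoundException message prints the directory that was actually searched, which helps confirm whether the working-directory assumption holds on a given cluster.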