Hi chinchu,

Where does the code trying to read the file run? Is it running on the
driver or on some executor?

If it's running on the driver in yarn-cluster mode, the file should
have been copied to the application's work directory before the driver
is started. So hopefully a plain "new FileInputStream(foo)", with foo
being the bare file name, will just work.
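
A minimal sketch of that idea, assuming the file was shipped with
--files and the driver runs in yarn-cluster mode. The object name, the
body of deSerializeMapFromFile and the Map[String, String] type are
hypothetical stand-ins, since the original helper was not posted:

```scala
import java.io.{File, FileInputStream, ObjectInputStream}

object LocalizedFileSketch {
  // Hypothetical stand-in for the helper from the quoted code.
  // Assumption: the file holds a Java-serialized Map[String, String].
  def deSerializeMapFromFile(path: String): Map[String, String] = {
    val in = new ObjectInputStream(new FileInputStream(path))
    try in.readObject().asInstanceOf[Map[String, String]]
    finally in.close()
  }

  // A bare file name resolves against the process working directory,
  // which is where the file is expected to have been copied.
  def localizedFile(name: String): File = new File(name).getAbsoluteFile
}
```

In the driver this would be called as
LocalizedFileSketch.deSerializeMapFromFile("myobject.ser"), i.e. with
the bare name rather than the /tmp path from the laptop.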

That does assume the code is being run in yarn-cluster mode, though,
and it may not work with a different master or deploy mode. Without
looking further, I'm not sure what the expected semantics are for
reading these files from code that is not running in the executors.
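
One way to hedge against that uncertainty is to probe a few candidate
locations instead of hard-coding one. This is only a sketch: the
object and method names are made up, and which candidates make sense
(for example the bare file name, or the path returned by
SparkFiles.get) depends on the deploy mode:

```scala
import java.io.File

object FileLookupSketch {
  // Return the first candidate path that exists as a readable
  // regular file, or None if none of them does.
  def firstExisting(candidates: Seq[String]): Option[File] =
    candidates.map(new File(_)).find(f => f.isFile && f.canRead)
}
```

A None result can then be turned into an error message listing the
paths that were tried, which is easier to debug than a bare
FileNotFoundException.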


On Sat, Sep 20, 2014 at 1:14 AM, chinchu <chinchu....@gmail.com> wrote:
> Thanks Andrew.
>
> I understand the problem a little better now. There was a typo in my earlier
> mail & a bug in the code (causing the NPE in SparkFiles). I am using the
> --master yarn-cluster (not local). And in this mode, the
> com.test.batch.modeltrainer.ModelTrainerMain - my main-class will run on the
> application master in yarn (3-node cluster) & the serialized file is on my
> laptop:/tmp/myobject.ser. That is the reason I was using SparkFiles.get() to
> get this file (and not just doing a new File("/tmp/myobject.ser"))
>
> val serFile = SparkFiles.get("myobject.ser")
> val argsMap = deSerializeMapFromFile(serFile)
>
> But this gets me a FileNotFoundException:
> /tmp/spark-3292c9e3-db06-43b1-89f1-423f40e8e84b/myobject.ser in
> deSerializeMapFromFile(xxx). This runs in the spark "driver" and not the
> executor, correct? & that's why it's probably not finding the file.
>
> *
> Here's what I am trying to do:
> my-laptop (has the /tmp/myobject.ser & /opt/test/lib/spark-test.jar)
> launches spark-submit ---files .. ----> hadoop-yarn-cluster[3 nodes]
> *
> and on my laptop:$HADOOP_CONF_DIR, I have the configuration that points to
> this 3-node yarn cluster.
>
> *What is the right way to get to this file (myobject.ser) in my main-class
> (when running in spark-driver in yarn & not the executor) ?*
>
> Thanks again
> -C
>
> PS: java.io.FileNotFoundException:
> /tmp/spark-3292c9e3-db06-43b1-89f1-423f40e8e84b/myobject.ser (No such file
> or directory)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.<init>(FileInputStream.java:146)
>   at java.io.FileInputStream.<init>(FileInputStream.java:101)
>   at
> com.test.batch.modeltrainer.ModelTrainerMain$.deSerializeMapFromFile(ModelTrainerMain.scala:96)
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-submit-command-line-with-files-tp14645p14719.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>



-- 
Marcelo

