Re: spark-submit command-line with --files

2014-09-20 Thread chinchu
Thanks Marcelo. The code trying to read the file always runs in the driver. I understand the problem with other master/deploy-mode combinations, but will it work in the local, yarn-client and yarn-cluster deployments? That's all I care about for now :-) Also, what is the suggested way to do something like this? Put the fil

Re: spark-submit command-line with --files

2014-09-20 Thread Marcelo Vanzin
Hi chinchu, Where does the code trying to read the file run? Is it running on the driver or on some executor? If it's running on the driver, in yarn-cluster mode, the file should have been copied to the application's work directory before the driver is started. So hopefully just doing "new FileIn
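To make Marcelo's point concrete: in yarn-cluster mode a file passed with --files is localized into the container's working directory before the driver starts, so the driver can open it by its bare name with plain java.io, no SparkFiles needed. A minimal sketch, assuming a properties file named app.conf and a property key model.path (both illustrative, not from the thread):

```scala
import java.io.FileInputStream
import java.util.Properties

object DriverSideRead {
  def main(args: Array[String]): Unit = {
    // Submitted with: spark-submit --master yarn-cluster --files app.conf ...
    // YARN copies app.conf into the container's working directory,
    // so a relative path resolves directly in the driver.
    val props = new Properties()
    val in = new FileInputStream("app.conf")
    try props.load(in) finally in.close()
    println(props.getProperty("model.path"))
  }
}
```

Note this relies on YARN's file localization; in local mode the working directory is wherever spark-submit was launched, so a relative name only resolves if the file happens to be there.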

Re: spark-submit command-line with --files

2014-09-20 Thread chinchu
Thanks Andrew. I understand the problem a little better now. There was a typo in my earlier mail and a bug in the code (causing the NPE in SparkFiles). I am using --master yarn-cluster (not local), and in this mode com.test.batch.modeltrainer.ModelTrainerMain - my main class - will run on the
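For reference, a yarn-cluster submission along the lines chinchu describes would look roughly like this (the jar name and file path are placeholders; only the main class comes from the thread):

```shell
spark-submit \
  --master yarn-cluster \
  --class com.test.batch.modeltrainer.ModelTrainerMain \
  --files /local/path/app.conf \
  model-trainer.jar
```

With this invocation, app.conf is shipped to every container (driver and executors) and is readable there by its bare name.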

Re: spark-submit command-line with --files

2014-09-19 Thread chinchu
Thanks Andrew, that helps. On Fri, Sep 19, 2014 at 5:47 PM, Andrew Or wrote: > Hey, just a minor clarification: you _can_ use SparkFiles.get in your > application, but only if it runs on the executors, e.g. in the following way: >

Re: spark-submit command-line with --files

2014-09-19 Thread Andrew Or
Hey, just a minor clarification: you _can_ use SparkFiles.get in your application, but only if it runs on the executors, e.g. in the following way: sc.parallelize(1 to 100).map { i => SparkFiles.get("my.file") }.collect() But not in general (otherwise you get an NPE, as in your case). Perhaps this should be docum
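Andrew's one-liner, expanded into a fuller sketch. The file name my.file is from his example; the app name and the idea of reading the first line are illustrative. The key point is that SparkFiles.get is called inside the map closure, which runs on executors where the --files copy exists:

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import scala.io.Source

object ExecutorSideRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("files-demo"))
    // my.file was shipped with: spark-submit --files my.file ...
    // SparkFiles.get resolves the local copy *on each executor*,
    // so it must be invoked inside a task, not on the driver.
    val firstLines = sc.parallelize(1 to 100).map { i =>
      Source.fromFile(SparkFiles.get("my.file")).getLines().next()
    }.collect()
    println(firstLines.head)
    sc.stop()
  }
}
```

Calling SparkFiles.get on the driver in client mode is what triggers the NPE discussed in this thread, since the driver-side environment it consults is not populated there.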

Re: spark-submit command-line with --files

2014-09-19 Thread Andrew Or
Hi Chinchu, SparkEnv is an internal class that is only meant to be used within Spark. Outside of Spark it will be null, because there are no executors or driver to start an environment for. Similarly, SparkFiles is meant to be used internally (though its privacy settings should be modified to ref