Hi Kannan,

The issues with using --jars make sense. I believe you can set the classpath via --conf spark.executor.extraClassPath=.... on the command line, or in your driver with .set("spark.executor.extraClassPath", ".....").
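For example, something along these lines in the driver before the SparkContext is created (just a sketch; the jar paths below are placeholders for wherever the HBase jars actually live on your nodes):

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder paths -- substitute the real HBase jar locations on your nodes.
    val hbaseJars = Seq(
      "/opt/hbase/lib/hbase-client.jar",
      "/opt/hbase/lib/hbase-common.jar"
    ).mkString(":")

    val conf = new SparkConf()
      .setAppName("hbase-job")
      // Added to each executor's classpath; nothing is shipped for you, so the
      // jars must already exist at these paths on every node.
      .set("spark.executor.extraClassPath", hbaseJars)

    val sc = new SparkContext(conf)

The command-line equivalent is just --conf spark.executor.extraClassPath=/opt/hbase/lib/hbase-client.jar:... when launching spark-shell or spark-submit.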
I believe you are correct about the localization as well, as long as you're guaranteed that all nodes have the same versions of said jars.

-Todd

On Thu, Feb 26, 2015 at 8:12 PM, Kannan Rajah <kra...@maprtech.com> wrote:
> There is a usability concern I have with the current way of specifying
> --jars. Imagine a use case like HBase where a lot of jobs need it in their
> classpath. This needs to be set every time. If we use
> spark.executor.extraClassPath, then we just need to set it once. But there
> is no programmatic way to set this value, like picking it up from an
> environment variable or by running a script that generates the classpath.
> You need to hard-code the jars in spark-defaults.conf.
>
> Also, I would like to know if there is a localization overhead when we use
> spark.executor.extraClassPath. Again, in the case of HBase, these jars
> would typically be available on all nodes, so there is no need to localize
> them from the node where the job was submitted. I am wondering if we use
> the SPARK_CLASSPATH approach, then it would not do localization. That
> would be an added benefit.
> Please clarify.
>
> --
> Kannan
>
> On Thu, Feb 26, 2015 at 4:15 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> SPARK_CLASSPATH is definitely deprecated, but my understanding is that
>> spark.executor.extraClassPath is not, so maybe the documentation needs
>> fixing.
>>
>> I'll let someone who might know otherwise comment, though.
>>
>> On Thu, Feb 26, 2015 at 2:43 PM, Kannan Rajah <kra...@maprtech.com>
>> wrote:
>> > SparkConf.scala logs a warning saying SPARK_CLASSPATH is deprecated
>> > and we should use spark.executor.extraClassPath instead. But the
>> > online documentation states that spark.executor.extraClassPath is only
>> > meant for backward compatibility.
>> >
>> > https://spark.apache.org/docs/1.2.0/configuration.html#execution-behavior
>> >
>> > Which one is right? I have a use case to submit an HBase job from
>> > spark-shell and make it run using YARN. In this case, I need to
>> > somehow add the HBase jars to the classpath of the executor. If I add
>> > them to SPARK_CLASSPATH and export it, it works fine. Alternatively,
>> > if I set spark.executor.extraClassPath in spark-defaults.conf, it
>> > works fine. But the reason I don't like spark-defaults.conf is that I
>> > need to hard-code it instead of relying on a script to generate the
>> > classpath. I can use a script in spark-env.sh and set SPARK_CLASSPATH.
>> >
>> > Given that compute-classpath uses the SPARK_CLASSPATH variable, why is
>> > it marked as deprecated?
>> >
>> > --
>> > Kannan
>>
>>
>>
>> --
>> Marcelo
>>
>
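On the "no programmatic way to set this value" point: if you are building the conf in the driver, you can derive the value at runtime instead of hard-coding it in spark-defaults.conf. A rough sketch, assuming either an HBASE_CLASSPATH environment variable is set or the "hbase classpath" command is on the PATH of the submitting node:

    import org.apache.spark.SparkConf
    import scala.sys.process._

    // Build the executor extra classpath at runtime rather than hard-coding it.
    // HBASE_CLASSPATH is an assumed environment variable; "hbase classpath" is
    // the HBase CLI command that prints HBase's classpath.
    val hbaseCp: String = sys.env.getOrElse("HBASE_CLASSPATH",
      Seq("hbase", "classpath").!!.trim)

    val conf = new SparkConf()
      .set("spark.executor.extraClassPath", hbaseCp)

Note that extraClassPath only adds entries to the executor JVM's classpath; the jars are not localized, so as discussed above they need to be present at the same paths on every node.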