Hi Kannan,

Issues with using --jars make sense.  I believe you can set the classpath
via --conf spark.executor.extraClassPath=.... on the command line, or in
your driver with .set("spark.executor.extraClassPath", ".....")

I believe you are correct about the localization as well, as long as you're
guaranteed that all nodes have the same versions of said jars.
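
If you want to avoid hard coding the jar list, one option (again just a
sketch; this assumes the hbase launcher script is on the PATH of the machine
you submit from) is to splice its output in at launch time:

spark-shell --master yarn-client \
  --conf spark.executor.extraClassPath="$(hbase classpath)"

I haven't tried that exact combination myself, though.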

-Todd

On Thu, Feb 26, 2015 at 8:12 PM, Kannan Rajah <kra...@maprtech.com> wrote:

> There is a usability concern I have with the current way of specifying
> --jars. Imagine a use case like hbase where a lot of jobs need it on
> their classpath. This needs to be set every time. If we use
> spark.executor.extraClassPath, then we just need to set it once. But
> there is no programmatic way to set this value, like picking it up from
> an environment variable or running a script that generates the
> classpath. You need to hard code the jars in spark-defaults.conf.
>
> Also, I would like to know if there is a localization overhead when we use
> spark.executor.extraClassPath. Again, in the case of hbase, these jars
> would typically be available on all nodes, so there is no need to localize
> them from the node where the job was submitted. I am wondering whether,
> if we use the SPARK_CLASSPATH approach, it would skip localization. That
> would be an added benefit.
> Please clarify.
>
>
>
>
> --
> Kannan
>
> On Thu, Feb 26, 2015 at 4:15 PM, Marcelo Vanzin <van...@cloudera.com>
> wrote:
>
>> SPARK_CLASSPATH is definitely deprecated, but my understanding is that
>> spark.executor.extraClassPath is not, so maybe the documentation needs
>> fixing.
>>
>> I'll let someone who might know otherwise comment, though.
>>
>> On Thu, Feb 26, 2015 at 2:43 PM, Kannan Rajah <kra...@maprtech.com>
>> wrote:
>> > SparkConf.scala logs a warning saying SPARK_CLASSPATH is deprecated and
>> > we should use spark.executor.extraClassPath instead. But the online
>> > documentation states that spark.executor.extraClassPath is only meant
>> > for backward compatibility.
>> >
>> >
>> > https://spark.apache.org/docs/1.2.0/configuration.html#execution-behavior
>> >
>> > Which one is right? I have a use case to submit an hbase job from
>> > spark-shell and make it run on YARN. In this case, I need to somehow add
>> > the hbase jars to the classpath of the executor. If I add them to
>> > SPARK_CLASSPATH and export it, it works fine. Alternatively, if I set
>> > spark.executor.extraClassPath in spark-defaults.conf, it works fine. But
>> > the reason I don't like spark-defaults.conf is that I need to hard code
>> > the jars instead of relying on a script to generate the classpath. I can
>> > use a script in spark-env.sh and set SPARK_CLASSPATH.
>> >
>> > Given that compute-classpath uses the SPARK_CLASSPATH variable, why is it
>> > marked as deprecated?
>> >
>> > --
>> > Kannan
>>
>>
>>
>> --
>> Marcelo
>>
>
>
