The third issue may be related to this:
https://issues.apache.org/jira/browse/SPARK-2022

We can take a look at this during the bug fix period for the 1.1
release next week. If we come up with a fix we can backport it into
the 1.0 branch also.

On Wed, Jul 30, 2014 at 11:31 PM, Patrick Wendell <pwend...@gmail.com> wrote:
> Thanks for digging around here. I think there are a few distinct issues.
>
> 1. Property values containing the '=' character need it escaped.
> I was able to load properties fine as long as I escaped the '='
> character, but maybe we should document this:
>
> == spark-defaults.conf ==
> spark.foo a\=B
> == shell ==
> scala> sc.getConf.get("spark.foo")
> res2: String = a=B
>
> 2. spark.driver.extraJavaOptions, when set in the properties file,
> doesn't affect the driver when running in client mode (always the case
> for mesos). We should probably document this. In this case you need to
> either use --driver-java-options or set SPARK_SUBMIT_OPTS.
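>
> A quick way to confirm which JVM actually got the flag (just a sketch,
> reusing the foo.bar.baz placeholder from Cody's examples): check the
> property in the driver shell and again inside a task, e.g.
>
> scala> Option(System.getProperty("foo.bar.baz")).getOrElse("<not set>")
> scala> sc.parallelize(1 to 1, 1).map { _ =>
>      |   Option(System.getProperty("foo.bar.baz")).getOrElse("<not set>")
>      | }.first
>
> If the first comes back "<not set>" while --driver-java-options works,
> the defaults-file value never reached the driver JVM.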
>
> 3. Arguments aren't propagated on Mesos (this might be because of the
> other issues, or a separate bug).
>
> - Patrick
>
> On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger <c...@koeninger.org> wrote:
>> In addition, spark.executor.extraJavaOptions does not seem to behave as I
>> would expect; java arguments don't seem to be propagated to executors.
>>
>>
>> $ cat conf/spark-defaults.conf
>>
>> spark.master
>> mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
>> spark.executor.extraJavaOptions -Dfoo.bar.baz=23
>> spark.driver.extraJavaOptions -Dfoo.bar.baz=23
>>
>>
>> $ ./bin/spark-shell
>>
>> scala> sc.getConf.get("spark.executor.extraJavaOptions")
>> res0: String = -Dfoo.bar.baz=23
>>
>> scala> sc.parallelize(1 to 100).map{ i => (
>>      |  java.net.InetAddress.getLocalHost.getHostName,
>>      |  System.getProperty("foo.bar.baz")
>>      | )}.collect
>>
>> res1: Array[(String, String)] = Array((dn-01.mxstg,null),
>> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
>> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
>> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
>> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
>> (dn-02.mxstg,null), ...
>>
>>
>>
>> Note that this is a mesos deployment, although I wouldn't expect that to
>> affect the availability of spark.driver.extraJavaOptions in a local spark
>> shell.
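>>
>> Another thing that might help narrow it down (untested sketch): ask an
>> executor JVM for its own startup arguments, which should show whether
>> -Dfoo.bar.baz=23 was ever put on its command line at all:
>>
>> scala> sc.parallelize(1 to 1, 1).map { _ =>
>>      |  (java.net.InetAddress.getLocalHost.getHostName,
>>      |   java.lang.management.ManagementFactory.getRuntimeMXBean.getInputArguments.toString)
>>      | }.first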
>>
>>
>> On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>
>>> Either whitespace or an equals sign is a valid key/value separator in a
>>> properties file. Here's an example:
>>>
>>> $ cat conf/spark-defaults.conf
>>> spark.driver.extraJavaOptions -Dfoo.bar.baz=23
>>>
>>> $ ./bin/spark-shell -v
>>> Using properties file: /opt/spark/conf/spark-defaults.conf
>>> Adding default property: spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
>>>
>>>
>>> scala>  System.getProperty("foo.bar.baz")
>>> res0: String = null
>>>
>>>
>>> If you add double quotes, the resulting string value will have double
>>> quotes.
>>>
>>>
>>> $ cat conf/spark-defaults.conf
>>> spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"
>>>
>>> $ ./bin/spark-shell -v
>>> Using properties file: /opt/spark/conf/spark-defaults.conf
>>> Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"
>>>
>>> scala>  System.getProperty("foo.bar.baz")
>>> res0: String = null
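>>>
>>> (A quick way to see the embedded quotes, if anyone wants to verify: check
>>> the parsed value from the shell, e.g.
>>>
>>> scala> sc.getConf.get("spark.driver.extraJavaOptions")
>>>
>>> which I'd expect to come back with the literal double quotes in the string.)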
>>>
>>>
>>> Neither one of those affects the issue; the underlying problem in my case
>>> seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and
>>> SPARK_JAVA_OPTS environment variables, but nothing parses
>>> spark-defaults.conf before the java process is started.
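>>>
>>> (One quick sanity check from the shell, for whatever it's worth: the env
>>> vars the JVM actually inherited can be inspected with e.g.
>>>
>>> scala> Seq("SPARK_SUBMIT_OPTS", "SPARK_JAVA_OPTS").map(k => k -> sys.env.get(k))
>>>
>>> which I'd expect to come back as None for both when only
>>> spark-defaults.conf is being used.)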
>>>
>>> Here's an example of the process running when only spark-defaults.conf is
>>> being used:
>>>
>>> $ ps -ef | grep spark
>>>
>>> 514       5182  2058  0 21:05 pts/2    00:00:00 bash ./bin/spark-shell -v
>>>
>>> 514       5189  5182  4 21:05 pts/2    00:00:22 /usr/local/java/bin/java
>>> -cp
>>> ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>>> org.apache.spark.deploy.SparkSubmit spark-shell -v --class
>>> org.apache.spark.repl.Main
>>>
>>>
>>> Here's an example of it when the command line --driver-java-options is
>>> used (and thus things work):
>>>
>>>
>>> $ ps -ef | grep spark
>>> 514       5392  2058  0 21:15 pts/2    00:00:00 bash ./bin/spark-shell -v
>>> --driver-java-options -Dfoo.bar.baz=23
>>>
>>> 514       5399  5392 80 21:15 pts/2    00:00:06 /usr/local/java/bin/java
>>> -cp
>>> ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
>>> -XX:MaxPermSize=128m -Dfoo.bar.baz=23 -Djava.library.path= -Xms512m
>>> -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell -v
>>> --driver-java-options -Dfoo.bar.baz=23 --class org.apache.spark.repl.Main
>>>
>>>
>>>
>>>
>>> On Wed, Jul 30, 2014 at 3:43 PM, Patrick Wendell <pwend...@gmail.com>
>>> wrote:
>>>
>>>> Cody - in your example you are using the '=' character, but in our
>>>> documentation and tests we use whitespace to separate the key and
>>>> value in the defaults file.
>>>>
>>>> docs: http://spark.apache.org/docs/latest/configuration.html
>>>>
>>>> spark.driver.extraJavaOptions -Dfoo.bar.baz=23
>>>>
>>>> I'm not sure whether the java properties file parser will try to interpret
>>>> the equals sign. If so, you might need to do this:
>>>>
>>>> spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"
>>>>
>>>> Do those work for you?
>>>>
>>>> On Wed, Jul 30, 2014 at 1:32 PM, Marcelo Vanzin <van...@cloudera.com>
>>>> wrote:
>>>> > Hi Cody,
>>>> >
>>>> > Could you file a bug for this if there isn't one already?
>>>> >
>>>> > For system properties SparkSubmit should be able to read those
>>>> > settings and do the right thing, but that obviously won't work for
>>>> > other JVM options... the current code should work fine in cluster mode
>>>> > though, since the driver is a different process. :-)
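>>>> >
>>>> > (i.e., a plain system property could in principle still be set after the
>>>> > JVM is up, e.g. sys.props("foo.bar.baz") = "23" in Scala, but flags like
>>>> > -Xmx or -XX:... cannot be applied retroactively. Just sketching the
>>>> > distinction; foo.bar.baz is the placeholder from Cody's mails.)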
>>>> >
>>>> >
>>>> > On Wed, Jul 30, 2014 at 1:12 PM, Cody Koeninger <c...@koeninger.org> wrote:
>>>> >> We were previously using SPARK_JAVA_OPTS to set java system properties via
>>>> >> -D.
>>>> >>
>>>> >> This was used for properties that varied on a per-deployment-environment
>>>> >> basis, but needed to be available in the spark shell and workers.
>>>> >>
>>>> >> On upgrading to 1.0, we saw that SPARK_JAVA_OPTS had been deprecated, and
>>>> >> replaced by spark-defaults.conf and command line arguments to spark-submit
>>>> >> or spark-shell.
>>>> >>
>>>> >> However, setting spark.driver.extraJavaOptions and
>>>> >> spark.executor.extraJavaOptions in spark-defaults.conf is not a replacement
>>>> >> for SPARK_JAVA_OPTS:
>>>> >>
>>>> >>
>>>> >> $ cat conf/spark-defaults.conf
>>>> >> spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
>>>> >>
>>>> >> $ ./bin/spark-shell
>>>> >>
>>>> >> scala> System.getProperty("foo.bar.baz")
>>>> >> res0: String = null
>>>> >>
>>>> >>
>>>> >> $ ./bin/spark-shell --driver-java-options "-Dfoo.bar.baz=23"
>>>> >>
>>>> >> scala> System.getProperty("foo.bar.baz")
>>>> >> res0: String = 23
>>>> >>
>>>> >>
>>>> >> Looking through the shell scripts for spark-submit and spark-class, I can
>>>> >> see why this is; parsing spark-defaults.conf from bash could be brittle.
>>>> >>
>>>> >> But from an ergonomic point of view, it's a step back to go from a
>>>> >> set-it-and-forget-it configuration in spark-env.sh, to requiring command
>>>> >> line arguments.
>>>> >>
>>>> >> I can solve this with an ad-hoc script to wrap spark-shell with the
>>>> >> appropriate arguments, but I wanted to bring the issue up to see if anyone
>>>> >> else had run into it, or had any direction for a general solution (beyond
>>>> >> parsing java properties files from bash).
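>>>> >>
>>>> >> (Roughly what I mean by an ad-hoc wrapper, sketched in Scala rather than
>>>> >> bash; untested, and it only handles the whitespace-separated form:
>>>> >>
>>>> >> val opts = scala.io.Source.fromFile("conf/spark-defaults.conf").getLines()
>>>> >>   .map(_.trim).filter(l => l.nonEmpty && !l.startsWith("#"))
>>>> >>   .map(_.split("\\s+", 2)).collect { case Array(k, v) => k -> v }.toMap
>>>> >>   .getOrElse("spark.driver.extraJavaOptions", "")
>>>> >>
>>>> >> The wrapper would then exec ./bin/spark-shell --driver-java-options with
>>>> >> that value.)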
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Marcelo
>>>>
>>>
>>>
