Greg, if you look carefully, the code is enforcing that the memoryOverhead
be lower (and not higher) than spark.driver.memory.
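
Simplified, the check amounts to something like this (just a sketch of the
logic, not the exact code at that line; the names and values are
illustrative):

    // The error fires only when the memory setting does NOT exceed its
    // overhead, i.e. the overhead has to be the smaller of the two.
    val driverMemoryMb = 1024          // e.g. spark.driver.memory = 1g
    val driverMemoryOverheadMb = 384   // e.g. spark.yarn.driver.memoryOverhead
    if (driverMemoryMb <= driverMemoryOverheadMb) {
      throw new IllegalArgumentException(
        "Driver memory must be larger than the memory overhead: " +
          driverMemoryOverheadMb)
    }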

Thanks,
Nishkam

On Mon, Sep 22, 2014 at 1:26 PM, Greg Hill <greg.h...@rackspace.com> wrote:

>  I thought I had this all figured out, but I'm getting some weird errors
> now that I'm attempting to deploy this on production-size servers.  It's
> complaining that I'm not allocating enough memory to the memoryOverhead
> values.  I tracked it down to this code:
>
>
> https://github.com/apache/spark/blob/ed1980ffa9ccb87d76694ba910ef22df034bca49/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala#L70
>
>  Unless I'm reading it wrong, those checks are enforcing that you set
> spark.yarn.driver.memoryOverhead to be higher than spark.driver.memory, but
> that makes no sense to me since that memory is just supposed to be what
> YARN needs on top of what you're allocating for Spark.  My understanding
> was that the overhead values should be quite a bit lower (and by default
> they are).
>
>  Also, why must the executor be allocated less memory than the driver's
> memory overhead value?
>
>  What am I misunderstanding here?
>
>  Greg
>
>   From: Andrew Or <and...@databricks.com>
> Date: Tuesday, September 9, 2014 5:49 PM
> To: Greg <greg.h...@rackspace.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: clarification for some spark on yarn configuration options
>
>   Hi Greg,
>
>  SPARK_EXECUTOR_INSTANCES is the total number of workers in the cluster.
> The equivalent "spark.executor.instances" is just another way to set the
> same thing in your spark-defaults.conf. Maybe this should be documented. :)
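>
>  For example, these two ways of asking for four executors are equivalent
> (the value is just illustrative):
>
>     # as an environment variable, e.g. in spark-env.sh
>     SPARK_EXECUTOR_INSTANCES=4
>
>     # or as a property in spark-defaults.conf
>     spark.executor.instances  4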
>
>  "spark.yarn.executor.memoryOverhead" is just an additional margin added
> to "spark.executor.memory" for the container. In addition to the executor's
> memory, the container in which the executor is launched needs some extra
> memory for system processes, and this is what this "overhead" (somewhat of
> a misnomer) is for. The same goes for the driver equivalent.
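>
>  For example, with the settings below (the values are illustrative; the
> overhead is specified in MB), the container YARN allocates for each
> executor is sized at roughly the sum of the two:
>
>     spark.executor.memory               4g
>     spark.yarn.executor.memoryOverhead  512
>
>     # container request: 4096 MB + 512 MB = 4608 MB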
>
>  "spark.driver.memory" behaves differently depending on which version of
> Spark you are using. If you are using Spark 1.1+ (this was released very
> recently), you can directly set "spark.driver.memory" and this will take
> effect. Otherwise, setting this doesn't actually do anything for client
> deploy mode, and you have two alternatives: (1) set the environment
> variable equivalent SPARK_DRIVER_MEMORY in spark-env.sh, and (2) if you are
> using Spark submit (or bin/spark-shell, or bin/pyspark, which go through
> bin/spark-submit), pass the "--driver-memory" command line argument.
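>
>  Concretely, for the pre-1.1 case either of these does the same thing
> (the value is just illustrative):
>
>     # (1) in spark-env.sh
>     SPARK_DRIVER_MEMORY=4g
>
>     # (2) on the command line
>     bin/spark-submit --driver-memory 4g ...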
>
>  If you want your PySpark application (driver) to pick up extra class
> path entries, you can pass the "--driver-class-path" flag to Spark
> submit. If you are
> using Spark 1.1+, you may set "spark.driver.extraClassPath" in your
> spark-defaults.conf. There is also an environment variable you could set
> (SPARK_CLASSPATH), though this is now deprecated.
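>
>  For example (the jar path below is just a placeholder):
>
>     # works for the interactive shells too, since they go through
>     # bin/spark-submit
>     bin/pyspark --driver-class-path /path/to/extra.jar
>
>     # or, on 1.1+, in spark-defaults.conf
>     spark.driver.extraClassPath  /path/to/extra.jar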
>
>  Let me know if you have more questions about these options,
> -Andrew
>
>
> 2014-09-08 6:59 GMT-07:00 Greg Hill <greg.h...@rackspace.com>:
>
>>  Is SPARK_EXECUTOR_INSTANCES the total number of workers in the cluster
>> or the workers per slave node?
>>
>>  Is spark.executor.instances an actual config option?  I found that in a
>> commit, but it's not in the docs.
>>
>>  What is the difference between spark.yarn.executor.memoryOverhead and
>> spark.executor.memory?  Same question for the 'driver' variant, but I
>> assume it's the same answer.
>>
>>  Is there a spark.driver.memory option that's undocumented or do you
>> have to use the environment variable SPARK_DRIVER_MEMORY?
>>
>>  What config option or environment variable do I need to set to get
>> pyspark interactive to pick up the yarn class path?  The ones that work for
>> spark-shell and spark-submit don't seem to work for pyspark.
>>
>>  Thanks in advance.
>>
>>  Greg
>>
>
>
