Indeed, you get default values if you don't specify concrete values otherwise. Yes, you should see the docs for the version you're using.
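One way to confirm which values are actually in effect is to read them back from the resolved configuration at runtime. A minimal sketch, assuming a Spark 1.6+/2.x application with an existing SparkContext named sc (for example inside spark-shell); the keys shown are just the ones discussed in this thread:

    // Inspect the resolved configuration. conf.get(key, default) falls back
    // to the shown default if the key was never set explicitly.
    val conf = sc.getConf
    println(conf.get("spark.driver.memory", "1g"))
    println(conf.get("spark.yarn.am.memory", "512m"))

    // List everything that was set explicitly (built-in defaults are not
    // included in getAll).
    conf.getAll.sorted.foreach { case (k, v) => println(s"$k=$v") }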
Note that there are different configs for the new 'unified' memory manager
since 1.6, so some older resources may be correctly explaining the older
'legacy' memory manager configs rather than the current ones.

Yes, all containers would have to be smaller than YARN's max allowed size.

The driver just consumes what the driver consumes; I don't know of any
extra 'appmaster' component.

What do you mean by 'launched by the map task'? Jobs are launched by the
driver only.
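To put a number on the 'unified' memory manager mentioned above, here is a rough sketch of how the 366.3 MB MemoryStore capacity reported in the logs further down can arise from the defaults. It assumes Spark 2.x defaults and an illustrative Runtime.getRuntime.maxMemory of roughly 910 MB for a 1 GB heap (the exact value depends on the JVM and GC settings), and it replaces the legacy memoryFraction * safetyFraction formula quoted from the blog post below:

    // Sketch of the unified memory manager sizing with Spark 2.x defaults.
    // systemMemory is an assumed Runtime.getRuntime.maxMemory for -Xmx1g;
    // the real number is JVM- and GC-dependent.
    val systemMemory   = 954728448L                 // ~910.5 MB
    val reservedMemory = 300L * 1024 * 1024         // 300 MB reserved by Spark
    val memoryFraction = 0.6                        // spark.memory.fraction default in 2.x (0.75 in 1.6)
    val unifiedRegion  = ((systemMemory - reservedMemory) * memoryFraction).toLong
    println(f"${unifiedRegion / (1024.0 * 1024.0)}%.1f MB")   // ~366.3 MB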
On Sat, Nov 12, 2016 at 9:14 AM Elkhan Dadashov <elkhan8...@gmail.com> wrote:

> @Sean Owen,
>
> Thanks for your reply.
>
> I put the wrong link to the blog post. Here is the correct link
> <https://www.altiscale.com/blog/tips-and-tricks-for-running-spark-on-hadoop-part-4-memory-settings/>
> which describes Spark memory settings on YARN. I guess they have misused
> the terms Spark driver/BlockManager and explained the driver's memory
> usage incorrectly.
>
> 1) Does that mean that if nothing is specified, Spark will use the
> defaults listed on the Spark config site
> <http://spark.apache.org/docs/latest/running-on-yarn.html> ?
>
> 2) Let me clarify whether I understood it correctly
> (due to YARN restrictions):
>
> *Yarn-cluster mode*:
> SparkAppMaster+Driver memory < YARN container max size allocation
> SparkExecutor memory < YARN container max size allocation
>
> *Yarn-client mode* (assume the Spark job is launched from the map task):
> Driver memory is independent of any YARN properties, only limited by the
> machine's memory.
> SparkAppMaster memory < YARN container max size allocation
> SparkExecutor memory < YARN container max size allocation
>
> Did I get it right?
>
> 3) Any resource for Spark component memory calculations on a YARN
> cluster? (other than this site, which describes the default config
> values: http://spark.apache.org/docs/latest/running-on-yarn.html )
>
> Thanks.
>
> On Sat, Nov 12, 2016 at 12:24 AM, Sean Owen <so...@cloudera.com> wrote:
>
> If you're pointing at the 366MB, then it's not really related to any of
> the items you cite here. This is the memory managed internally by
> MemoryStore. The blog post refers to the legacy memory manager. You can
> see a bit of how it works in the code, but this is the sum of the on-heap
> and off-heap memory it can manage. See the memory config docs, however,
> to understand what user-facing settings you can make; you don't really
> need to worry about this value.
>
> mapreduce settings are irrelevant to Spark.
> Spark doesn't pay attention to the YARN settings, but YARN does. It
> enforces them, yes. It is not exempt from YARN.
>
> 896MB is correct there. yarn-client mode does not ignore driver
> properties, no.
>
> On Sat, Nov 12, 2016 at 2:18 AM Elkhan Dadashov <elkhan8...@gmail.com>
> wrote:
>
> Hi,
>
> The Spark website
> <http://spark.apache.org/docs/latest/running-on-yarn.html> indicates the
> default Spark properties below. I did not override any properties in the
> spark-defaults.conf file, and I launch Spark in yarn-client mode:
>
> spark.driver.memory 1g
> spark.yarn.am.memory 512m
> spark.yarn.am.memoryOverhead : max(spark.yarn.am.memory * 0.10, 384m)
> spark.yarn.driver.memoryOverhead : max(spark.driver.memory * 0.10, 384m)
>
> I launch the Spark job via SparkLauncher#startApplication() in
> *yarn-client mode from the map task of a Hadoop job*.
>
> *My cluster settings*:
> yarn.scheduler.minimum-allocation-mb 256
> yarn.scheduler.maximum-allocation-mb 2048
> yarn.app.mapreduce.am.resource.mb 512
> mapreduce.map.memory.mb 640
> mapreduce.map.java.opts -Xmx400m
> yarn.app.mapreduce.am.command-opts -Xmx448m
>
> *Logs of the Spark job*:
>
> INFO Client: Verifying our application has not requested more than the
> maximum memory capability of the cluster (2048 MB per container)
> INFO Client: Will allocate *AM container*, with 896 MB memory including
> 384 MB overhead
>
> INFO MemoryStore: MemoryStore started with capacity 366.3 MB
>
> ./application_1478727394310_0005/container_1478727394310_0005_01_000002/stderr:INFO:
> 16/11/09 14:18:42 INFO BlockManagerMasterEndpoint: Registering block
> manager <machine-ip>:57246 with *366.3* MB RAM, BlockManagerId(driver,
> <machine-ip>, 57246)
>
> *Questions*:
>
> 1) How is driver memory calculated? How did Spark arrive at 366.3 MB for
> the driver based on the properties described above?
>
> I thought the memory allocation was based on this formula
> ( https://www.altiscale.com/blog/spark-on-hadoop/ ):
>
> "Runtime.getRuntime.maxMemory * memoryFraction * safetyFraction, where
> memoryFraction=0.6 and safetyFraction=0.9. This is 1024MB x 0.6 x 0.9 =
> 552.96MB. However, 552.96MB is a little larger than the value shown in
> the log. This is because of the runtime overhead imposed by Scala, which
> is usually around 3-7%, more or less. If you do the calculation using
> 982MB x 0.6 x 0.9 (982MB being roughly 4% less than 1024MB), then you
> will derive the number 530.28MB, which is what is indicated in the log
> file after rounding to 530.30MB."
>
> 2) If the Spark job is launched from the map task via
> SparkLauncher#startApplication(), will the driver memory respect
> (mapreduce.map.memory.mb and mapreduce.map.java.opts) OR
> (yarn.scheduler.maximum-allocation-mb) when the Spark job is launched as
> a child process?
>
> The confusion is that SparkSubmit is a new JVM process - it is launched
> as a child process of the map task, and it does not depend on YARN
> configs. But if it does not obey any limits, that will make things
> tricky for the NodeManager when it reports memory usage back.
>
> 3) Is this the correct formula for calculating AM memory?
>
> For the AM it matches this calculation
> ( https://www.altiscale.com/blog/spark-on-hadoop/ ): how much memory to
> allocate to the AM is amMemory + amMemoryOverhead. amMemoryOverhead is
> set to 384MB via spark.yarn.driver.memoryOverhead. args.amMemory is
> fixed at 512MB by Spark when it's running in yarn-client mode. Adding
> 384MB of overhead to 512MB gives the 896MB figure requested by Spark.
>
> 4) In yarn-client mode, are all spark.driver properties ignored and only
> the spark.yarn.am properties used?
>
> Thanks.
>
> --
>
> Best regards,
> Elkhan Dadashov
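For the SparkLauncher#startApplication() path discussed above, a minimal sketch of how driver, AM and executor memory can be passed explicitly in yarn-client mode. The jar path, main class and memory values are placeholder assumptions rather than settings from the original job, and SPARK_HOME is assumed to be set in the launching environment (otherwise use setSparkHome):

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    object LaunchFromParentJvm {
      def main(args: Array[String]): Unit = {
        // Hypothetical application jar and main class, for illustration only.
        val handle: SparkAppHandle = new SparkLauncher()
          .setAppResource("/path/to/example-app.jar")
          .setMainClass("com.example.ExampleApp")
          .setMaster("yarn")
          .setDeployMode("client")
          .setConf(SparkLauncher.DRIVER_MEMORY, "1g")    // spark.driver.memory: heap of the child driver JVM
          .setConf("spark.yarn.am.memory", "512m")       // AM heap in yarn-client mode
          .setConf(SparkLauncher.EXECUTOR_MEMORY, "1g")  // spark.executor.memory
          .startApplication()                            // non-blocking; returns a handle

        // Poll until the application reaches a terminal state.
        while (!handle.getState.isFinal) Thread.sleep(1000)
        println(s"Final state: ${handle.getState}")
      }
    }

Note that when the launching JVM is itself a YARN map task, the NodeManager's memory check normally covers the container's whole process tree, so a child driver JVM would typically still count against the map task's container limit (mapreduce.map.memory.mb) even though Spark itself does not consult the mapreduce settings.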