Hi Xu, As crazy as it might sound, this all makes sense.
There are a few different quantities at play here: * the heap size of the executor (controlled by --executor-memory) * the amount of memory spark requests from yarn (the heap size plus 384 mb to account for fixed memory costs outside if the heap) * the amount of memory yarn grants to the container (yarn rounds up to the nearest multiple of yarn.scheduler.minimum-allocation-mb or yarn.scheduler.fair.increment-allocation-mb, depending on the scheduler used) * the amount of memory spark uses for caching on each executor, which is spark.storage.memoryFraction (default 0.6) of the executor heap size So, with --executor-memory 8g, spark requests 8g + 384m from yarn, which doesn't fit into it's container max. With --executor-memory 7g, Spark requests 7g + 384m from yarn, which fits into its container max. This gets rounded up to 8g by the yarn scheduler. 7g is still used as the executor heap size, and .6 of this is about 4g, shown as the cache space in the spark. -Sandy > On Jun 5, 2014, at 9:44 AM, "Xu (Simon) Chen" <[email protected]> wrote: > > I am slightly confused about the "--executor-memory" setting. My yarn cluster > has a maximum container memory of 8192MB. > > When I specify "--executor-memory 8G" in my spark-shell, no container can be > started at all. It only works when I lower the executor memory to 7G. But > then, on yarn, I see 2 container per node, using 16G of memory. > > Then on the spark UI, it shows that each worker has 4GB of memory, rather > than 7. > > Can someone explain the relationship among the numbers I see here? > > Thanks.
