Hey Jordan,

There are two settings you need to care about when changing memory. One is
yarn.container.memory.mb. The other is the task.opts setting, which will
allow you to specify a custom -Xmx parameter.

If you set yarn.container.memory.mb to 4096 (4GB), as you suggest, the
JAVA_OPTS check still takes effect, and your Xmx will remain 768MB. If you
wish to increase your heap, you'll need to set both
yarn.container.memory.mb *and* task.opts, with the latter set to something
like task.opts=-Xmx3g. The yarn.container.memory.mb setting tells YARN to
kill any container that uses > 4GB of physical memory on the machine. The
Xmx tells the JVM in the container not to use more than 3GB of heap.
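
For example, here is a minimal sketch of the two settings side by side,
assuming the usual .properties job config. The values are only illustrative,
so size them for your own job:

```
# Physical memory limit for the container, enforced by YARN (in MB)
yarn.container.memory.mb=4096
# JVM options for the container; -Xmx caps the heap below the YARN limit
task.opts=-Xmx3g
```

With these values, roughly 1GB of the container's allotment is left for
everything that lives outside the heap.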

It's a bit confusing, but there's a reason for this. The problem we have is
that we (Samza) don't know how much off-heap/non-JVM memory you're going to
use in your container. YARN only pays attention to the amount of physical
memory used by a process. With Java, you can set the heap, but there's also
permgen, the JVM's own overhead, JNI libraries, and off-heap memory usage. All of these
contribute to the physical memory usage that YARN cares about, but are
outside the JVM heap. This means that we can't just use one memory setting
for both YARN and Java. We have to have two.

Cheers,
Chris

On Tue, Mar 10, 2015 at 1:13 AM, Jordan Shaw <jor...@pubnub.com> wrote:

> Hey Everyone,
> I have a question somewhat related to SAMZA-109 and this line in
> run-class.sh:
> # Check if a max-heap size is specified. If not - set a 768M heap
> [[ $JAVA_OPTS != *-Xmx* ]] && JAVA_OPTS="$JAVA_OPTS -Xmx768M"
>
> If I were to set the container memory for YARN to 4GB
> (yarn.container.memory.mb = 4096), the above JAVA_OPTS check would cause
> this to be ignored, right? I can understand why preventing the heap from
> going crazy is important in these long-running jobs, but to me this might
> cause some confusion, especially when trying to debug a thrashing GC while
> Java isn't committing the memory amount set in yarn.container.memory.mb.
>
> I was trying to follow why this was decided on in SAMZA-109 but nothing
> stuck out to me. Would it make more sense to just not specify a max heap
> and let YARN kill the job if it goes over its specified allotment,
> and/or let the user explicitly set these opts?  Thanks!
>
> - Jordan
>
