Oracle Java is particularly heinous when it comes to virtual memory
allocation. The Oracle Java that ships with RHEL6 x86_64 requests around
23GB of memory even when it's run with just "-version". IBM Java is a
bit more reasonable, only requesting around 3GB to report its version.
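
A quick way to reproduce this for yourself (a hypothetical sketch, not part
of the original mails; it assumes a Linux host with java on $PATH) is to cap
the address space available to "java -version" and see at which point the JVM
no longer starts. As far as I know, an RLIMIT_AS cap of this kind is also how
an h_vmem hard limit is enforced on Linux, which is why these large JVM
reservations matter for the rest of this thread:

#!/usr/bin/env python
# Hypothetical sketch: cap the virtual address space of "java -version"
# at various sizes and report the exit status.  Assumes Linux/POSIX,
# Python 2.6+ or 3.x, and a java binary on $PATH.
import resource
import subprocess

def run_with_vmem_cap(cmd, cap_bytes):
    """Run cmd with a hard RLIMIT_AS cap; return its exit status."""
    def limit():
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_AS, (cap_bytes, cap_bytes))
    proc = subprocess.Popen(cmd, preexec_fn=limit,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    proc.communicate()
    return proc.returncode

if __name__ == "__main__":
    gib = 1024 ** 3
    for cap in (1, 2, 4, 8, 16, 32):
        rc = run_with_vmem_cap(["java", "-version"], cap * gib)
        print("cap %2d GiB -> exit status %s" % (cap, rc))

On a box showing the behaviour Skylar describes, the small caps should fail
and only the generous ones succeed; the crossover point is roughly the
virtual allocation the JVM insists on at startup.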
-- Skylar Thompson (skyl...@u.washington.edu)
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
On 08/29/12 09:09 AM, Brian Smith wrote:
We found h_vmem to be highly unpredictable, especially with java-based
applications. Stack settings were screwed up, certain applications
wouldn't launch (segfaults), and hard limits were hard to determine for
things like MPI applications. When your master has to launch 1024 MPI
sub-tasks (qrsh), it generally eats up more VMEM than the slave tasks
do. It was just hard to get right.
-Brian
Brian Smith
Sr. System Administrator
Research Computing, University of South Florida
4202 E. Fowler Ave. SVC4010
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu
On 08/29/2012 11:33 AM, Reuti wrote:
On 29.08.2012, at 17:21, Brian Smith wrote:
We use the mem_free variable as a consumable. Then, we use a cronjob
called memkiller that terminates jobs if they go over their requested
(or default) memory allocation and either of the two conditions listed
below is met.
It would be more straightforward to use h_vmem directly. This is
controlled by SGE, and a job exceeding the limit will be killed by
SGE. If you consume it as a consumable on an exechost level, it could
be set to the amount of physical memory installed in the host.
Was there any reason to use mem_free?
-- Reuti
1. Swap space on node is used
2. Swap rate is greater than 100 I/Os per second
The user gets emailed with a report if this happens.
This has made dealing with the oom killer a thing of the past in our
shop.
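
The memkiller script itself is not shown anywhere in this thread, so the
following is only a sketch of what its swap-side checks might look like on
Linux (reading /proc/meminfo for swap usage and sampling /proc/vmstat for the
swap I/O rate); mapping offending processes back to SGE jobs, killing them
and mailing the owner is left out entirely:

#!/usr/bin/env python
# Hypothetical sketch of the swap checks a "memkiller" cron job might
# perform.  Assumes Linux /proc; which jobs to kill and how to map PIDs
# back to SGE jobs is deliberately omitted.
import time

def meminfo():
    """Return /proc/meminfo as a dict of integer kB values."""
    out = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            out[key] = int(rest.split()[0])
    return out

def swap_io_per_sec(interval=1.0):
    """Swap-in + swap-out operations per second over a short sample."""
    def sample():
        counts = {}
        with open("/proc/vmstat") as f:
            for line in f:
                name, value = line.split()
                if name in ("pswpin", "pswpout"):
                    counts[name] = int(value)
        return counts
    a = sample()
    time.sleep(interval)
    b = sample()
    return sum(b[k] - a[k] for k in b) / interval

if __name__ == "__main__":
    mem = meminfo()
    swap_used_kb = mem["SwapTotal"] - mem["SwapFree"]
    swap_rate = swap_io_per_sec()
    if swap_used_kb > 0 or swap_rate > 100:
        # The real cron job would now identify the jobs that are over
        # their requested allocation, kill them and e-mail the owners.
        print("swap trouble: %d kB used, %.0f swap ops/s" % (swap_used_kb, swap_rate))

Note that pswpin/pswpout count pages swapped rather than raw I/Os, so the
100-per-second threshold above is only an approximation of the test Brian
describes.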
We manage memory on the principle that swap should NEVER be used. If
you're hitting the oom killer, you're pretty far beyond that in terms
of memory utilization; if performance is a consideration, IMHO you
should be looking to schedule your memory usage accordingly. The oom
killer shouldn't be a factor if memory is handled as a scheduler
consideration.
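
Both suggestions in this sub-thread (mem_free or h_vmem as a consumable set
to the host's physical memory) boil down to the same bookkeeping: the
scheduler debits each job's memory request against the host's capacity and
refuses to start jobs that would oversubscribe it. The toy sketch below
illustrates only that idea; the class, host name and numbers are made up and
this is not SGE code:

#!/usr/bin/env python
# Hypothetical sketch of the bookkeeping behind a consumable memory
# resource.  Host capacity and job requests are invented numbers.

class Host(object):
    def __init__(self, name, phys_mem_gb):
        self.name = name
        self.capacity = phys_mem_gb   # e.g. consumable set to physical RAM
        self.reserved = 0.0           # sum of running jobs' requests

    def can_run(self, request_gb):
        return self.reserved + request_gb <= self.capacity

    def start(self, request_gb):
        # The scheduler debits the consumable when the job starts ...
        self.reserved += request_gb

    def finish(self, request_gb):
        # ... and credits it back when the job ends.
        self.reserved -= request_gb

if __name__ == "__main__":
    node = Host("compute-0-1", phys_mem_gb=64)
    jobs = [("job1", 32), ("job2", 24), ("job3", 16)]  # per-job memory requests
    for name, req in jobs:
        if node.can_run(req):
            node.start(req)
            print("%s scheduled (%d GB reserved of %d)" % (name, node.reserved, node.capacity))
        else:
            print("%s stays queued: would oversubscribe memory" % name)

With the numbers above, job3 stays queued even though plenty of memory may
look free at submit time, which is exactly how swap (and the oom killer) is
kept out of the picture.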
-Brian
Brian Smith
Sr. System Administrator
Research Computing, University of South Florida
4202 E. Fowler Ave. SVC4010
Office Phone: +1 813 974-1467
Organization URL: http://rc.usf.edu
On 08/29/2012 11:02 AM, Ben De Luca wrote:
I was wondering how people deal with oom conditions on their cluster.
We constantly have machines that die because the oom killer takes out
critical system services.
Does anyone have experience with the oom_adj proc value, or a patch to
Grid Engine to support it?
/proc/[pid]/oom_adj (since Linux 2.6.11)
       This file can be used to adjust the score used to select which
       process should be killed in an out-of-memory (OOM) situation.
       The kernel uses this value for a bit-shift operation of the
       process's oom_score value: valid values are in the range -16 to
       +15, plus the special value -17, which disables OOM-killing
       altogether for this process. A positive score increases the
       likelihood of this process being killed by the OOM-killer; a
       negative score decreases the likelihood. The default value for
       this file is 0; a new process inherits its parent's oom_adj
       setting. A process must be privileged (CAP_SYS_RESOURCE) to
       update this file.
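
As far as I can tell from this thread there is no built-in Grid Engine
support for this knob, but it is easy to use from a boot or health-check
script. The sketch below is a hypothetical example of protecting a critical
daemon (sge_execd is only an example target) by writing the special value -17
described in the excerpt above; it needs root (CAP_SYS_RESOURCE) and a kernel
that still exposes oom_adj (newer kernels prefer oom_score_adj):

#!/usr/bin/env python
# Hypothetical sketch: exempt a critical process from the OOM killer by
# writing -17 to its oom_adj file, per the man-page excerpt above.
# Requires root/CAP_SYS_RESOURCE and a kernel that still has oom_adj.
import sys

def set_oom_adj(pid, value):
    """Write an oom_adj value (-17 .. +15) for the given PID."""
    if not -17 <= value <= 15:
        raise ValueError("oom_adj must be between -17 and +15")
    with open("/proc/%d/oom_adj" % pid, "w") as f:
        f.write("%d\n" % value)

if __name__ == "__main__":
    # Usage: set_oom_adj.py <pid> [adjustment, default -17]
    pid = int(sys.argv[1])
    value = int(sys.argv[2]) if len(sys.argv) > 2 else -17
    set_oom_adj(pid, value)

sshd, sge_execd and similar daemons are the obvious candidates; jobs
themselves can be left at the default of 0 so the OOM killer still prefers
them over system services.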
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users