I know this is a little off-topic, but I thought I'd pass on some hard-won 
knowledge to HPC cluster administrators...

Short version:
--------------

You should probably either disable the Linux OOM killer on your cluster (even 
if you have swap disabled on your compute nodes), or configure it so that it 
won't kill your critical cluster infrastructure (e.g., system-level daemons).  
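For the "disable" option, one common approach is strict overcommit accounting (sysctl vm.overcommit_memory=2). In that mode the kernel refuses allocations beyond a commit limit, so malloc() fails with ENOMEM instead of the kernel overcommitting and later calling in the OOM killer. This reduces OOM kills rather than strictly eliminating them. The catch on swapless nodes is that the default limit is only swap plus 50% of RAM (vm.overcommit_ratio=50). The sketch below does that arithmetic. It assumes the formula from the kernel's overcommit-accounting documentation. The function names are hypothetical, not a kernel or library API.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo field name to its first numeric value.

    Most fields are in kB; HugePages_* fields are page counts.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def commit_limit_kb(mem_total_kb: int, swap_total_kb: int,
                    overcommit_ratio: int = 50, hugetlb_kb: int = 0) -> int:
    """Approximate CommitLimit in kB under vm.overcommit_memory=2.

    (RAM - huge pages) * overcommit_ratio / 100 + swap, per the kernel's
    overcommit-accounting documentation (the kernel itself works in pages,
    so this kB version is only an approximation).
    """
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


def commit_limit_from_meminfo(text: str, overcommit_ratio: int = 50) -> int:
    """Convenience wrapper: estimate CommitLimit from /proc/meminfo text."""
    info = parse_meminfo(text)
    return commit_limit_kb(info["MemTotal"], info.get("SwapTotal", 0),
                           overcommit_ratio, info.get("Hugetlb", 0))
```

Take a 64 GiB node with no swap: commit_limit_kb(67108864, 0) comes out to 33554432 kB, which is only half the RAM. For that reason you would likely raise vm.overcommit_ratio (or set vm.overcommit_kbytes) before switching a swapless node to mode 2. Otherwise ordinary jobs may start failing allocations. On a live system, the kernel's own figure is the CommitLimit line in /proc/meminfo.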

More details can be found on my blog: 
http://blogs.cisco.com/performance/why-mpi-is-good-for-you-part-2/

More detail:
------------

I recently learned the hard way that the Linux Out Of Memory ("OOM") killer can 
really hose your cluster.  In my case, I had a bug in a development version of 
Open MPI that caused mpirun to consume ginormous amounts of memory and 
ultimately invoke the OOM killer.

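The gist of it is that the OOM killer, by default, picks its victim using a 
heuristic "badness" score. So it can kill more or less any process in an 
attempt to free memory, not necessarily the one that is actually hogging it.  
In my case, it killed the MySQL daemon, 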
which is the database that my cluster manager (Bright) uses for critical 
information.  This left my SQL tables on disk in an unrecoverable state.
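The victim is not chosen literally at random. The kernel exposes each process's current badness score as /proc/<pid>/oom_score. When memory runs out, it generally kills the process with the highest score. On current kernels, that score is driven mostly by the process's memory footprint, adjusted by its oom_score_adj. Below is a minimal sketch for checking which processes on a node are first in line. It assumes a Linux /proc layout, and oom_candidates is a hypothetical helper name.

```python
from pathlib import Path


def oom_candidates(proc_root: str = "/proc", top: int = 10) -> list[tuple[int, int, str]]:
    """Return up to `top` (oom_score, pid, command) tuples, highest score first."""
    results = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            score = int((entry / "oom_score").read_text().strip())
            comm = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited mid-scan, or its files are unreadable
        results.append((score, int(entry.name), comm))
    # Sort by score, then PID, both descending, so the likeliest victim is first.
    results.sort(reverse=True)
    return results[:top]
```

Run it on a compute node and print the tuples. That shows at a glance whether something like the cluster manager's database sits near the top of the list.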

This made me a very sad panda.  :-(

Moral of the story: you should probably either disable the OOM killer, or 
configure it so that it won't kill your critical cluster infrastructure 
daemons.  Maybe I'm a cluster admin n00b for not having done this in the first 
place, but I thought I'd pass on the knowledge nonetheless.
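For the "configure it" option, each process has a /proc/<pid>/oom_score_adj knob ranging from -1000 to 1000. A value of -1000 tells the kernel never to select that process. Below is a minimal Python sketch. It assumes root privileges (lowering the value requires CAP_SYS_RESOURCE) and a standard /proc. protect_from_oom is a hypothetical helper name. The proc_root parameter exists only so the helper can be exercised against a fake directory.

```python
from pathlib import Path

# OOM_SCORE_ADJ_MIN: the kernel treats -1000 as "never OOM-kill this task".
OOM_SCORE_ADJ_MIN = -1000


def protect_from_oom(pid: int, proc_root: str = "/proc") -> int:
    """Write OOM_SCORE_ADJ_MIN to <proc_root>/<pid>/oom_score_adj.

    Returns the value read back afterwards so the caller can confirm
    the kernel accepted it. Raises OSError (e.g. PermissionError) if
    the write is not allowed.
    """
    adj_path = Path(proc_root) / str(pid) / "oom_score_adj"
    adj_path.write_text(f"{OOM_SCORE_ADJ_MIN}\n")
    return int(adj_path.read_text().strip())
```

For example, you might call protect_from_oom() with the PID of the MySQL daemon. The setting belongs to the running process and is inherited by any children it forks. However, it does not survive a restart of the daemon. In practice you would apply it from the daemon's init script or, on systemd-based nodes, with OOMScoreAdjust=-1000 in the service unit.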

Sidenote: the above-mentioned bug was never in any released version of Open 
MPI.  But the point is that *any* Linux userspace process can still trigger 
the OOM killer, and potentially do Very Bad Things.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

