A typical file is called
/dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430

I had assumed that these came from OMPI, but clearly I could be wrong.
They vary in size but are typically 42MiB. That is only 0.2% of the
memory on our small diskless nodes, but put a dozen in there and they
start to be noticed.  lsof shows that all the processes in a particular
job have the same file open; the other files correlate chronologically
with failed jobs.
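A minimal sketch of a safer cleanup check, assuming a Linux /proc filesystem: rather than deleting everything owned by a user, list only the psm_shm.* files in /dev/shm that no running process has open or memory-mapped. These are the same two things lsof reports. The function names (in_use_paths, orphaned_shm_files) are hypothetical, and the sketch only prints candidates; it deletes nothing.

```python
import glob
import os


def in_use_paths(proc_dir="/proc"):
    """Collect every path that some process has open or memory-mapped.

    Mirrors what lsof reports: open descriptors (/proc/<pid>/fd)
    plus mapped files (/proc/<pid>/maps).
    """
    paths = set()
    for pid_dir in glob.glob(os.path.join(proc_dir, "[0-9]*")):
        for fd in glob.glob(os.path.join(pid_dir, "fd", "*")):
            try:
                paths.add(os.readlink(fd))
            except OSError:
                pass  # process exited, or we lack permission
        try:
            with open(os.path.join(pid_dir, "maps")) as maps:
                for line in maps:
                    # Fields: address perms offset dev inode [pathname]
                    fields = line.split(None, 5)
                    if len(fields) == 6:
                        paths.add(fields[5].rstrip("\n"))
        except OSError:
            pass  # process exited, or we lack permission
    return paths


def orphaned_shm_files(shm_dir="/dev/shm", pattern="psm_shm.*", proc_dir="/proc"):
    """Return matching files that no running process has open or mapped."""
    in_use = in_use_paths(proc_dir)
    candidates = glob.glob(os.path.join(shm_dir, pattern))
    return sorted(p for p in candidates if p not in in_use)


if __name__ == "__main__":
    # Dry run: only print the candidates. Run as root; otherwise other
    # users' /proc entries are unreadable and their live files look orphaned.
    for path in orphaned_shm_files():
        print(path)
```

Two caveats: it must run as root, or files belonging to other users' running jobs will wrongly appear orphaned, and a job starting between the scan and any deletion could lose its file, so it is best run on a node with no job starting.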

HTH

Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrush...@qinetiq.com
www.QinetiQ.com
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: 14 April 2011 14:33
To: Open MPI Users
Subject: Re: [OMPI users] shm unlinking

On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:

> For your information: we were supplied with a script when we bought 
> the cluster, but the original script made the assumption that all 
> processes and shm files belonging to a specific user ought to be 
> deleted.  This is a problem if users submit jobs which only half fill 
> a node and the second job starts on the same node as the first one.  
> The first job to finish causes the continuing job to stop dead.  We 
> therefore had to disable any cleanup to allow jobs to run.  Now we are
> finding a slow fill-up with the shm files and I need to do something; 
> at least now I have a way forward.

Note that Open MPI v1.4.x is likely using mmap files by default; these
should be under /tmp/ somewhere.  If they get left around, they can
cause shared memory to fill up, but they should be unrelated to
/dev/shm.  If you're seeing /dev/shm fill up, that is probably due to
something else.

Also, I'm a little confused by your reference to psm_shm.  Are you
talking about the QLogic PSM device?  If that does some tomfoolery with
/dev/shm somewhere, I'm unaware of it (i.e., I don't know much/anything
about what that device does internally).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users