A typical file is called /dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430
I had assumed that these were from OMPI, but clearly I could be wrong. They vary in size but are typically 42MiB. That is only 0.2% of our small diskless nodes' memory, but put a dozen in there and they start to be noticed. lsof shows that all the processes in a particular job have the same file open; the other files are associated chronologically with failed jobs.

HTH

Martin Rushton
HPC System Manager, Weapons Technologies

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: 14 April 2011 14:33
To: Open MPI Users
Subject: Re: [OMPI users] shm unlinking

On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:

> For your information: we were supplied with a script when we bought
> the cluster, but the original script made the assumption that all
> processes and shm files belonging to a specific user ought to be
> deleted. This is a problem if users submit jobs which only half fill
> a node and the second job starts on the same node as the first one.
> The first job to finish causes the continuing job to stop dead. We
> therefore had to disable any cleanup to allow jobs to run. Now we are
> finding a slow fill up with the shm files and I need to do something;
> at least now I have a way forward.

Note that Open MPI v1.4.x is likely using mmap files by default -- these should be under /tmp/ somewhere. If they get left around, they can cause shared memory to be filled up, but they should also be unrelated to /dev/shm kinds of things. If you're seeing /dev/shm fill up, that might be due to something else.

Also, I'm a little confused by your reference to psm_shm... are you talking about the QLogic PSM device?
If that does some tomfoolery with /dev/shm somewhere, I'm unaware of it (i.e., I don't know much/anything about what that device does internally).

--
Jeff Squyres
jsquy...@cisco.com
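
The thread describes two facts that suggest a safer cleanup than the per-user script: every process in a live job holds the same psm_shm file open, and the leftover files belong to failed jobs. A minimal sketch of that idea follows, assuming a Linux /proc filesystem and enough privilege (typically root) to read other users' /proc entries. The function names and the "psm_shm." prefix default are illustrative choices, not part of any Open MPI or PSM tool, and the script only lists candidates rather than deleting them.

```python
import glob
import os


def open_paths(proc_dir="/proc"):
    """Collect paths that any visible process holds open or has mapped."""
    in_use = set()
    try:
        pids = [p for p in os.listdir(proc_dir) if p.isdigit()]
    except OSError:
        return in_use  # no /proc here (e.g. not Linux)
    for pid in pids:
        # fd/ lists open descriptors; map_files/ lists mmap'ed files, which
        # matters because a segment can stay mapped after its fd is closed.
        for sub in ("fd", "map_files"):
            sub_dir = os.path.join(proc_dir, pid, sub)
            try:
                names = os.listdir(sub_dir)
            except OSError:
                continue  # process exited, or we lack permission
            for name in names:
                try:
                    in_use.add(os.readlink(os.path.join(sub_dir, name)))
                except OSError:
                    pass  # entry vanished between listdir and readlink
    return in_use


def stale_shm_files(shm_dir="/dev/shm", prefix="psm_shm.", in_use=None):
    """Return matching files under shm_dir that no process is using."""
    if in_use is None:
        in_use = open_paths()
    pattern = os.path.join(glob.escape(shm_dir), prefix + "*")
    return sorted(p for p in glob.glob(pattern) if p not in in_use)


if __name__ == "__main__":
    # Dry run: print candidates only. Review the list before removing anything.
    for path in stale_shm_files():
        print(path)
```

Run without sufficient privilege, this cannot see other users' descriptors and would wrongly report their live files as stale, so the output should be treated as a list to review. There is also a window between the scan and any deletion in which a new job could start, so running it from a scheduler epilogue on an otherwise idle node would be the cautious choice.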