QLogic IBA 7220. Which is interesting in itself: the IB hasn't worked properly since the cluster was delivered.

Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrush...@qinetiq.com
www.QinetiQ.com
QinetiQ - Delivering customer-focused solutions

Please consider the environment before printing this email.

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: 14 April 2011 16:41
To: Open MPI Users
Subject: Re: [OMPI users] shm unlinking

They could be from OMPI -- are you using QLogic IB NICs? That's the only thing named "PSM" in Open MPI.

On Apr 14, 2011, at 9:46 AM, Rushton Martin wrote:

> A typical file is called
> /dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430
>
> I had assumed that these were from OMPI, but clearly I could be wrong.
> They vary in size, but are typically 42MiB, only 0.2% of our small
> diskless nodes' memory, but put a dozen in there and they start to be
> noticed. lsof shows all the processes in a particular job have the
> same one open, the other files are associated chronologically with
> failed jobs.
>
> HTH
>
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Jeff Squyres
> Sent: 14 April 2011 14:33
> To: Open MPI Users
> Subject: Re: [OMPI users] shm unlinking
>
> On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:
>
>> For your information: we were supplied with a script when we bought
>> the cluster, but the original script made the assumption that all
>> processes and shm files belonging to a specific user ought to be
>> deleted. This is a problem if users submit jobs which only half fill
>> a node and the second job starts on the same node as the first one.
>> The first job to finish causes the continuing job to stop dead. We
>> therefore had to disable any cleanup to allow jobs to run. Now we are
>> finding a slow fill up with the shm files and I need to do something;
>> at least now I have a way forward.
>
> Note that Open MPI v1.4.x is likely using mmap files by default --
> these should be under /tmp/ somewhere. If they get left around, they
> can cause shared memory to be filled up, but they should also be
> unrelated in /dev/shm kinds of things. If you're seeing /dev/shm fill
> up, that might be due to something else.
>
> Also, I'm a little confused by your reference to psm_shm... are you
> talking about the QLogic PSM device? If that does some tomfoolery
> with /dev/shm somewhere, I'm unaware of it (i.e., I don't know
> much/anything about what that device does internally).
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
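
The cleanup problem described in this thread (remove leftover /dev/shm/psm_shm.* files from failed jobs without touching a file that a running job on the same node still has open) could be approached by checking each file against what live processes actually hold, much as lsof does, rather than deleting by user. What follows is only a minimal sketch under stated assumptions, not anything supplied by QLogic or Open MPI: the function names files_in_use and stale_shm_files are hypothetical, it assumes a Linux /proc layout, and it assumes it runs as root so that other users' /proc/<pid>/fd and /proc/<pid>/maps are readable (without that, a file in use can wrongly appear stale). It only reports candidates; it deletes nothing.

```python
import glob
import os


def files_in_use(proc_dir="/proc"):
    """Return the set of paths that some process has open or memory-mapped."""
    in_use = set()
    for pid in os.listdir(proc_dir):
        if not pid.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        base = os.path.join(proc_dir, pid)
        # Open file descriptors: each entry is a symlink to the open file.
        try:
            for fd in os.listdir(os.path.join(base, "fd")):
                try:
                    in_use.add(os.readlink(os.path.join(base, "fd", fd)))
                except OSError:
                    pass  # descriptor closed while we were looking
        except OSError:
            pass  # process exited, or we lack permission to look
        # Memory mappings: the path, if any, is the sixth field of each line.
        try:
            with open(os.path.join(base, "maps"), errors="replace") as maps:
                for line in maps:
                    fields = line.split(None, 5)
                    if len(fields) == 6:
                        in_use.add(fields[5].strip())
        except OSError:
            pass
    return in_use


def stale_shm_files(shm_dir="/dev/shm", pattern="psm_shm.*", proc_dir="/proc"):
    """List files matching pattern in shm_dir that no process holds."""
    in_use = files_in_use(proc_dir)
    candidates = glob.glob(os.path.join(shm_dir, pattern))
    return sorted(path for path in candidates if path not in in_use)
```

Calling stale_shm_files() with its defaults returns the sorted list of psm_shm files that no process currently has open or mapped. There is an unavoidable race between the scan and any later removal (a new job could start in between), so a scheduler epilogue that runs once a node is otherwise idle is probably the safer place to act on the result.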