QLogic IBA 7220

Which is interesting in itself: the IB hasn't worked properly since the
cluster was delivered.


Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrush...@qinetiq.com
www.QinetiQ.com
QinetiQ - Delivering customer-focused solutions

Please consider the environment before printing this email.
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Jeff Squyres
Sent: 14 April 2011 16:41
To: Open MPI Users
Subject: Re: [OMPI users] shm unlinking

They could be from OMPI -- are you using QLogic IB NICs?  That's the
only thing named "PSM" in Open MPI.


On Apr 14, 2011, at 9:46 AM, Rushton Martin wrote:

> A typical file is called
> /dev/shm/psm_shm.41e04667-f3ba-e503-8464-db6c209b3430
> 
> I had assumed that these were from OMPI, but clearly I could be wrong.
> They vary in size but are typically 42MiB.  That is only 0.2% of our
> small diskless nodes' memory, but put a dozen in there and they start
> to be noticed.  lsof shows that all the processes in a particular job
> have the same one open; the other files correspond chronologically to
> failed jobs.
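
The lsof observation above suggests one way to tell stale files from live
ones: a psm_shm file that no running process has open or mapped is a
candidate for removal. What follows is only a minimal sketch of that idea,
not a tested cleanup tool. The function names are hypothetical, and it
assumes a Linux /proc layout in which /proc/<pid>/fd holds symlinks to open
files and /proc/<pid>/maps lists mapped files. It only reports candidates
and deletes nothing.

```python
import fnmatch
import os


def files_in_use(proc_dir="/proc"):
    """Return the set of paths that some live process has open or mapped."""
    in_use = set()
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # only numeric entries are process IDs
        pid_dir = os.path.join(proc_dir, entry)
        # Open descriptors: each entry in fd/ is a symlink to its target.
        try:
            for fd in os.listdir(os.path.join(pid_dir, "fd")):
                try:
                    in_use.add(os.readlink(os.path.join(pid_dir, "fd", fd)))
                except OSError:
                    pass  # descriptor was closed while we were looking
        except OSError:
            pass  # process exited, or we lack permission to inspect it
        # Memory mappings: the path, if present, is the sixth field.
        try:
            with open(os.path.join(pid_dir, "maps"), errors="replace") as maps:
                for line in maps:
                    fields = line.split(None, 5)
                    if len(fields) == 6:
                        in_use.add(fields[5].rstrip("\n"))
        except OSError:
            pass
    return in_use


def orphaned_shm_files(shm_dir="/dev/shm", pattern="psm_shm.*",
                       proc_dir="/proc"):
    """List files in shm_dir matching pattern that no process is using."""
    in_use = files_in_use(proc_dir)
    orphans = []
    for name in sorted(os.listdir(shm_dir)):
        path = os.path.join(shm_dir, name)
        if fnmatch.fnmatch(name, pattern) and path not in in_use:
            orphans.append(path)
    return orphans
```

Calling orphaned_shm_files() with its defaults returns the list of
candidate paths. It would need to run as root: an unprivileged user cannot
read other users' /proc entries, so files belonging to their live jobs
would be reported as orphans by mistake. A job that starts between the
scan and any later deletion is also not protected, so an age check on the
file before removing it would be a sensible extra guard.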
> 
> HTH
> 
> -----Original Message-----
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
> On Behalf Of Jeff Squyres
> Sent: 14 April 2011 14:33
> To: Open MPI Users
> Subject: Re: [OMPI users] shm unlinking
> 
> On Apr 14, 2011, at 9:22 AM, Rushton Martin wrote:
> 
>> For your information: we were supplied with a script when we bought
>> the cluster, but the original script made the assumption that all
>> processes and shm files belonging to a specific user ought to be
>> deleted.  This is a problem if users submit jobs which only half fill
>> a node and the second job starts on the same node as the first one.
>> The first job to finish causes the continuing job to stop dead.  We
>> therefore had to disable any cleanup to allow jobs to run.  Now we
>> are finding a slow fill up with the shm files and I need to do
>> something; at least now I have a way forward.
> 
> Note that Open MPI v1.4.x is likely using mmap files by default --
> these should be under /tmp/ somewhere.  If they get left around, they
> can cause shared memory to be filled up, but they should also be
> unrelated to /dev/shm kinds of things.  If you're seeing /dev/shm fill
> up, that might be due to something else.
> 
> Also, I'm a little confused by your reference to psm_shm... are you 
> talking about the QLogic PSM device?  If that does some tomfoolery 
> with /dev/shm somewhere, I'm unaware of it (i.e., I don't know 
> much/anything about what that device does internally).
> 
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> This email and any attachments to it may be confidential and are
> intended solely for the use of the individual to whom it is addressed.
> If you are not the intended recipient of this email, you must neither
> take any action based upon its contents, nor copy or show it to
> anyone. Please contact the sender if you believe you have received
> this email in error. QinetiQ may monitor email traffic data and also
> the content of email for the purposes of security. QinetiQ Limited
> (Registered in England & Wales: Company Number: 3796233) Registered
> office: Cody Technology Park, Ively Road, Farnborough, Hampshire, GU14
> 0LX  http://www.qinetiq.com.


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


The QinetiQ e-mail privacy policy and company information is detailed elsewhere 
in the body of this email.
