On Apr 13, 2011, at 8:13 AM, Rushton Martin wrote:

> The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2).
> Jobs are submitted by Torque/MOAB.  When run with up to np=8 there is
> good performance.  Attempting to run with more processors brings
> problems, specifically if any one node of a group of nodes has all 8
> cores in use the job hangs.  For instance running with 14 cores (7+7) is
> fine, but running with 16 (8+8) hangs.
> 
> From the FAQs I note the issues of overcommitting and aggressive
> scheduling.  Is it possible for mpirun (or orted on the remote nodes) to
> be blocked from progressing by a fully committed node?  We have a few
> x3755-m2 machines with 16 cores, and we have detected a similar issue
> with 16+16.

I'm not entirely sure I understand your notation, but we have never seen an 
issue when running with fully loaded nodes (i.e., where the number of MPI procs 
on the node = the number of cores).

What version of OMPI are you using? Are you binding the procs?
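To collect both answers in one go, something like the following can be run (a sketch; `./my_mpi_app` is a placeholder for your own binary, and the binding flags assume an Open MPI release of roughly the 1.4/1.5 era — newer releases spell it `--bind-to core`):

```shell
# Report which Open MPI release is installed
ompi_info --version

# Launch fully loaded (8 ranks per 8-core node) with binding reports:
# each rank prints the cores it is bound to, so an oversubscribed or
# doubly-bound core shows up immediately in the output.
mpirun -np 16 --report-bindings --bind-to-core ./my_mpi_app
```

If the `--report-bindings` output shows two ranks sharing a core, that contention could explain a hang that only appears once every core on a node is occupied.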


> 
> Martin Rushton
> HPC System Manager, Weapons Technologies
> Tel: 01959 514777, Mobile: 07939 219057
> email: jmrush...@qinetiq.com
> www.QinetiQ.com
> QinetiQ - Delivering customer-focused solutions
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
