The bulk of our compute nodes have 8 cores each (twin 4-core IBM
x3550-m2).  Jobs are submitted through Torque/MOAB.  Runs with up to
np=8 perform well.  Runs with more processes cause problems:
specifically, if any one node in the group of nodes allocated to the job
has all 8 cores in use, the job hangs.  For instance, running with 14
cores (7+7) is fine, but running with 16 (8+8) hangs.
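
To make the failing condition concrete, here is a minimal Python sketch
of the check "does any node in the allocation have every core in use?".
It assumes the usual Torque node file layout (one hostname line per
allocated slot, as in $PBS_NODEFILE).  The function name and the
hostnames are made up for illustration, and the sketch only describes
the layouts; it does not diagnose the hang itself.

```python
from collections import Counter


def fully_committed_nodes(nodefile_lines, cores_per_node=8):
    """Return the hosts whose allocated slots use every core.

    nodefile_lines: iterable of hostnames, one per allocated slot
    (assumed to match the layout of Torque's $PBS_NODEFILE).
    cores_per_node: core count per host (8 here; 16 for the larger nodes).
    """
    slots = Counter(line.strip() for line in nodefile_lines if line.strip())
    return sorted(host for host, used in slots.items() if used >= cores_per_node)


# Hypothetical hostnames standing in for two compute nodes.
layout_7_7 = ["node01"] * 7 + ["node02"] * 7   # the case that runs fine
layout_8_8 = ["node01"] * 8 + ["node02"] * 8   # the case that hangs

print(fully_committed_nodes(layout_7_7))  # []
print(fully_committed_nodes(layout_8_8))  # ['node01', 'node02']
```

Inside a real job the same function could be fed the lines of the file
named by $PBS_NODEFILE, with cores_per_node=16 for the 16-core machines.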

From the FAQs I note the issues of over-committing and aggressive
scheduling.  Is it possible for mpirun (or orted on the remote nodes) to
be blocked from progressing by a fully committed node?  We have a few
x3755-m2 machines with 16 cores, and we have detected a similar issue
with 16+16.

Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 07939 219057
email: jmrush...@qinetiq.com
www.QinetiQ.com
