On Apr 13, 2011, at 8:13 AM, Rushton Martin wrote:

> The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2).
> Jobs are submitted by Torque/MOAB. When run with up to np=8 there is
> good performance. Attempting to run with more processors brings
> problems, specifically if any one node of a group of nodes has all 8
> cores in use the job hangs. For instance running with 14 cores (7+7) is
> fine, but running with 16 (8+8) hangs.
>
> From the FAQs I note the issues of over committing and aggressive
> scheduling. Is it possible for mpirun (or orted on the remote nodes) to
> be blocked from progressing by a fully committed node? We have a few
> x3755-m2 machines with 16 cores, and we have detected a similar issue
> with 16+16.

I'm not entirely sure I understand your notation, but we have never seen an issue when running with fully loaded nodes (i.e., where the number of MPI procs on the node = the number of cores).

What version of OMPI are you using? Are you binding the procs?

> Martin Rushton
> HPC System Manager, Weapons Technologies
> www.QinetiQ.com
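
On the binding question, one way to see what each process is actually allowed to run on is to have every rank print its CPU affinity. The following is only a minimal sketch, not something from this thread: it assumes Linux and Python 3 on the compute nodes, and the helper names (describe_binding, format_cpus, report) are made up for illustration. The only Open MPI-specific piece is the OMPI_COMM_WORLD_RANK environment variable, which mpirun sets for each launched process.

```python
import os
import socket


def describe_binding(allowed, total):
    """Return 'bound' if the process is restricted to fewer CPUs than the node has."""
    return "bound" if len(allowed) < total else "unbound"


def format_cpus(cpus):
    """Render a set of CPU ids as a sorted, comma-separated string."""
    return ",".join(str(c) for c in sorted(cpus))


def report():
    """Build a one-line summary of where this process may run."""
    # Set by Open MPI's mpirun; "?" when the script is run outside mpirun.
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    # Linux-only call: the set of logical CPUs this process may be scheduled on.
    allowed = os.sched_getaffinity(0)
    # Logical CPU count of the node; fall back to the affinity size if unknown.
    total = os.cpu_count() or len(allowed)
    return "host={} rank={} cpus={} ({})".format(
        socket.gethostname(), rank, format_cpus(allowed),
        describe_binding(allowed, total))


if __name__ == "__main__":
    print(report())
```

Launched under mpirun in place of the real application (for example with -np 16 across two 8-core nodes, saved under whatever file name you like), each rank prints one line. If every rank reports the full set of CPUs, the procs are not bound; if each reports a single CPU or a small subset, they are.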