I'm not entirely clear on the sequence of commands here. Is the user requesting a new allocation from Maui/Torque for each run? If so, it's possible we aren't correctly picking up the external binding from Torque, and that would be a bug we would need to fix.

Or is the user obtaining a single allocation of the entire node and then using mpirun to start multiple jobs in parallel? In that case, the problem is that each mpirun needs to be told which cpus to confine itself to; otherwise it assumes that all cpus on the node belong to it, and the lower core numbers get overloaded. You can resolve this by adding --cpu-set 0,1,2 (or whatever pattern you like) to each command line.
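For example, with a single full-node allocation, something along these lines should keep the two jobs on disjoint cpus. This is only a sketch: the program name "mc_prog" and the core ranges are illustrative, so use whichever cpus Torque actually assigned you, and if your 1.8 mpirun doesn't accept the range form, spell the lists out comma-separated (0,1,2,...):

  # both mpirun invocations share one full-node Torque allocation;
  # hand each its own set of cpus so they don't stack up on the low cores
  mpirun -np 16 --cpu-set 0-15  --bind-to core ./mc_prog &
  mpirun -np 16 --cpu-set 16-31 --bind-to core ./mc_prog &
  wait

Each mpirun then binds its processes only within the cpus it was handed, so the two 16-core jobs no longer land on top of each other.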
You might also consider updating to 1.8.4, as we did fix some integration bugs in that series. I don't recall anything specific to this question, but my memory could be at fault.

Ralph

On Tue, Jan 27, 2015 at 11:39 PM, DOHERTY, Greg <g...@ansto.gov.au> wrote:
> This might or might not be related to openmpi 1.8.1. I have not seen the
> problem with the same program and previous versions of openmpi.
>
> We have 64 core AMD nodes. I have recently recompiled a large Monte Carlo
> program using the 1.8.1 version of openmpi. Users start this program using
> maui/torque, asking for a number of cores, usually on only one node. One
> run of the program asking for any number of cores up to 64 runs with full
> cpu utilisation on each core. A user might start a run asking for 16 cores
> – fine. Then he starts a second run on the same node, asking for another
> 16 cores. Immediately the cpu utilisation on all cores of the first job
> drops to 50%, as it is for the newly starting job. If a different program
> were using the remaining 32 cores on the same node at the same time, the
> cpu utilisation of its cores would be unaffected. If we qdel the second
> 16 core job, the cpu utilisation of each core of the first job immediately
> climbs back to 100%. Any suggestions please, on where I might start
> looking for the solution to this problem?
>
> Greg Doherty
> ANSTO
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/01/26239.php