The default binding option depends on the number of procs: it is bind-to core 
for np <= 2, and bind-to socket for np > 2. You never said, but should I assume 
you ran 4 ranks? If so, then we should be trying to bind-to socket.

I'm not sure what your cpuset is telling us - are you binding us to a socket? 
Are some cpus in one socket, and some in another?

It could be that the cpuset + bind-to socket is resulting in some odd behavior, 
but I'd need a little more info to narrow it down.
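
For reference, the masks in the quoted hwloc-bind output can be decoded by hand, 
but a small script makes the overlap easier to see. The following is only a 
sketch: the helper names (mask_to_cpus, parse_cpuset) are made up for 
illustration and are not part of hwloc or Open MPI, and the pids, masks, and 
cpuset string are copied from the message below.

```python
def mask_to_cpus(mask_hex):
    """Return the CPU indices whose bits are set in a hex bitmask string."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def parse_cpuset(spec):
    """Expand a cpuset list such as '8-11' or '0,2,4-6' into CPU indices."""
    cpus = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


# Values copied from the hwloc-bind output in the quoted message.
bindings = {
    16065: "0x00000200",
    16066: "0x00000800",
    16067: "0x00000200",
    16068: "0x00000800",
}
allowed = parse_cpuset("8-11")

# Map each CPU to the pids bound to it.
owners = {}
for pid in sorted(bindings):
    for cpu in mask_to_cpus(bindings[pid]):
        owners.setdefault(cpu, []).append(pid)

# CPUs shared by more than one rank, and allowed CPUs no rank is using.
overlaps = {cpu: pids for cpu, pids in owners.items() if len(pids) > 1}
unused = [cpu for cpu in allowed if cpu not in owners]

print("overlaps:", overlaps)
print("unused:", unused)
```

Each mask has a single bit set, so every rank is pinned to one CPU; two ranks 
share CPU 9, two share CPU 11, and CPUs 8 and 10 of the cpuset are left idle.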


On Jun 18, 2014, at 7:48 PM, Brock Palen <bro...@umich.edu> wrote:

> I have started using 1.8.1 for some codes (meep in this case) and it 
> sometimes works fine, but in a few cases I am seeing ranks being given 
> overlapping CPU assignments, not always though.
> 
> Example job, default binding options (so by-core right?):
> 
> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and 
> use TM to spawn.
> 
> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]
> 
> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
> 0x00000200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
> 0x00000800
> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
> 0x00000200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
> 0x00000800
> 
> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus 
> 8-11
> 
> So Torque reports that the cpuset for the job has 4 cores, but as you can 
> see, the ranks were given identical bindings. 
> 
> I checked that the pids were part of the correct CPU set. I also checked 
> orted:
> 
> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
> 0x00000f00
> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
> ignored unrecognized argument 16064
> 
> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x00000f00
> 8,9,10,11
> 
> Which is exactly what I would expect.
> 
> So ummm, I'm lost as to why this might happen. What else should I check? 
> Like I said, not all jobs show this behavior.
> 
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24672.php
