The default binding option depends on the number of procs - it is bind-to core for np <= 2, and bind-to socket for np > 2. You never said, but should I assume you ran 4 ranks? If so, then we should be trying to bind-to socket.

I'm not sure what your cpuset is telling us - are you binding us to a socket? Are some cpus in one socket, and some in another? It could be that the cpuset + bind-to socket is resulting in some odd behavior, but I'd need a little more info to narrow it down.

On Jun 18, 2014, at 7:48 PM, Brock Palen <bro...@umich.edu> wrote:

> I have started using 1.8.1 for some codes (meep in this case) and it
> sometimes works fine, but in a few cases I am seeing ranks being given
> overlapping CPU assignments, not always though.
>
> Example job, default binding options (so by-core right?):
>
> Assigned nodes, the one in question is nyx5398, we use torque CPU sets, and
> use TM to spawn.
>
> [nyx5406:2][nyx5427:2][nyx5506:2][nyx5311:3]
> [nyx5329:4][nyx5398:4][nyx5396:11][nyx5397:11]
> [nyx5409:11][nyx5411:11][nyx5412:3]
>
> [root@nyx5398 ~]# hwloc-bind --get --pid 16065
> 0x00000200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16066
> 0x00000800
> [root@nyx5398 ~]# hwloc-bind --get --pid 16067
> 0x00000200
> [root@nyx5398 ~]# hwloc-bind --get --pid 16068
> 0x00000800
>
> [root@nyx5398 ~]# cat /dev/cpuset/torque/12703230.nyx.engin.umich.edu/cpus
> 8-11
>
> So torque claims the CPU set setup for the job has 4 cores, but as you can
> see the ranks were giving identical binding.
>
> I checked the pids they were part of the correct CPU set, I also checked,
> orted:
>
> [root@nyx5398 ~]# hwloc-bind --get --pid 16064
> 0x00000f00
> [root@nyx5398 ~]# hwloc-calc --intersect PU 16064
> ignored unrecognized argument 16064
>
> [root@nyx5398 ~]# hwloc-calc --intersect PU 0x00000f00
> 8,9,10,11
>
> Which is exactly what I would expect.
>
> So ummm, i'm lost why this might happen? What else should I check? Like I
> said not all jobs show this behavior.
>
> Brock Palen
> www.umich.edu/~brockp
> CAEN Advanced Computing
> XSEDE Campus Champion
> bro...@umich.edu
> (734)936-1985
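
For anyone following along, here is a rough sketch of how the masks quoted above can be read and checked for overlap. It is plain Python, not anything from Open MPI or hwloc; the helper names (mask_to_cpus, find_overlaps) are made up for illustration, and it assumes each mask is a single hex word as in the output above (wider masks that hwloc prints as comma-separated words are not handled).

```python
def mask_to_cpus(mask_hex):
    """Return the sorted PU indexes set in a single-word hex bitmask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def find_overlaps(cpus_by_pid):
    """Map each CPU claimed by more than one pid to the pids bound to it."""
    owners = {}
    for pid, cpus in cpus_by_pid.items():
        for cpu in cpus:
            owners.setdefault(cpu, []).append(pid)
    return {cpu: pids for cpu, pids in owners.items() if len(pids) > 1}


# Values copied from the hwloc-bind output in the quoted message.
rank_masks = {
    16065: "0x00000200",
    16066: "0x00000800",
    16067: "0x00000200",
    16068: "0x00000800",
}
orted_mask = "0x00000f00"  # pid 16064, matches the torque cpuset 8-11

rank_cpus = {pid: mask_to_cpus(m) for pid, m in rank_masks.items()}
allowed = mask_to_cpus(orted_mask)
overlaps = find_overlaps(rank_cpus)

# CPUs in the cpuset that no rank is bound to.
used = {cpu for cpus in rank_cpus.values() for cpu in cpus}
unused = sorted(set(allowed) - used)

print("allowed CPUs:", allowed)
print("per-rank CPUs:", rank_cpus)
print("shared CPUs:", overlaps)
print("unused CPUs:", unused)
```

Read this way, the four ranks sit on only two of the four CPUs in the cpuset (two ranks each on 9 and 11), while 8 and 10 go unused, which is the overlap being reported.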