Lloyd Brown <lloyd_br...@byu.edu> writes:

> No problem. It wasn't much of a delay.
>
> The scenario involves a combination of MPI and OpenMP (or other
> threading scheme). Basically, the software will launch one or more
> processes via MPI, which then spawn threads to do the work.
>
> What we've been seeing is that, without something like '--bind-to none'
> or similar, those threads end up being pinned to the same processor as
> the process that spawned them.
The default binding is supposed to be to sockets, as --report-bindings
should show. Otherwise see another message I just posted for an
empirical test (and possibly examples in the tutorials referenced -- I
don't remember).

> We're okay with a bind=none, since we already have cgroups in place to
> constrain the user to the resources they request. We might get more
> process/thread migration between processors (but within the cgroup) than
> we would like, but that's still probably acceptable in this scenario.
>
> If there's a better solution, we'd love to hear it.

--cpus-per-proc, or whatever the non-deprecated version is in mpirun(1)
(--map-by with a PE=n modifier, in recent releases). [You needed
--loadbalance in OMPI 1.6 to make that work.] You might also like to
supply environment variables to get the OpenMP runtime to DTRT for
thread affinity, if it doesn't already; there isn't an OMPI mechanism
for that, but you can do it with a wrapper script or a simple
LD_PRELOAD library.
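[For illustration, a sketch of the combined approach suggested above, assuming a recent Open MPI mpirun and an OpenMP 4.0 runtime; `./hybrid_app` and the counts are placeholders. --map-by ...:PE=n is the non-deprecated successor to --cpus-per-proc, -x forwards environment variables to the ranks, and OMP_PROC_BIND/OMP_PLACES tell the OpenMP runtime to place its own threads within the rank's binding:]

```shell
# Give each MPI rank 4 cores on its socket, and have the OpenMP runtime
# spread that rank's 4 threads across those cores (one per core), rather
# than leaving them all pinned wherever the process landed.
mpirun --map-by socket:PE=4 --report-bindings \
       -x OMP_NUM_THREADS=4 \
       -x OMP_PROC_BIND=spread \
       -x OMP_PLACES=cores \
       ./hybrid_app
```

(--report-bindings is worth keeping while experimenting, since it prints the mask each rank actually received.)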