On 31/01/2012 14:24, Jeff Squyres wrote:
> On Jan 31, 2012, at 6:18 AM, Dave Love wrote:
>
>> Core binding is broken on Interlagos with open-mpi 1.5.4. I guess it
>> also bites on Magny-Cours, but all our systems are currently busy and I
>> can't check.
>>
>> It does work, at least basically, in 1.5.5rc1, but the release notes for
>> that don't give any indication. Perhaps someone could mention
>> Interlagos in the notes, and any other hardware that might be affected
>> (presumably Magny-Cours and some Power if it's confusion introduced by
>> the extra NUMA level).
> I think there was some weirdness in how AMD chips were represented to the
> Linux kernel (they present differently than Intel chips). I believe the
> issues have been worked out by hwloc.

Right, AMD "dual-core modules" are reported by the kernel almost exactly as "a single hyperthreaded core". We had to tweak hwloc to detect two different cores there. So you get 32 cores and 32 PUs with hwloc >= 1.2.1, instead of 16 cores and 32 PUs with hwloc < 1.2.1.

If you don't have this hwloc change, I guess binding to core breaks because you have 16 cores for 32 processes. I don't know if there's an easy way to tell OMPI 1.5.4 to bind to PUs instead of cores; if there is, that should work as expected.

Unless I am mistaken, OMPI 1.5.4 has hwloc 1.2, while 1.5.5 will have 1.2.2 or even 1.3.1. So don't use core binding on Interlagos with OMPI <= 1.5.4. Note that Magny-Cours processors are OK; cores are "normal" there.

FWIW, the Linux kernel (at least up to 3.2) still reports wrong L2 and L1i cache information on AMD Bulldozer. Kernel bug reported at https://bugzilla.kernel.org/show_bug.cgi?id=42607

Brice