> On Apr 29, 2015, at 4:09 PM, Brice Goglin <[email protected]> wrote:
>
> Nothing wrong in that XML. I don't see what could be happening besides a
> node rebooting with hyper-threading enabled for random reasons.
> Please run "lstopo foo.xml" again on the node next time you get the OMPI
> failure (assuming you get a chance to log on the node before it reboots
> etc).

Thanks. Do you understand why OpenMPI would even try to bind core #16? I’m
pretty sure it was a 16 task job on a 16 (physical) core machine - shouldn’t it
try to bind 0-15 only?
Noam
