> On Apr 29, 2015, at 4:09 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>
> Nothing wrong in that XML. I don't see what could be happening besides a
> node rebooting with hyper-threading enabled for random reasons.
> Please run "lstopo foo.xml" again on the node next time you get the OMPI
> failure (assuming you get a chance to log on the node before it reboots
> etc).
Thanks. Do you understand why OpenMPI would even try to bind to core #16? I'm pretty sure it was a 16-task job on a machine with 16 physical cores, so shouldn't it try to bind only cores 0-15?

Noam