On 29/04/2015 18:55, Noam Bernstein wrote:
>> On Apr 29, 2015, at 12:47 PM, Brice Goglin <brice.gog...@inria.fr> wrote:
>>
>> Thanks. It's indeed normal that OMPI failed to bind to cpuset 0,16 since
>> 16 doesn't exist at all.
>> Can you run "lstopo foo.xml" on one node where it failed, and send the
>> foo.xml that got generated? Just want to make sure we don't have invalid
>> cpusets in there.
> It’s attached. Thanks for the help, by the way.
>

Nothing wrong in that XML. I don't see what could be happening besides a
node rebooting with hyper-threading enabled for random reasons.
Please run "lstopo foo.xml" again on the node the next time you get the OMPI
failure (assuming you get a chance to log in to the node before it
reboots).
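
When a fresh foo.xml is available, a quick way to compare it against the
cpuset OMPI tried to bind to (0,16) is to list the PU indexes the file
contains. The following is only a rough sketch: it assumes the export
describes each processing unit as an <object type="PU" os_index="N"> element,
and the helper names and the trimmed SAMPLE document are hypothetical
stand-ins, not the attached file.

```python
import xml.etree.ElementTree as ET


def pu_os_indexes(xml_text):
    """Return the sorted OS indexes of all PU objects in an hwloc XML export."""
    root = ET.fromstring(xml_text)
    # PU objects are nested under packages/cores, so walk the whole tree.
    return sorted(
        int(obj.get("os_index"))
        for obj in root.iter("object")
        if obj.get("type") == "PU" and obj.get("os_index") is not None
    )


def missing_pus(xml_text, wanted):
    """Return the requested PU indexes that the topology does not contain."""
    present = set(pu_os_indexes(xml_text))
    return [i for i in wanted if i not in present]


# Hypothetical, heavily trimmed stand-in for foo.xml: PUs 0..15 only.
SAMPLE = "<topology>" + "".join(
    f'<object type="PU" os_index="{i}"/>' for i in range(16)
) + "</topology>"

if __name__ == "__main__":
    # The cpuset from the failed binding attempt was 0,16.
    print(missing_pus(SAMPLE, [0, 16]))  # prints [16]
```

For a real export, read the file contents and pass them in place of SAMPLE.
An empty result for [0, 16] would mean PU 16 does exist on that node at the
time lstopo ran.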

Brice
