> You might double-check by running with "--mca btl ^openib" to see if that
> is the source of the warning

The warning always appears, regardless of the interconnect, even when
running with "--mca btl ^openib".


> Does it only crash when you pause it? Or does it crash while normally
> running?

It is very hard to reproduce without pausing. It crashes in only 1 out of 5
runs, after half an hour of a run that would take 36 hours. Smaller test
cases never seem to crash on their own, but when I pause, even quite small
test cases (less than a minute) crash if I have more than 72 workers.
