> You might double-check by running with "--mca btl ^openib" to see if that is the source of the warning
The warning always appears, independent of the interconnect, even when running with "--mca btl ^openib".

> Does it only crash when you pause it? Or does it crash while normally running?

It is very hard to reproduce without pausing. It crashes in only 1 out of 5 runs, after half an hour, for a run that would take 36 hours. Smaller test cases never seem to crash on their own, but when I pause them, even quite small test cases (less than a minute) crash if I have more than 72 workers.