Hi,

Before starting programs on my cluster, I want to check that every CPU is up and able to run MPI applications.
For this, I use a kind of 'ping' program that just sends a message saying 'I'm OK' to a supervisor program. The supervisor launches the 'ping' program on each CPU with MPI_Comm_spawn_multiple. This works fine when every CPU is up, but when one is down, my supervisor stops at the MPI_Comm_spawn_multiple call.

So my questions are:
* What am I doing wrong?
* Is there another way to check my CPUs?

Thanks for your help.

Laurent.