Hi, 

Before starting programs on my cluster, I want to check that every CPU is 
up and able to run MPI applications.

For this, I use a kind of 'ping' program that just sends a message saying 'I'm 
OK' to a supervisor program.
The supervisor launches the 'ping' program on each CPU with the 
MPI_Comm_spawn_multiple command.

It works fine when every CPU is up, but when one is down, my supervisor stops 
when it calls the MPI_Comm_spawn_multiple command.

So my questions are:
* 'What am I doing wrong?'
* 'Is there another way to check my CPUs?'

Thanks for your help.

        Laurent.
