Hello all. I have started some tests with Open MPI 4.0.1. I have two machines, one local and one remote, connected over ssh. A basic test (hello.c) runs fine both locally and remotely with mpirun. But I need to run a program without mpirun and have it create some processes with MPI_Comm_spawn(). Here are some examples of what I get.
My hostfile:

    $ cat hostfile
    localhost slots=4
    slave1 slots=4

If I set this:

    MPI_Info_set( info, "add-hostfile", "hostfile" );
    MPI_Info_set( info, "npernode", "3" );

and I run 6 processes (i.e. MPI_Comm_spawn() is asked to start 6 processes):

    ./dyamic.o

it runs OK: 4 processes local and 3 remote.

Now, if I set this instead (without add-hostfile and npernode):

    MPI_Info_set( info, "add-host", "slave1,slave1,slave1,slave1" );

and I run 4 processes, it hangs. In top I can see one process running locally and 4 on the remote machine (slave1), which I think is correct. After a while it throws this:

    A request has timed out and will therefore fail:

      Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345

    Your job may terminate as a result of this problem. You may want to
    adjust the MCA parameter pmix_server_max_wait and try again. If this
    occurred during a connect/accept operation, you can adjust that time
    using the pmix_base_exchange_timeout parameter.
    --------------------------------------------------------------------------
    [master:22881] *** An error occurred in MPI_Comm_spawn
    [master:22881] *** reported by process [63766529,0]
    [master:22881] *** on communicator MPI_COMM_WORLD
    [master:22881] *** MPI_ERR_UNKNOWN: unknown error
    [master:22881] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    [master:22881] *** and potentially your MPI job)

When I check top again at this point, there are no processes left running.

I really need this type of allocation. Any help will be very much appreciated.

Thanks in advance.
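
For reference, the second case (add-host with a repeated host name) can be reduced to a sketch like the one below. It is only an assumption about the shape of the program, not the exact code that produced the error, and it is not a fix for the hang. The helper build_add_host_value(), the child executable name ./worker and the USE_MPI guard are all hypothetical names introduced for illustration; the MPI calls themselves are the standard ones.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write `count` copies of `host`, separated by commas,
 * into buf (e.g. "slave1,slave1,slave1,slave1").
 * Returns 0 on success, -1 on bad arguments or if buf is too small. */
int build_add_host_value(const char *host, int count, char *buf, size_t buflen)
{
    size_t used = 0;
    size_t hlen;

    if (host == NULL || buf == NULL || count < 1 || buflen == 0)
        return -1;

    hlen = strlen(host);
    buf[0] = '\0';

    for (int i = 0; i < count; i++) {
        size_t need = hlen + (i > 0 ? 1 : 0);   /* host plus a leading comma */
        if (used + need + 1 > buflen)           /* +1 for the terminating NUL */
            return -1;
        if (i > 0)
            buf[used++] = ',';
        memcpy(buf + used, host, hlen);
        used += hlen;
        buf[used] = '\0';
    }
    return 0;
}

#ifdef USE_MPI   /* hypothetical guard: build with  mpicc -DUSE_MPI  */
#include <mpi.h>

int main(int argc, char **argv)
{
    char hosts[256];
    int errcodes[4];
    MPI_Info info;
    MPI_Comm children;

    /* Started directly (no mpirun), so this process is a singleton. */
    MPI_Init(&argc, &argv);

    if (build_add_host_value("slave1", 4, hosts, sizeof hosts) != 0)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Info_create(&info);
    MPI_Info_set(info, "add-host", hosts);

    /* "./worker" is a placeholder; it must be an MPI program that
     * calls MPI_Init() itself. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info, 0,
                   MPI_COMM_WORLD, &children, errcodes);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
#endif
```

Without -DUSE_MPI the file contains only the string helper, so it can be checked on a machine that has no MPI installed; with the flag it becomes a small parent program that is started directly, without mpirun, as in the case above.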