Hi, I'm working with MPI_Comm_spawn and I'm getting some error messages.
The code is relatively simple:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <math.h>
    #include <unistd.h>   /* gethostname */
    #include <mpi.h>

    int main(int argc, char ** argv){

        int i;
        int rank, size, child_rank;
        char nomehost[20];
        MPI_Comm parent, intercomm1, intercomm2;
        int erro;
        int level, curr_level;

        MPI_Init(&argc, &argv);

        /* The tree depth is the first command-line argument. */
        if(argc < 2){
            fprintf(stderr, "usage: %s <level>\n", argv[0]);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        level = atoi(argv[1]);

        /* The root has no parent; every other process receives its
           rank from the process that spawned it. */
        MPI_Comm_get_parent(&parent);

        if(parent == MPI_COMM_NULL){
            rank=0;
        }
        else{
            MPI_Recv(&rank, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
        }

        curr_level = (int) log2(rank+1);

        printf(" --> rank: %d and curr_level: %d\n", rank, curr_level);

        // Node propagation
        if(curr_level < level){

            // 2^(curr_level+1) - 1 + 2*(rank - (2^curr_level - 1)) = 2*rank + 1
            child_rank = 2*rank + 1;
            printf("(%d) Before create rank %d\n", rank, child_rank);
            MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &intercomm1, &erro);
            printf("(%d) After create rank %d\n", rank, child_rank);

            MPI_Send(&child_rank, 1, MPI_INT, 0, 0, intercomm1);

            //sleep(1);

            child_rank = child_rank + 1;
            printf("(%d) Before create rank %d\n", rank, child_rank);
            MPI_Comm_spawn(argv[0], &argv[1], 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &intercomm2, &erro);
            printf("(%d) After create rank %d\n", rank, child_rank);

            MPI_Send(&child_rank, 1, MPI_INT, 0, 0, intercomm2);

        }

        gethostname(nomehost, 20);
        printf("(%d) in %s\n", rank, nomehost);

        MPI_Finalize();
        return(0);

    }

The program creates a binary tree of processes until it reaches the depth given by the variable "level". If the level is 2, the tree will be:

              (0)
             /   \
          (1)     (2)
          / \     / \
       (3) (4) (5) (6)

The error messages are as follows (when I use 1 host):

Compiling:

    mpicc test.c -o test -lm

Running:

    mpirun -np 1 ./test 3

     --> rank: 0 and curr_level: 0
    (0) Before create rank 1
    (0) After create rank 1
    (0) Before create rank 2
     --> rank: 1 and curr_level: 1
    (1) Before create rank 3
    [cacau.ic.uff.br:17892] [[31928,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 75

When I use 2 hosts, the error is worse.
The code for that case is similar to the one shown here, except that I have to set the hosts with MPI_Info_set before each spawn. Using LAM/MPI, the program runs normally.

I think something goes wrong when I call MPI_Comm_spawn twice in a row and the child processes also spawn further processes. It seems to be a race condition, because the error does not always happen (for example, when the level is 2). With 3 or more levels, the error keeps recurring.

A similar error was previously posted in another thread:
http://www.open-mpi.org/community/lists/users/2009/12/11601.php

However, I am using the stable version 1.4.4 and the problem still happens. Do the developers plan to fix it?

Thanks,
Fernanda