It looks to me like a race condition, and the timing difference between 1.3.3 and 1.4 is just enough to trip it. I can break the race, but the fix will have to go into a future fix release.
In the meantime, your best bet is to either stick with 1.3.3 or add the delay.

On Dec 15, 2009, at 5:51 AM, Marcia Cristina Cera wrote:

> Hi,
>
> I intend to develop an application that uses MPI_Comm_spawn to dynamically
> create new MPI tasks (or processes).
> The structure of the program is like a tree: each node creates 2 new ones
> until it reaches a predefined number of levels.
>
> I developed a small program to explain my problem, as can be seen in the
> attachment.
> -- start.c: launches (through MPI_Comm_spawn, whose argv carries the level
> value) the root of the tree (a ch_rec program). After the spawn, a message is
> sent to the child and the process blocks in an MPI_Recv.
> -- ch_rec.c: gets its level value and receives the parent's message; then, if
> its level is less than a predefined limit, it creates 2 children:
> - set the level value;
> - spawn 1 child;
> - send a message;
> - call an MPI_Irecv;
> - repeat the 4 previous steps for the second child;
> - call an MPI_Waitany, waiting for the children to return.
> When the children's messages are received, the process sends a message to its
> parent and calls MPI_Finalize.
>
> With openmpi-1.3.3 the program runs as expected, but with
> openmpi-1.4 I get the following error:
>
> $ mpirun -np 1 start
> level 0
> level = 1
> Parent sent: level 0 (pid:4279)
> level = 2
> Parent sent: level 1 (pid:4281)
> [xiru-8.portoalegre.grenoble.grid5000.fr:04278] [[42824,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 758
>
> The error happens when my program tries to launch the second child immediately
> after the first spawn call.
> In my tests I put a sleep of 2 seconds between the first and the
> second spawn, and then the program runs as expected.
>
> Can someone help me with this version 1.4 bug?
>
> thanks,
> márcia.
>
> <spawn-problem.tar.gz>
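
For reference, here is a rough sketch of how the delay workaround might be structured in ch_rec.c. It is not taken from the attached program: the helper name spawn_children_with_delay, the spawn_fn callback type, and the count_spawn stand-in are all hypothetical, and the MPI call is shown only in a comment so that the sketch compiles without an MPI installation. The idea is simply to pause before every spawn except the first.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* sleep() */

/* Hypothetical callback type: launches child number `index`, returns 0 on
 * success. In the real program this would wrap something like
 *   MPI_Comm_spawn("ch_rec", argv, 1, MPI_INFO_NULL, 0,
 *                  MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
 * (assumed arguments; adapt to what ch_rec.c actually passes). */
typedef int (*spawn_fn)(int index, void *ctx);

/* Call `spawn` once per child, sleeping `delay_seconds` before every spawn
 * except the first, so that consecutive spawns do not run back to back.
 * Returns 0 if all spawns succeed, else the first nonzero return code. */
int spawn_children_with_delay(spawn_fn spawn, void *ctx,
                              int nchildren, unsigned delay_seconds)
{
    assert(spawn != NULL && nchildren >= 0);
    for (int i = 0; i < nchildren; i++) {
        if (i > 0 && delay_seconds > 0)
            sleep(delay_seconds);      /* workaround: space out the spawns */
        int rc = spawn(i, ctx);
        if (rc != 0)
            return rc;                 /* stop at the first failure */
    }
    return 0;
}

/* Stand-in spawner for trying the helper without MPI: it only counts how
 * many times it was called, storing the count in *ctx. */
int count_spawn(int index, void *ctx)
{
    (void)index;
    *(int *)ctx += 1;
    return 0;
}
```

In the tree program this would be called with nchildren = 2 and delay_seconds = 2, matching the 2-second sleep that made the 1.4 run succeed. Since the underlying problem is a race, the right delay may differ from machine to machine.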