Hi Ralph,

I am afraid I was a little hasty! I redid my tests with more care and got the same error with 1.3.3 as well :-/ In that version, however, the error only shows up after some successful executions, which is why I did not notice it before. Furthermore, I increased the number of levels of the tree (which means more concurrent dynamic process creations in the lower levels), and I never get a run without the error unless I add the delay. Perhaps the problem really is a race condition :(
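To put rough numbers on why deeper trees mean more concurrent spawns, here is a minimal sketch in plain C (no MPI calls). It assumes a full binary tree in which the root ch_rec process is level 0 and every process below the limit spawns exactly 2 children; the level numbering and the function names are illustrative assumptions, not taken from the attached programs.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: the root is level 0 and each process whose level is below
 * max_level spawns exactly 2 children (hypothetical numbering). */

/* Number of processes at a given level of the tree: 2^level. */
unsigned long long procs_at_level(unsigned level)
{
    assert(level < 63); /* keep the shift within 64 bits */
    return 1ULL << level;
}

/* Spawn calls issued by all processes sitting at `level`.
 * Processes at max_level are leaves and spawn nothing. */
unsigned long long spawns_from_level(unsigned level, unsigned max_level)
{
    if (level >= max_level)
        return 0;
    return 2ULL * procs_at_level(level);
}

/* Total processes in the tree: 2^(max_level + 1) - 1. */
unsigned long long total_procs(unsigned max_level)
{
    assert(max_level < 63);
    return (1ULL << (max_level + 1)) - 1;
}

/* Print one row per level: process count and spawn calls issued there. */
void print_tree_load(unsigned max_level)
{
    for (unsigned level = 0; level <= max_level; level++)
        printf("level %u: %llu processes, %llu spawn calls\n",
               level, procs_at_level(level),
               spawns_from_level(level, max_level));
    printf("total processes: %llu\n", total_procs(max_level));
}
```

Calling print_tree_load(3), for example, lists the per-level counts for a tree with 4 levels. Since the processes on one level run independently, the spawn calls from a level can overlap in time, and their number doubles with each additional level.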
I also tested with LAM/MPI 7.1.4, and at first it worked fine. I worked with LAM for years, but I migrated to Open MPI last year since LAM is being discontinued... I think I can continue developing my application with the delay added while I wait for a release, and leave the performance tests for later :)

Thank you again Ralph,
márcia.

On Wed, Dec 16, 2009 at 2:17 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Okay, I can replicate this.
>
> FWIW: your test program works fine with the OMPI trunk and 1.3.3. It only
> has a problem with 1.4. Since I can replicate it on multiple machines every
> single time, I don't think it is actually a race condition.
>
> I think someone made a change to the 1.4 branch that created a failure mode
> :-/
>
> Will have to get back to you on this - may take awhile, and won't be in the
> 1.4.1 release.
>
> Thanks for the replicator!
>
> On Dec 15, 2009, at 8:35 AM, Marcia Cristina Cera wrote:
>
> Thank you, Ralph
>
> I will use the 1.3.3 for now...
> while waiting for a future fix release that breaks this race condition.
>
> márcia
>
> On Tue, Dec 15, 2009 at 12:58 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>> Looks to me like it is a race condition, and the timing between 1.3.3 and
>> 1.4 is just enough to trip it. I can break the race, but it will have to be
>> in a future fix release.
>>
>> Meantime, your best bet is to either stick with 1.3.3 or add the delay.
>>
>> On Dec 15, 2009, at 5:51 AM, Marcia Cristina Cera wrote:
>>
>> Hi,
>>
>> I intend to develop an application that uses MPI_Comm_spawn to
>> dynamically create new MPI tasks (or processes).
>> The structure of the program is like a tree: each node creates 2 new ones
>> until a predefined number of levels is reached.
>>
>> I developed a small program to explain my problem, as can be seen in the
>> attachment.
>> -- start.c: launches (through MPI_Comm_spawn, with the level value passed
>> in argv) the root of the tree (a ch_rec program). After the spawn, a
>> message is sent to the child and the process blocks in an MPI_Recv.
>> -- ch_rec.c: gets its level value and receives the parent message; then, if
>> its level is less than a predefined limit, it creates 2 children:
>>    - set the level value;
>>    - spawn 1 child;
>>    - send a message;
>>    - call an MPI_Irecv;
>>    - repeat the 4 previous steps for the second child;
>>    - call MPI_Waitany to wait for the children to return.
>> When the children's messages are received, the process sends a message to
>> its parent and calls MPI_Finalize.
>>
>> Using the openmpi-1.3.3 version the program runs as expected, but with
>> openmpi-1.4 I get the following error:
>>
>> $ mpirun -np 1 start
>> level 0
>> level = 1
>> Parent sent: level 0 (pid:4279)
>> level = 2
>> Parent sent: level 1 (pid:4281)
>> [xiru-8.portoalegre.grenoble.grid5000.fr:04278] [[42824,0],0]
>> ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 758
>>
>> The error happens when my program tries to launch the second child
>> immediately after the first spawn call.
>> In my tests I put a sleep of 2 seconds between the first and the
>> second spawn, and then the program runs as expected.
>>
>> Can someone help me with this version 1.4 bug?
>>
>> thanks,
>> márcia.
>>
>> <spawn-problem.tar.gz>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users