On Mar 7, 2011, at 3:24 AM, Federico Golfrè Andreasi wrote:

> Hi Ralph,
>
> thank you very much for the detailed response.
>
> I have to apologize, I was not clear: I would like to use the
> MPI_Comm_spawn_multiple function.

Shouldn't matter - it's the same code path.

> (I've attached the example program I use).

I'm rebuilding for C++ as I don't typically use that language - will report back later.

>
> In any case I tried your test program, just compiling it with:
>    /home/fandreasi/openmpi-1.7/bin/mpicc loop_spawn.c -o loop_spawn
>    /home/fandreasi/openmpi-1.7/bin/mpicc loop_child.c -o loop_child
> and executing it on a single machine with
>    /home/fandreasi/openmpi-1.7/bin/mpiexec ./loop_spawn ./loop_child

I should have been clearer - this is not the correct way to run the program. The correct way is:

mpiexec -n 1 ./loop_spawn

loop_child is just the executable being comm_spawn'd.

> but it hangs at different loop iterations after printing:
>    "Child 26833:exiting"
> but looking at top, both processes (loop_spawn and loop_child) are still
> alive.
>
> I'm starting to think that some environment setting is not correct, or that I
> need to compile OpenMPI with some options.
> I compiled it just setting the --prefix option to ./configure.
> Do I need to do something else?

No, that should work.

>
> I have a Linux CentOS 4, 64-bit machine,
> with gcc 3.4.
>
> I think that this is my main problem now.
>
>
> Just to answer the other (minor) topics:
> - Regarding the version mismatch, I use a Linux cluster where the /home/ directory
>   is shared among the compute nodes,
>   and I've edited my .bashrc and .bash_profile to export the correct
>   LD_LIBRARY_PATH.
> - Thank you for the useful trick about svn.

No idea, then - all that error says is that the receiving code and the sending code are mismatched.

>
> Thank you very much !!!
> Federico.
>
>
> On 5 March 2011 at 19:05, Ralph Castain <r...@open-mpi.org> wrote:
> Hi Federico
>
> I tested the trunk today and it works fine for me - I let it spin for 1000
> cycles without issue. My test program is essentially identical to what you
> describe - you can see it in the orte/test/mpi directory.
> The "master" is
> loop_spawn.c, and the "slave" is loop_child.c. I only tested it on a single
> machine, though - will have to test multi-machine later. You might see if
> that makes a difference.
>
> The error you report in your attachment is a classic symptom of mismatched
> versions. Remember, we don't forward your LD_LIBRARY_PATH, so it has to be
> correct on your remote machine.
>
> As for r22794 - we don't keep anything that old on our web site. If you want
> to build it, the best way to get the code is to do a subversion checkout of
> the developer's trunk at that revision level:
>
> svn co -r 22794 http://svn.open-mpi.org/svn/ompi/trunk
>
> Remember to run autogen before configure.
>
>
> On Mar 4, 2011, at 4:43 AM, Federico Golfrè Andreasi wrote:
>
>> Hi Ralph,
>>
>> I'm getting stuck with spawning stuff,
>>
>> I've downloaded the snapshot from the trunk of 1st of March
>> (openmpi-1.7a1r24472.tar.bz2),
>> I'm testing using a small program that does the following:
>> - master program starts and each rank prints its hostname
>> - master program spawns a slave program with the same size
>> - each rank of the slave (spawned) program prints its hostname
>> - end
>> It is not always able to complete the program run; I see two different behaviours:
>> 1. not all the slaves print their hostname and the program ends suddenly
>> 2. both programs end correctly but the orted daemon is still alive and I need to
>> press ctrl-c to exit
>>
>>
>> I've tried to recompile my test program with a previous snapshot
>> (openmpi-1.7a1r22794.tar.bz2)
>> where I have only the compiled version of OpenMPI (on another machine).
>> It gives me an error before starting (I've attached it).
>> Searching the FAQ I found some tips, and I verified that I compile the program
>> with the correct OpenMPI version and
>> that the LD_LIBRARY_PATH is consistent.
>> So I would like to re-compile openmpi-1.7a1r22794.tar.bz2, but where can
>> I find it?
>>
>> Thank you,
>> Federico
>>
>>
>> On 23 February 2011 at 03:43, Ralph Castain <rhc.open...@gmail.com>
>> wrote:
>> Apparently not. I will investigate when I return from vacation next week.
>>
>>
>> On Feb 22, 2011, at 12:42 AM, Federico Golfrè Andreasi
>> <federico.gol...@gmail.com> wrote:
>>
>>> Hi Ralph,
>>>
>>> I've tested spawning with the OpenMPI 1.5 release but that fix is not there.
>>> Are you sure you've added it?
>>>
>>> Thank you,
>>> Federico
>>>
>>>
>>> 2010/10/19 Ralph Castain <r...@open-mpi.org>
>>> The fix should be there - just didn't get mentioned.
>>>
>>> Let me know if it isn't and I'll ensure it is in the next one...but I'd be
>>> very surprised if it isn't already in there.
>>>
>>>
>>> On Oct 19, 2010, at 3:03 AM, Federico Golfrè Andreasi wrote:
>>>
>>>> Hi Ralph!
>>>>
>>>> I saw that the new release 1.5 is out.
>>>> I didn't find this fix in the "list of changes"; is it present but not
>>>> mentioned since it is a minor fix?
>>>>
>>>> Thank you,
>>>> Federico
>>>>
>>>>
>>>> 2010/4/1 Ralph Castain <r...@open-mpi.org>
>>>> Hi there!
>>>>
>>>> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the
>>>> fix). I understand that will come out sometime soon, but no firm date has
>>>> been set.
>>>>
>>>>
>>>> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> I've downloaded and tested the openmpi-1.7a1r22817 snapshot,
>>>>> and it works fine for (multiple) spawning more than 128 processes.
>>>>>
>>>>> That fix will be included in the next release of OpenMPI, right?
>>>>> Do you know when it will be released? Or where I can find that info?
>>>>>
>>>>> Thank you,
>>>>> Federico
>>>>>
>>>>>
>>>>> 2010/3/1 Ralph Castain <r...@open-mpi.org>
>>>>> http://www.open-mpi.org/nightly/trunk/
>>>>>
>>>>> I'm not sure this patch will solve your problem, but it is worth a try.
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> <OpenMPI.error>
>
> <master.cpp><slave.cpp>