On Mar 7, 2011, at 3:24 AM, Federico Golfrè Andreasi wrote:

> Hi Ralph,
> 
> thank you very much for the detailed response.
> 
> I have to apologize, I was not clear: I would like to use the 
> MPI_Comm_spawn_multiple function.

Shouldn't matter - it's the same code path.

> (I've attached the example program I use) .

I'm rebuilding for C++ as I don't typically use that language - will report 
back later.

> 
> In any case I tried your test program, just compiling it with:
> /home/fandreasi/openmpi-1.7/bin/mpicc loop_spawn.c -o loop_spawn
> /home/fandreasi/openmpi-1.7/bin/mpicc loop_child.c -o loop_child
> and executing it on a single machine with:
> /home/fandreasi/openmpi-1.7/bin/mpiexec ./loop_spawn ./loop_child

I should have been clearer - this is not the correct way to run the program. 
The correct way is:

mpiexec -n 1 ./loop_spawn

loop_child is just the executable being comm_spawn'd.
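To make the distinction concrete, here is a minimal sketch of a parent that spawns a child. It is not the actual loop_spawn.c from orte/test/mpi; the file name parent.c, the single child process, and the printed message are assumptions for illustration only. The point is that the child executable is named inside the MPI_Comm_spawn call, not on the mpiexec command line.

```c
/* parent.c: illustrative sketch only, not the real loop_spawn.c. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm child;   /* intercommunicator connecting parent and child jobs */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The child executable is given here. The call is collective over
     * MPI_COMM_WORLD; rank 0 is the root whose arguments are used.
     * One child process is requested (an arbitrary choice for this sketch). */
    MPI_Comm_spawn("./loop_child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &child, MPI_ERRCODES_IGNORE);

    printf("Parent rank %d: child spawned\n", rank);

    /* Collective over the intercommunicator: the child side has to call
     * MPI_Comm_disconnect on the communicator it gets from
     * MPI_Comm_get_parent, otherwise this call does not return. */
    MPI_Comm_disconnect(&child);

    MPI_Finalize();
    return 0;
}
```

Assuming a child binary named loop_child sits in the working directory, this would be built with mpicc parent.c -o parent and started with mpiexec -n 1 ./parent.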

> but it hangs at different loop iterations after printing:
> "Child 26833:exiting"
> but looking at top, both processes (loop_spawn and loop_child) are still 
> alive.
> 
> I'm starting to think that some environment setting is not correct, or that I 
> need to compile OpenMPI with some options.
> I compiled it setting only the --prefix option to ./configure.
> Do I need to do something else?

No, that should work.

> 
> I have a Linux CentOS 4, 64-bit machine,
> with gcc 3.4.
> 
> I think that this is my main problem now.
> 
> 
> 
> Just to answer the other (minor) topics:
> - Regarding the version mismatch: I use a Linux cluster where the /home/ directory 
> is shared among the compute nodes,
> and I've edited my .bashrc and .bash_profile to export the correct 
> LD_LIBRARY_PATH.
> - Thank you for the useful trick about svn.

No idea, then - all that error says is that the receiving code and the sending 
code are mismatched.
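If you want to rule out a path problem on a given node anyway, one option is to check that the lib directory of the install you compiled against really is an entry in LD_LIBRARY_PATH. Below is a minimal sketch in plain C. The helper name path_list_contains is made up for illustration and is not part of Open MPI, and it assumes the libraries live in the default lib directory under the --prefix you configured with.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Open MPI.
 * Returns 1 if `dir` is exactly one of the colon-separated entries of
 * `list` (for example the value of LD_LIBRARY_PATH), 0 otherwise. */
int path_list_contains(const char *list, const char *dir)
{
    size_t dlen;

    if (list == NULL || dir == NULL)
        return 0;
    dlen = strlen(dir);

    while (*list != '\0') {
        const char *end = strchr(list, ':');
        size_t len = end ? (size_t)(end - list) : strlen(list);

        /* Exact comparison of one entry; trailing slashes and symlinks
         * are not normalised in this sketch. */
        if (len == dlen && strncmp(list, dir, len) == 0)
            return 1;
        if (end == NULL)
            break;
        list = end + 1;   /* move to the entry after the colon */
    }
    return 0;
}
```

Calling path_list_contains(getenv("LD_LIBRARY_PATH"), "/home/fandreasi/openmpi-1.7/lib") from a small program run on each node (getenv needs stdlib.h) would then show whether the environment there actually points at that install.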

> 
> 
> Thank you very much !!!
> Federico.
> 
> 
> 
> 
> 
> 
> On 5 March 2011 at 19:05, Ralph Castain <r...@open-mpi.org> wrote:
> Hi Federico
> 
> I tested the trunk today and it works fine for me - I let it spin for 1000 
> cycles without issue. My test program is essentially identical to what you 
> describe - you can see it in the orte/test/mpi directory. The "master" is 
> loop_spawn.c, and the "slave" is loop_child.c. I only tested it on a single 
> machine, though - will have to test multi-machine later. You might see if 
> that makes a difference.
> 
> The error you report in your attachment is a classic symptom of mismatched 
> versions. Remember, we don't forward your LD_LIBRARY_PATH, so it has to be 
> correct on your remote machine.
> 
> As for r22794 - we don't keep anything that old on our web site. If you want 
> to build it, the best way to get the code is to do a subversion checkout of 
> the developer's trunk at that revision level:
> 
> svn co -r 22794 http://svn.open-mpi.org/svn/ompi/trunk
> 
> Remember to run autogen before configure.
> 
> 
> On Mar 4, 2011, at 4:43 AM, Federico Golfrè Andreasi wrote:
> 
>> 
>> Hi Ralph,
>> 
>> I'm getting stuck with spawning.
>> 
>> I've downloaded the trunk snapshot of March 1st 
>> (openmpi-1.7a1r24472.tar.bz2).
>> I'm testing with a small program that does the following:
>>  - the master program starts and each rank prints its hostname
>>  - the master program spawns a slave program of the same size
>>  - each rank of the slave (spawned) program prints its hostname
>>  - end
>> It is not always able to complete the run; I see two different behaviours:
>>  1. not all the slaves print their hostname and the program ends suddenly
>>  2. both programs end correctly but the orted daemon is still alive and I need to 
>> press Ctrl-C to exit
>> 
>> 
>> I've tried to recompile my test program with a previous snapshot 
>> (openmpi-1.7a1r22794.tar.bz2),
>> for which I only have the compiled version of OpenMPI (on another machine).
>> It gives me an error before starting (I've attached it).
>> Browsing the FAQ I found some tips, and I verified that the program is compiled 
>> with the correct OpenMPI version
>> and that the LD_LIBRARY_PATH is consistent.
>> So I would like to recompile openmpi-1.7a1r22794.tar.bz2, but where can 
>> I find it?
>> 
>> 
>> Thank you,
>> Federico
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On 23 February 2011 at 03:43, Ralph Castain <rhc.open...@gmail.com> 
>> wrote:
>> Apparently not. I will investigate when I return from vacation next week.
>> 
>> 
>> On Feb 22, 2011, at 12:42 AM, Federico Golfrè Andreasi 
>> <federico.gol...@gmail.com> wrote:
>> 
>>> Hi Ralph,
>>> 
>>> I've tested spawning with the OpenMPI 1.5 release but that fix is not there.
>>> Are you sure you've added it?
>>> 
>>> Thank you,
>>> Federico
>>> 
>>> 
>>> 
>>> 2010/10/19 Ralph Castain <r...@open-mpi.org>
>>> The fix should be there - just didn't get mentioned.
>>> 
>>> Let me know if it isn't and I'll ensure it is in the next one...but I'd be 
>>> very surprised if it isn't already in there.
>>> 
>>> 
>>> On Oct 19, 2010, at 3:03 AM, Federico Golfrè Andreasi wrote:
>>> 
>>>> Hi Ralph!
>>>> 
>>>> I saw that the new release 1.5 is out.
>>>> I didn't find this fix in the "list of changes"; is it present but not 
>>>> mentioned since it is a minor fix?
>>>> 
>>>> Thank you,
>>>> Federico
>>>> 
>>>> 
>>>> 
>>>> 2010/4/1 Ralph Castain <r...@open-mpi.org>
>>>> Hi there!
>>>> 
>>>> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the 
>>>> fix). I understand that will come out sometime soon, but no firm date has 
>>>> been set.
>>>> 
>>>> 
>>>> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
>>>> 
>>>>> Hi Ralph,
>>>>> 
>>>>> 
>>>>> I've downloaded and tested the openmpi-1.7a1r22817 snapshot,
>>>>> and it works fine for (multiple) spawning of more than 128 processes.
>>>>> 
>>>>> That fix will be included in the next release of OpenMPI, right?
>>>>> Do you know when it will be released? Or where I can find that info?
>>>>> 
>>>>> Thank you,
>>>>>      Federico
>>>>> 
>>>>> 
>>>>> 
>>>>> 2010/3/1 Ralph Castain <r...@open-mpi.org>
>>>>> http://www.open-mpi.org/nightly/trunk/
>>>>> 
>>>>> I'm not sure this patch will solve your problem, but it is worth a try.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> us...@open-mpi.org
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> 
>>>> 
>>> 
>>> 
>> 
>> <OpenMPI.error>
> 
> 
> <master.cpp><slave.cpp>
