Hi Ralph,

thank you very much for the detailed response.

I have to apologize, I was not clear: I would like to use the
MPI_Comm_spawn_multiple function.
(I've attached the example program I use.)
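
Just to make the difference explicit: unlike MPI_Comm_spawn, MPI_Comm_spawn_multiple
takes one command / maxprocs / Info entry per spawned job. A minimal sketch of the
call I mean, taken from inside an already-initialized MPI program (the worker path
is only a placeholder; the attached program builds the arrays from the communicator
size and the gathered hostnames):

    // spawn two single-process jobs of the same worker executable
    char      worker[]    = "./loop_child";        // placeholder worker path
    char     *commands[2] = { worker, worker };
    int       procs[2]    = { 1, 1 };
    MPI_Info  infos[2]    = { MPI_INFO_NULL, MPI_INFO_NULL };
    int       errs[2];
    MPI_Comm  intercomm;
    MPI_Comm_spawn_multiple(2, commands, MPI_ARGVS_NULL, procs, infos,
                            0, MPI_COMM_WORLD, &intercomm, errs);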

In any case I tried your test program, just compiling it with:
/home/fandreasi/openmpi-1.7/bin/mpicc loop_spawn.c -o loop_spawn
/home/fandreasi/openmpi-1.7/bin/mpicc loop_child.c -o loop_child
and executed it on a single machine with
/home/fandreasi/openmpi-1.7/bin/mpiexec ./loop_spawn ./loop_child
but it hangs at a different loop iteration each run after printing:
"Child 26833:exiting"
Looking at top, however, both processes (loop_spawn and loop_child) are
still alive.

I'm starting to think that I have some incorrect environment setting, or that I
need to compile Open MPI with some extra options.
I compiled it passing only the --prefix option to ./configure.
Do I need to do something else?

I have a Linux CentOS 4, 64-bit machine,
with gcc 3.4.

I think that this is my main problem now.



Just to answer the other (minor) topics:
- Regarding the version mismatch: I use a Linux cluster where the /home/ directory
is shared among the compute nodes,
and I've edited my .bashrc and .bash_profile to export the correct
LD_LIBRARY_PATH.
- Thank you for the useful tip about svn.


Thank you very much !!!
Federico.






On 5 March 2011 at 19:05, Ralph Castain <r...@open-mpi.org> wrote:

> Hi Federico
>
> I tested the trunk today and it works fine for me - I let it spin for 1000
> cycles without issue. My test program is essentially identical to what you
> describe - you can see it in the orte/test/mpi directory. The "master" is
> loop_spawn.c, and the "slave" is loop_child.c. I only tested it on a single
> machine, though - will have to test multi-machine later. You might see if
> that makes a difference.
>
> The error you report in your attachment is a classic symptom of mismatched
> versions. Remember, we don't forward your LD_LIBRARY_PATH, so it has to be
> correct on your remote machine.
>
> As for r22794 - we don't keep anything that old on our web site. If you
> want to build it, the best way to get the code is to do a subversion
> checkout of the developer's trunk at that revision level:
>
> svn co -r 22794 http://svn.open-mpi.org/svn/ompi/trunk
>
> Remember to run autogen before configure.
>
>
> On Mar 4, 2011, at 4:43 AM, Federico Golfrè Andreasi wrote:
>
>
> Hi Ralph,
>
> I'm getting stuck with spawning stuff,
>
> I've downloaded the snapshot of the trunk from 1 March
> (openmpi-1.7a1r24472.tar.bz2).
> I'm testing it with a small program that does the following:
>  - the master program starts and each rank prints its hostname
>  - the master program spawns a slave program with the same size
>  - each rank of the slave (spawned) program prints its hostname
>  - end
> It is not always able to complete the program run; I see two different behaviours:
>  1. not all the slaves print their hostname and the program ends suddenly
>  2. both programs end correctly but the orted daemon is still alive and I need
> to press Ctrl-C to exit
>
>
> I've tried to recompile my test program with a previous snapshot
> (openmpi-1.7a1r22794.tar.bz2),
> of which I only have the compiled version of Open MPI (on another machine).
> It gives me an error before starting (I've attached it).
> Surfing the FAQ I found some tips, and I verified that I compile the program
> with the correct Open MPI version
> and that the LD_LIBRARY_PATH is consistent.
> So I would like to re-compile openmpi-1.7a1r22794.tar.bz2, but where
> can I find it?
>
>
> Thank you,
> Federico
>
>
>
>
>
>
>
>
>
>
> On 23 February 2011 at 03:43, Ralph Castain <rhc.open...@gmail.com> wrote:
>
>> Apparently not. I will investigate when I return from vacation next week.
>>
>>
>> Sent from my iPad
>>
>> On Feb 22, 2011, at 12:42 AM, Federico Golfrè Andreasi <
>> federico.gol...@gmail.com> wrote:
>>
>> Hi Ralph,
>>
>> I've tested spawning with the Open MPI 1.5 release, but that fix is not
>> there.
>> Are you sure you've added it?
>>
>> Thank you,
>> Federico
>>
>>
>>
>> 2010/10/19 Ralph Castain <r...@open-mpi.org>
>>
>>> The fix should be there - just didn't get mentioned.
>>>
>>> Let me know if it isn't and I'll ensure it is in the next one...but I'd
>>> be very surprised if it isn't already in there.
>>>
>>>
>>> On Oct 19, 2010, at 3:03 AM, Federico Golfrè Andreasi wrote:
>>>
>>> Hi Ralph!
>>>
>>> I saw that the new release 1.5 is out.
>>> I didn't find this fix in the "list of changes"; is it present but not
>>> mentioned since it is a minor fix?
>>>
>>> Thank you,
>>> Federico
>>>
>>>
>>>
>>> 2010/4/1 Ralph Castain <r...@open-mpi.org>
>>>
>>>> Hi there!
>>>>
>>>> It will be in the 1.5.0 release, but not 1.4.2 (couldn't backport the
>>>> fix). I understand that will come out sometime soon, but no firm date has
>>>> been set.
>>>>
>>>>
>>>> On Apr 1, 2010, at 4:05 AM, Federico Golfrè Andreasi wrote:
>>>>
>>>> Hi Ralph,
>>>>
>>>>
>>>> I've downloaded and tested the openmpi-1.7a1r22817 snapshot,
>>>> and it works fine for (multiple) spawning of more than 128 processes.
>>>>
>>>> That fix will be included in the next release of Open MPI, right?
>>>> Do you know when it will be released? Or where can I find that info?
>>>>
>>>> Thank you,
>>>>      Federico
>>>>
>>>>
>>>>
>>>> 2010/3/1 Ralph Castain <r...@open-mpi.org>
>>>>
>>>>> http://www.open-mpi.org/nightly/trunk/
>>>>>
>>>>> I'm not sure this patch will solve your problem, but it is worth a try.
>>>>>
>>>>>
>>>>>
>>>>>
>>
>>
> <OpenMPI.error>
>
>
>
/*
 *
 * PROGRAM TEST for MPI_COMM_SPAWN_MULTIPLE
 *
 * prototype program that simulates the spawn process needed for the SJI MT Domain Manager
 * the manager must be executed with the worker's executable as its first input parameter
 *
 * updated for Open MPI 1.4.0
 *
 *
 * program MASTER
 *
 * Author:  Federico Golfre' Andreasi
 * Created: 28/01/2010
 *
 */


#include "mpi.h"
#include <iostream>

using namespace std;


int main ( int argc, char* argv[] ) {


	int			rank,size;
	char		local_host[MPI_MAX_PROCESSOR_NAME];
	int			local_host_len;
	MPI_Comm	intercomm;



	// *** MPI SESSION ***

	//  Initialization of MPI session
	MPI_Init(&argc,&argv);



	// *** GET INFORMATION ABOUT THE WORLD COMMUNICATOR ***

	// Get the size and the rank within the Comm
	MPI_Comm_rank(MPI_COMM_WORLD,&rank);
	MPI_Comm_size(MPI_COMM_WORLD,&size);
	if (rank==0) cout<<"\n***** MASTER (SPAWNING) ****\n";
	MPI_Barrier(MPI_COMM_WORLD);
	// Get the name of the host
	MPI_Get_processor_name(local_host,&local_host_len);
	cout<<" Rank "<<rank<<" runs on host: "<<local_host<<"\n";
	MPI_Barrier(MPI_COMM_WORLD);


	// *** DEFINITION OF VARIABLES ***

	char 	*commands[size];
	int 	procs[size];
	MPI_Info infos[size];
	char	hosts[size][MPI_MAX_PROCESSOR_NAME];
	// Gather to All
//	MPI_Allgather(local_host,MPI_MAX_PROCESSOR_NAME,MPI_CHAR,hosts,MPI_MAX_PROCESSOR_NAME,MPI_CHAR,MPI_COMM_WORLD);
	MPI_Gather (local_host,MPI_MAX_PROCESSOR_NAME,MPI_CHAR,hosts,MPI_MAX_PROCESSOR_NAME,MPI_CHAR,0,MPI_COMM_WORLD);
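	// On the root, build one spawn "job" per current rank: the same worker
	// executable (argv[1]), one process each, and an Info whose "host" key
	// asks that child i be started on the host where rank i runs.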
	if (rank==0) {
		for (int i=0;i<size;i++) {
			commands[i]=argv[1]; procs[i]=1;
			MPI_Info_create(&infos[i]);
			MPI_Info_set(infos[i],"host",hosts[i]);
			cout<<" child "<<i<<" will go on host "<<hosts[i]<<endl;
		}
	}


	// *** EXECUTING THE SLAVE PROGRAM ***

	// Barrier
	MPI_Barrier(MPI_COMM_WORLD);
	if ( rank==0 ) cout<<"\t spawning the slave program "<<argv[1]<<" ...\n";
	// Launch the slaves and check the per-process error codes
	int		spawn_errors[size];
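	// size "jobs": one command, one process, and one Info (with its "host" key)
	// per job; MPI_ARGVS_NULL means the spawned programs get no extra arguments;
	// root 0 is the rank whose command/procs/infos arrays are significant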
	MPI_Comm_spawn_multiple(size,commands,MPI_ARGVS_NULL,procs,infos,0,MPI_COMM_WORLD,&intercomm,spawn_errors);
	if (rank==0) {
		for ( int i=0;i<size;i++ ) {
			if ( spawn_errors[i]!=MPI_SUCCESS ) cout<<"ERROR with spawning process number "<<i<<endl;
		}
	}
	// Free all the Info objects
	if (rank==0) {
	 for (int i=0;i<size;i++) MPI_Info_free(&infos[i]);
	}

	// Inform that the spawning process is completed
	if (rank==0) cout<<"\t spawning process complete;\n";
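	// Note: the intercommunicator returned by the spawn is never disconnected
	// or freed; an MPI_Comm_disconnect(&intercomm) before MPI_Finalize (matched
	// by the children on the communicator from MPI_Comm_get_parent()) would let
	// both sides detach cleanly, but this prototype does not do that.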


	// *** END OF THE PROGRAM AND MPI SESSION ***

	if (rank==0) cout<<"**** THE MASTER END ****\n\n";
	MPI_Finalize();
	return EXIT_SUCCESS;



}
/*
 *
 * PROGRAM TEST for MPI_COMM_SPAWN_MULTIPLE
 *
 * prototype program that simulates the spawn process needed for the SJI MT Domain Manager
 *
 * updated for Open MPI 1.4.0
 *
 *
 * program SLAVE
 *
 * Author:  Federico Golfre' Andreasi
 * Created: 28/01/2010
 *
 */


#include "mpi.h"
#include <iostream>


using namespace std;


int main (int argc, char *argv[]) {


	int			worker_rank,worker_size;
	char		local_host[MPI_MAX_PROCESSOR_NAME];
	int			local_host_len;


	// *** MPI SESSION ***

	//  Initialization of MPI session
	MPI_Init(&argc,&argv);


	// *** GET INFORMATION ABOUT THE WORKER WORLD COMMUNICATOR ***

	// Get the size and the rank within the worker comm
	MPI_Comm_rank(MPI_COMM_WORLD,&worker_rank);
	MPI_Comm_size(MPI_COMM_WORLD,&worker_size);
	if (worker_rank==0) cout<<"\n***** SLAVE (SPAWNED) ****\n";
	MPI_Barrier(MPI_COMM_WORLD);
	// Get the name of the host
	MPI_Get_processor_name(local_host,&local_host_len);
	cout<<" Rank "<<worker_rank<<" runs on host: "<<local_host<<" (argc="<<argc<<")\n";
	MPI_Barrier(MPI_COMM_WORLD);
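	// Note: the parent intercommunicator (available via MPI_Comm_get_parent())
	// is never retrieved or disconnected here; an MPI_Comm_disconnect() on it
	// would pair with one on the master side, if the master chose to disconnect.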


	// *** END OF MPI SESSION ***

	if (worker_rank==0) cout<<"**** THE SLAVE END ****\n\n";
	MPI_Finalize();
	return EXIT_SUCCESS;

}
