I am trying to add a host at run time and spawn a slave process. The slave process starts but hangs or crashes in MPI_Init(). The code for the slave process is:
```cpp
#include <admodel.h>

int main(int argc, char * argv[])
{
    ofstream ofs("junk11");
    ofs << "calling MPI_Init" << endl;
    int err = MPI_Init(&argc, &argv);
    ofs << "returned MPI_Init err = " << err << endl;
}
```

I can run the slave process via ssh as

```
ssh smudge ./mpitest
```

and the file junk11 then contains

```
calling MPI_Init
returned MPI_Init err = 0
```

However, if I try to spawn it remotely, junk11 contains only the line written before the call to MPI_Init

```
calling MPI_Init
```

and the spawned process appears to have crashed. The master process hangs at the spawn command. The code to spawn the remote process is

```cpp
MPI_Info infotest;
int ierr2 = MPI_Info_create(&infotest);
MPI_Info_set(infotest, "add-hostfile", "/home/dave/hostfile");
MPI_Info_set(infotest, "host", "smudge");
int localerr = MPI_Comm_spawn("mpitest", NULL, 1,
    infotest, 0, MPI_COMM_SELF, &everyone,
    &(ierr(1)));
```

If I change the second line of that call to

```cpp
    MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
```

(that is, pass MPI_INFO_NULL instead of infotest), then mpitest is successfully spawned on the local machine.

Note that I am not using mpirun.

The ompi_info output is identical on both machines:

```
$ ompi_info -v ompi full --parsable
package:Open MPI dave@scum Distribution
ompi:version:full:1.5.4
ompi:version:svn:r25060
ompi:version:release_date:Aug 18, 2011
orte:version:full:1.5.4
orte:version:svn:r25060
orte:version:release_date:Aug 18, 2011
opal:version:full:1.5.4
opal:version:svn:r25060
opal:version:release_date:Aug 18, 2011
ident:1.5.4
```

How can I find out what is happening to the remote spawned process?
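One way to narrow this down is to have mpitest record, before it reaches MPI_Init, which host it is running on and what environment it was given, and then compare the log from the spawned copy with the log from the ssh run. The following is a minimal sketch, assuming a POSIX system. The helper names `collect_launch_info` and `log_launch_info` and the log-file naming scheme are hypothetical (they are not part of Open MPI or ADMB), and the `OMPI_` prefix filter is an assumption that Open MPI passes its launch settings to the child through environment variables with that prefix.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

extern char **environ;

// Hypothetical helper: gather the host name, the process ID and a few
// environment variables that commonly differ between an interactive ssh
// login and a process started by the MPI runtime.
std::string collect_launch_info()
{
    std::ostringstream out;
    char host[256] = {0};
    if (gethostname(host, sizeof(host) - 1) == 0)
        out << "host=" << host << '\n';
    else
        out << "host=<unknown>\n";
    out << "pid=" << getpid() << '\n';

    // Assumption: Open MPI's launch settings appear as OMPI_* variables.
    // PATH and LD_LIBRARY_PATH are included because a non-interactive
    // remote start may not pick up the same shell setup as a login.
    for (char **env = environ; env && *env; ++env) {
        std::string var(*env);
        if (var.rfind("OMPI_", 0) == 0 ||
            var.rfind("PATH=", 0) == 0 ||
            var.rfind("LD_LIBRARY_PATH=", 0) == 0)
            out << var << '\n';
    }
    return out.str();
}

// Hypothetical helper: append the information to "<prefix>.<pid>".
// The stream is flushed and closed before the function returns, so the
// text reaches the file even if the process later dies inside MPI_Init.
bool log_launch_info(const std::string& prefix)
{
    std::ostringstream name;
    name << prefix << '.' << getpid();
    std::ofstream ofs(name.str().c_str(), std::ios::app);
    if (!ofs)
        return false;
    ofs << collect_launch_info() << std::flush;
    return !ofs.fail();
}
```

Calling `log_launch_info("junk11")` as the first statement of `main` in mpitest would write a file such as `junk11.<pid>` on whichever host the process lands on. The PID suffix keeps the ssh-started and spawned copies from overwriting each other if the home directory happens to be shared between the two machines.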