I am trying to add a host at run time and spawn a slave process.
The slave process starts but hangs or crashes in MPI_Init().
Code for the slave process is

#include <admodel.h>
int main(int argc,char * argv[])
{
  ofstream ofs("junk11");
  ofs << "calling MPI_Init" << endl;
  int err=MPI_Init(&argc,&argv);
  ofs << "returned MPI_Init err = " << err << endl;
}

I can run the slave process via ssh as

    ssh smudge ./mpitest

and the file junk11 then contains

calling MPI_Init
returned MPI_Init err = 0

However, if I try to spawn it remotely, junk11 contains
only the line written before the call to MPI_Init:

calling MPI_Init

and the spawned process appears to have crashed.
The master process hangs at the spawn command.
The code to spawn the remote process is

     MPI_Info infotest;
     int ierr2=MPI_Info_create(&infotest);
     MPI_Info_set( infotest, "add-hostfile", "/home/dave/hostfile" );
     MPI_Info_set( infotest, "host", "smudge" );
     int localerr=MPI_Comm_spawn("mpitest", NULL, 1,
            infotest, 0, MPI_COMM_SELF, &everyone,
            &(ierr(1)) );

If I replace infotest with MPI_INFO_NULL in the call above, so that
the second line reads

     MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,

then mpitest is successfully spawned on the local machine.
Note that I am not using mpirun.

ompi_info output is identical for both machines

ompi_info -v ompi full --parsable
package:Open MPI dave@scum Distribution
ompi:version:full:1.5.4
ompi:version:svn:r25060
ompi:version:release_date:Aug 18, 2011
orte:version:full:1.5.4
orte:version:svn:r25060
orte:version:release_date:Aug 18, 2011
opal:version:full:1.5.4
opal:version:svn:r25060
opal:version:release_date:Aug 18, 2011
ident:1.5.4



How can I find out what is happening to the remotely spawned process?










