Hi,

If your LD_LIBRARY_PATH is not set for a non-interactive startup, then successful runs on the individual remote machines may not be sufficient evidence that your paths are set correctly.

Check this FAQ: http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path

To see whether your variables are set correctly for non-interactive sessions on your nodes, you can execute

  mpirun --hostfile hostfile -np 4 printenv

and scan the output for PATH and LD_LIBRARY_PATH.
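If it is more convenient, you can also do the same check from inside MPI itself. The little program below is only a sketch (the file name checkenv.c and the output format are my own choices, and I have not tried it on your cluster), but it prints the PATH and LD_LIBRARY_PATH that each rank actually sees on its node:

/* checkenv.c - print the PATH and LD_LIBRARY_PATH that each MPI rank
 * actually sees on its node.  Just a sketch; adjust as needed. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int  rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    const char *path, *ldpath;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    /* getenv() returns NULL if the variable is not set in this process */
    path   = getenv("PATH");
    ldpath = getenv("LD_LIBRARY_PATH");

    printf("rank %d on %s:\n  PATH=%s\n  LD_LIBRARY_PATH=%s\n",
           rank, host,
           path   ? path   : "(not set)",
           ldpath ? ldpath : "(not set)");

    MPI_Finalize();
    return 0;
}

Build it with "mpicc checkenv.c -o checkenv" and launch it the same way as your application, e.g. "mpirun --hostfile hostfile -np 4 ./checkenv". Any node that reports "(not set)" or points at a different Open MPI installation is a likely culprit.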
Hope this helps,
Jody

On Sat, Jul 9, 2011 at 12:25 AM, Mohan, Ashwin <ashmo...@seas.wustl.edu> wrote:
> Thanks, Ralph.
>
> I have emailed the network admin on the firewall issue.
>
> About the PATH and LD_LIBRARY_PATH issue, is it sufficient evidence that the
> paths are set correctly if I am able to compile and run successfully on the
> individual nodes mentioned in the machine file?
>
> Thanks,
> Ashwin.
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Friday, July 08, 2011 1:58 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Error using hostfile
>
> Is there a firewall in the way? The error indicates that daemons were
> launched on the remote machines, but failed to communicate back.
>
> Also, check that your remote PATH and LD_LIBRARY_PATH are being set to the
> right place to pick up this version of OMPI. Lots of systems deploy with
> default versions that may not be compatible, so if you wind up running a
> daemon on the remote node that comes from another installation, things won't
> work.
>
> On Jul 8, 2011, at 10:52 AM, Mohan, Ashwin wrote:
>
> Hi,
>
> I am following up on a previously posted error. Based on the previous
> recommendation, I did set up passwordless SSH login.
>
> I created a hostfile comprising 4 nodes (each node having 4 slots). I
> tried to run my job on 4 slots but got no output, so I ended up killing
> the job. I am trying to run a simple MPI program on 4 nodes and trying to
> figure out what the issue could be. What could I check to ensure that I
> can run jobs on 4 nodes (each node has 4 slots)?
>
> Here is the simple MPI program I am trying to execute on 4 nodes:
>
> **************************
> if (my_rank != 0)
> {
>     sprintf(message, "Greetings from the process %d!", my_rank);
>     dest = 0;
>     MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag,
>              MPI_COMM_WORLD);
> }
> else
> {
>     for (source = 1; source < p; source++)
>     {
>         MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD,
>                  &status);
>         printf("%s\n", message);
>     }
> }
> ****************************
>
> My hostfile looks like this:
>
> [amohan@myocyte48 ~]$ cat hostfile
> myocyte46
> myocyte47
> myocyte48
> myocyte49
> *******************************
>
> I use the following run command: mpirun --hostfile hostfile -np 4 new46
> and receive a blank screen, so I have to kill the job.
>
> OUTPUT ON KILLING JOB:
>
> mpirun: killing job...
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> myocyte46 - daemon did not report back when launched
> myocyte47 - daemon did not report back when launched
> myocyte49 - daemon did not report back when launched
>
> Thanks,
> Ashwin.
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Wednesday, July 06, 2011 6:46 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Error using hostfile
>
> Please see http://www.open-mpi.org/faq/?category=rsh#ssh-keys
>
> On Jul 6, 2011, at 5:09 PM, Mohan, Ashwin wrote:
>
> Hi,
>
> I use the following command (mpirun --prefix /usr/local/openmpi1.4.3 -np 4
> hello) to successfully execute a simple hello world program on a single
> node. Each node has 4 slots. Following the successful execution on one
> node, I wish to employ 4 nodes, and for this purpose I wrote a hostfile.
> I submitted my job using the following command:
>
> mpirun --prefix /usr/local/openmpi1.4.3 -np 4 --hostfile hostfile hello
>
> Copied below is the output. How do I go about fixing this issue?
>
> **********************************************************************
> amohan@myocyte48's password: amohan@myocyte47's password:
> Permission denied, please try again.
> amohan@myocyte48's password:
> Permission denied, please try again.
> amohan@myocyte47's password:
> Permission denied, please try again.
> amohan@myocyte47's password:
> Permission denied, please try again.
> amohan@myocyte48's password:
>
> Permission denied (publickey,gssapi-with-mic,password).
> --------------------------------------------------------------------------
> A daemon (pid 22085) died unexpectedly with status 255 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> myocyte47 - daemon did not report back when launched
> myocyte48 - daemon did not report back when launched
>
> **********************************************************************
>
> Thanks,
> Ashwin.
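P.S. For once the launch problem is sorted out: the fragment you posted above is missing the usual surrounding boilerplate (headers, variable declarations, MPI_Init/MPI_Finalize and the rank/size queries), so here is roughly what a complete version would look like. This is only a sketch on my part; the 100-byte buffer and a tag of 0 are my assumptions, since those declarations were not in your mail.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int  my_rank, p, source, dest;
    int  tag = 0;        /* tag value assumed; not shown in the original snippet */
    char message[100];   /* buffer size assumed from the MPI_Recv count of 100  */
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0)
    {
        /* every non-root rank sends one greeting to rank 0 */
        sprintf(message, "Greetings from the process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag,
                 MPI_COMM_WORLD);
    }
    else
    {
        /* rank 0 receives and prints one greeting from each other rank */
        for (source = 1; source < p; source++)
        {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD,
                     &status);
            printf("%s\n", message);
        }
    }

    MPI_Finalize();
    return 0;
}

Compile it with mpicc and launch it through your hostfile the same way as before; with -np 4 and the paths fixed, you should see three greeting lines printed by rank 0.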