On Monday 09 July 2007 12:52:29 pm jody wrote:
> Tim,
> thanks for your suggestions.
> There seems to be something wrong with the PATH:
>
> jody@aim-nano_02 ~/progs $ ssh 130.60.49.128 printenv | grep PATH
> PATH=/usr/bin:/bin:/usr/sbin:/sbin
>
> which i don't understand. Logging via ssh into 130.60.49.128 i get:
>
> jody@aim-nano_02 ~/progs $ ssh 130.60.49.128
> Last login: Mon Jul 9 18:26:11 2007 from 130.60.49.129
> jody@aim-nano_00 ~ $ cat .bash_profile
> # /etc/skel/.bash_profile
>
> # This file is sourced by bash for login shells. The following line
> # runs your .bashrc and is recommended by the bash info pages.
> [[ -f ~/.bashrc ]] && . ~/.bashrc
>
> PATH=/opt/openmpi/bin:$PATH
> export PATH
> LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
> export LD_LIBRARY_PATH
>
> jody@aim-nano_00 ~ $ echo $PATH
> /opt/openmpi/bin:/opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.5:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin
>
> (aim-nano_00 is the name of 130.60.49.128)
> So why is the path set when i ssh by hand, but not otherwise?

You must set the path in .bashrc. See
http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
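The reason is that "ssh <host> <command>" starts a non-interactive,
non-login shell on the remote side: bash then skips .bash_profile
entirely, and on most Linux builds a shell started by sshd this way
sources only ~/.bashrc. Many stock .bashrc files also bail out early
for non-interactive shells, so the exports have to come before any
such guard. A minimal sketch, assuming the /opt/openmpi install from
your output:

  # top of ~/.bashrc -- set the Open MPI paths before any
  # interactive-only early return such as: [[ $- != *i* ]] && return
  PATH=/opt/openmpi/bin:$PATH
  export PATH
  LD_LIBRARY_PATH=/opt/openmpi/lib:$LD_LIBRARY_PATH
  export LD_LIBRARY_PATH

After that, "ssh 130.60.49.128 printenv | grep PATH" should show
/opt/openmpi/bin.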
Make sure:

  ssh 130.60.49.128 which orted

works. If it doesn't, there is something wrong with the PATH.

> The suggestion with the --prefix option also didn't work:
>
> jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi --hostfile hostfile ./a.out
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59
> [aim-nano_02:13733] [0,0,0] ORTE_ERROR_LOG: Data unpack failed in file dss/dss_peek.c at line 59

Often this means that there is a version mismatch. Do all the nodes have
the same version of Open MPI installed? Did you compile your application
with this version of Open MPI?

Tim

> (after which the thing seems to hang....)
>
> If i use aim-nano_02 (130.60.49.130) instead of a hostfile,
>
> jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --prefix /opt/openmpi --host 130.60.49.130 ./a.out
>
> it works, as it does if i run it on the machine itself the standard way:
>
> jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --host 130.60.49.130 ./a.out
>
> Is there anything else i could try?
>
> Jody
>
> On 7/9/07, Tim Prins <tpr...@open-mpi.org> wrote:
> > jody wrote:
> > > Hi Tim
> > > (I accidentally sent the previous message before it was ready - here's
> > > the complete one)
> > > Thank You for your reply.
> > > Unfortunately my workstation, on which i could successfully run
> > > openmpi applications, has died. But on my replacement machine (which
> > > i assume i have set up in an equivalent way) i now get errors even
> > > when i try to run an openmpi application in a simple way:
> > >
> > > jody@aim-nano_02 /home/aim-cari/jody $ mpirun -np 2 --hostfile hostfile ./a.out
> > > bash: orted: command not found
> > > [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.129 failed to start as expected.
> > > [aim-nano_02:22145] ERROR: There may be more information available from
> > > [aim-nano_02:22145] ERROR: the remote shell (see above).
> > > [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status 127.
> > > [aim-nano_02:22145] ERROR: A daemon on node 130.60.49.128 failed to start as expected.
> > > [aim-nano_02:22145] ERROR: There may be more information available from
> > > [aim-nano_02:22145] ERROR: the remote shell (see above).
> > > [aim-nano_02:22145] ERROR: The daemon exited unexpectedly with status 127.
> > >
> > > However, i set PATH and LD_LIBRARY_PATH to the correct paths both in
> > > .bashrc AND .bash_profile.
> >
> > I assume you are using bash. You might try changing your .profile as well.
> >
> > > For example:
> > > jody@aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 echo $PATH
> > > /opt/openmpi/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.1.2:/opt/sun-jdk-1.4.2.10/bin:/opt/sun-jdk-1.4.2.10/jre/bin:/opt/sun-jdk-1.4.2.10/jre/javaws:/usr/qt/3/bin
> >
> > When you do this, $PATH gets interpreted on the local host, not the
> > remote host.
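> > For example ("somehost" here is just a placeholder; single quotes
> > keep the expansion on the remote side):
> >
> >   ssh somehost echo $PATH     # $PATH is expanded by your local shell
> >   ssh somehost 'echo $PATH'   # quoted, so expanded on somehost itself
> >
> > That is why the output above looks correct even though the remote
> > non-interactive PATH is not.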
> > Try instead:
> >
> >   ssh 130.60.49.128 printenv | grep PATH
> >
> > > But:
> > > jody@aim-nano_02 /home/aim-cari/jody $ ssh 130.60.49.128 orted
> > > bash: orted: command not found
> >
> > You could also do:
> >
> >   ssh 130.60.49.128 which orted
> >
> > This will show you the paths it looked in for the orted.
> >
> > > Do You have any suggestions?
> >
> > To avoid dealing with paths (assuming everything is installed in the
> > same directory on all nodes) you can also try the suggestion here
> > (although I think that once it is set up, modifying PATHs is the
> > easier way to go, less typing :):
> > http://www.open-mpi.org/faq/?category=running#mpirun-prefix
> >
> > Hope this helps,
> >
> > Tim
> >
> > > Thank You
> > > Jody
> > >
> > > On 7/9/07, Tim Prins <tpr...@open-mpi.org> wrote:
> > >> Hi Jody,
> > >>
> > >> Sorry for the super long delay. I don't know how this one got lost...
> > >>
> > >> I run like this all the time. Unfortunately, it is not as simple as
> > >> I would like. Here is what I do:
> > >>
> > >> 1. Log into the machine using ssh -X
> > >> 2. Run mpirun with the following parameters:
> > >>    -mca pls rsh (This makes sure that Open MPI uses the rsh/ssh
> > >>    launcher. It may not be necessary depending on your setup)
> > >>    -mca pls_rsh_agent "ssh -X" (To make sure X information is
> > >>    forwarded. This might not be necessary if you have ssh set up to
> > >>    always forward X information)
> > >>    --debug-daemons (This ensures that the ssh connections to the
> > >>    backend nodes are kept open. Otherwise, they are closed and X
> > >>    information cannot be forwarded. Unfortunately, this will also
> > >>    cause some debugging output to be printed, but right now there
> > >>    is no other way :( )
> > >>
> > >> So, the complete command is:
> > >>
> > >>   mpirun -np 4 -mca pls rsh -mca pls_rsh_agent "ssh -X" --debug-daemons xterm -e gdb my_prog
> > >>
> > >> I hope this helps. Let me know if you are still experiencing
> > >> problems.
> > >>
> > >> Tim
> > >>
> > >> jody wrote:
> > >>> Hi
> > >>> For debugging i usually run each process in a separate X window.
> > >>> This works well if i set the DISPLAY variable to the computer
> > >>> from which i am starting my OpenMPI application.
> > >>>
> > >>> This method fails, however, if i log in (via ssh) to my workstation
> > >>> from a third computer and then start my OpenMPI application:
> > >>> only the processes running on the workstation i logged into can
> > >>> open their windows on the third computer. The processes on
> > >>> the other computers can't open their windows.
> > >>>
> > >>> This is how i start the processes:
> > >>>
> > >>>   mpirun -np 4 -x DISPLAY run_gdb.sh ./TestApp
> > >>>
> > >>> where run_gdb.sh looks like this:
> > >>> -------------------------
> > >>> #!/bin/csh -f
> > >>>
> > >>> echo "Running GDB on node `hostname`"
> > >>> xterm -e gdb $*
> > >>> exit 0
> > >>> -------------------------
> > >>> The output from the processes on the other computer:
> > >>>
> > >>>   xterm Xt error: Can't open display: localhost:12.0
> > >>>
> > >>> Is there a way to tell OpenMPI to forward the X windows
> > >>> over yet another ssh connection?
> > >>>
> > >>> Thanks
> > >>> Jody
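One more note on the X forwarding question: with the recipe quoted
above there should be no need for "-x DISPLAY". ssh -X sets a fresh
DISPLAY on each node itself, and exporting your local value
(localhost:12.0) is exactly what made the remote xterms fail. Combined
with your hostfile and run_gdb.sh, an untested sketch of the whole
invocation would be:

  mpirun -np 4 -mca pls rsh -mca pls_rsh_agent "ssh -X" --debug-daemons \
      --hostfile hostfile run_gdb.sh ./TestApp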
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users