[OMPI users] --preload-binary does not work
Hello,

I am using Open MPI 1.8.1 on a cluster of 4 machines: one Red Hat 6.2 box and three BusyBox machines, all 64-bit environments. I want to use the --preload-binary option to send the binary file to the hosts, but it is not working:

# /mpi/bin/mpirun --prefix /mpi --preload-files ./a.out --allow-run-as-root --np 4 --host box0101,box0103 --preload-binary ./a.out
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not
access or execute an executable:

Executable: ./a.out
Node: box0101

while attempting to start process rank 17.
--------------------------------------------------------------------------
17 total processes failed to start
#

If I send the binary by scp beforehand, the command works fine, and scp works without a password between the hosts. Is the option supposed to work?

Thank you,
Eiichi
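For reference, a minimal sketch of the scp workaround mentioned above, which is what currently works here (assuming passwordless scp as root; the root user and the destination path are assumptions based on the prompts in this thread, not something shown explicitly):

# Stage ./a.out onto each remote host by hand, then launch without --preload-binary.
for h in box0101 box0103; do
    scp ./a.out root@"$h":/root/a.out
done
/mpi/bin/mpirun --prefix /mpi --allow-run-as-root --np 4 \
    --host box0101,box0103 /root/a.out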
Re: [OMPI users] --preload-binary does not work
Thank you! With the patch, the --preload-binary option is working fine.

However, if I add "--gmca plm_rsh_no_tree_spawn 1" as an mpirun command-line option, it hangs:

# /mpi/bin/mpirun --allow-run-as-root --gmca plm_rsh_no_tree_spawn 1 --preload-binary --hostfile /root/.hosts --prefix /mpi --np 120 a.out

If I run the command without --preload-binary, it works fine (I have to copy the binary to each node beforehand, of course). I guess this is a different issue?

Eiichi

On Fri, Jun 6, 2014 at 5:35 PM, Ralph Castain wrote:
> Okay, I found the problem and think I have a fix that I posted (copied E.O.
> on it). You are welcome to download the patch and try it. Scheduled for
> release in 1.8.2.
>
> Thanks
> Ralph
>
> On Jun 6, 2014, at 1:01 PM, Ralph Castain wrote:
>
> Yeah, it doesn't require ssh any more - but I haven't tested it in a bit,
> and so it's possible something crept in there.
>
> On Jun 6, 2014, at 12:27 PM, Reuti wrote:
>
> Am 06.06.2014 um 21:04 schrieb Ralph Castain:
>
> Supposed to, yes - but I don't know how much testing it has seen. I can
> try to take a look
>
> Wasn't it on the list recently, that 1.8.1 should do it even without
> passphraseless SSH between the nodes?
>
> -- Reuti
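For what it's worth, a sketch of one generic way to see where the launch stalls (this is a general Open MPI debugging knob, not something suggested in this thread; the verbosity level is illustrative):

# Add rsh-launcher verbosity to the hanging command to see which daemon start stalls.
/mpi/bin/mpirun --allow-run-as-root --gmca plm_rsh_no_tree_spawn 1 \
    --mca plm_base_verbose 5 \
    --preload-binary --hostfile /root/.hosts --prefix /mpi --np 120 a.out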
[OMPI users] getting opal_init:startup:internal-failure
Hello,

I have five Linux machines (one is Red Hat and the others are BusyBox). I downloaded openmpi-1.6.4.tar.gz onto my main Red Hat machine and configured/compiled it successfully:

./configure --prefix=/myname

I installed it into the /myname directory successfully, and I am able to run a simple hello.c on my Red Hat machine:

[root@host1 /tmp] # mpirun -np 4 ./hello.out
I am parent
I am a child
I am a child
I am a child
[root@host1 /tmp] #

Then I sent the entire /myname directory to another machine (host2):

[root@host1 /] # tar zcf - myname | ssh host2 "(cd /; tar zxf -)"

and ran mpirun for that host (host2):

[root@host1 tmp]# mpirun -np 4 -host host2 ./hello.out
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
opal_init:startup:internal-failure
But I couldn't open the help file:
//share/openmpi/help-opal-runtime.txt: No such file or directory. Sorry!
--------------------------------------------------------------------------
[host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79
[host2:26294] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file orted/orted_main.c at line 358
--------------------------------------------------------------------------
A daemon (pid 23691) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
[root@host1 tmp]#

I set these environment variables:

[root@host1 tmp]# echo $LD_LIBRARY_PATH
/myname/lib/
[root@host1 tmp]# echo $OPAL_PREFIX
/myname/
[root@host1 tmp]#

[root@host2 /] # ls -la /myname/lib/libmpi.so.1
lrwxrwxrwx    1 root     root            15 Apr 28 10:21 /myname/lib/libmpi.so.1 -> libmpi.so.1.0.7
[root@host2 /] #

If I run the ./hello.out binary on host2 itself, it works fine:

[root@host1 tmp]# ssh host2
[root@host2 /] # /tmp/hello.out
I am parent
[root@host2 /] #

Can someone help me figure out why I cannot run hello.out on host2 from host1? Am I missing any environment variables?

Thank you,
Eiichi
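One quick check that can narrow this down (a diagnostic sketch, not something already tried in this thread): mpirun starts the remote daemon over a non-interactive ssh session, so the variables have to be visible to that kind of session, not just to an interactive login.

# Run from host1: print what a non-interactive shell on host2 actually sees.
ssh host2 'echo OPAL_PREFIX=$OPAL_PREFIX; echo LD_LIBRARY_PATH=$LD_LIBRARY_PATH; echo PATH=$PATH'

If OPAL_PREFIX comes back empty there, that would match the "//share/openmpi/help-opal-runtime.txt" path in the error, which looks like a prefix that expanded to nothing.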
Re: [OMPI users] getting opal_init:startup:internal-failure
Thank you Ralph!

I ran it with the "-prefix" option, but I got this:

[root@host1 tmp]# mpirun -prefix /myname -np 4 -host host2 ./hello.out
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: -prefix=/myname
Node: host1

while attempting to start process rank 0.
--------------------------------------------------------------------------
[root@host1 tmp]#

I also updated PATH on the remote host (host2) to include /myname, but it didn't seem to change anything.

eiichi

On Sun, Apr 28, 2013 at 11:48 AM, Ralph Castain wrote:
> The problem is likely that your path variables aren't being set properly
> on the remote machine when mpirun launches the remote daemon. You might
> check to see that your default shell rc file is also setting those values
> correctly. Alternatively, modify your mpirun cmd line a bit by adding
>
> mpirun -prefix /myname ...
>
> so it will set the remote prefix and see if that helps. If it does, you
> can add --enable-orterun-prefix-by-default to your configure line so mpirun
> always adds it.
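Following Ralph's point about the shell rc file, a sketch of what the remote host's non-interactive startup file might need (for example /root/.bashrc on host2, assuming bash; the exact file depends on the shell there), using the /myname install from this thread:

# Make the relocated Open MPI install visible to non-interactive ssh sessions on host2.
export OPAL_PREFIX=/myname
export PATH=/myname/bin:$PATH
export LD_LIBRARY_PATH=/myname/lib:$LD_LIBRARY_PATH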
Re: [OMPI users] getting opal_init:startup:internal-failure
I tried configuring/building OMPI on the remote host, but I was not able to: the remote host (host2) doesn't have any development tools such as gcc, make, etc.

Since I am able to run an MPI hello_c binary on the remote host, I believe the host has all the libraries needed for MPI. I am also able to run an MPI hello_c binary on host1 from host2:

[root@host2 tmp]# mpirun -host localhost /tmp/hello.out
Hello World from processor host2, rank 0 out of 1 processors
[root@host2 tmp]# mpirun -host host2 /tmp/hello.out
Hello World from processor host2, rank 0 out of 1 processors
[root@host2 tmp]# mpirun -host host1 /tmp/hello.out
Hello World from processor host1, rank 0 out of 1 processors
[root@host2 tmp]#

However, I still can't run the hello_c binary on host2 from host1:

[root@host1 tmp]# mpirun -host host2 /tmp/hello.out
--------------------------------------------------------------------------
Sorry! You were supposed to get help about:
opal_init:startup:internal-failure
But I couldn't open the help file:
//share/openmpi/help-opal-runtime.txt: No such file or directory. Sorry!
--------------------------------------------------------------------------
[host2:02499] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file runtime/orte_init.c at line 79
[host2:02499] [[INVALID],INVALID] ORTE_ERROR_LOG: Error in file orted/orted_main.c at line 358
--------------------------------------------------------------------------
A daemon (pid 17710) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
[root@host1 tmp]#

If I set -prefix=/myname, it returns a different output:

[root@host1 tmp]# mpirun -prefix=/myname -host host2 /tmp/hello.out
--------------------------------------------------------------------------
mpirun was unable to launch the specified application as it could not access
or execute an executable:

Executable: -prefix=/myname
Node: host1

while attempting to start process rank 0.
--------------------------------------------------------------------------
[root@host1 tmp]#

Do you still want me to try building OMPI on the remote host?

eiichi

On Sun, Apr 28, 2013 at 12:24 PM, Ralph Castain wrote:
> If you configure/build OMPI on the remote node using the same configure
> options you used on host1, does the problem go away?
Re: [OMPI users] getting opal_init:startup:internal-failure
It works!!!

By putting two dashes and no equal sign, it worked fine:

[root@host1 tmp]# mpirun --prefix /myname --host host2 /tmp/hello.out
Hello World from processor host2, rank 0 out of 1 processors
[root@host1 tmp]#

It looks like one dash ("-prefix") also works as long as I don't put an equal sign.

Thank you very much!!

Eiichi

On Mon, Apr 29, 2013 at 8:29 AM, Ralph Castain wrote:
> Hmm, okay. No, let's not bother to install a bunch of stuff you don't
> otherwise need.
>
> I probably mis-typed the "prefix" option - it has two dashes in front of
> it and no equal sign:
>
> mpirun --prefix /myname ...
>
> I suspect you only put one dash, and the equal sign was a definite
> problem, which is why it gave you an error.
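Since the two-dash form works, a sketch of how to make it automatic using the configure option Ralph mentioned earlier in the thread (illustrative; it requires rebuilding and reinstalling the /myname tree):

# Rebuild so mpirun always behaves as if --prefix /myname had been given.
./configure --prefix=/myname --enable-orterun-prefix-by-default
make && make install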
Re: [OMPI users] getting opal_init:startup:internal-failure
Thank you! I agree that sharing the home directory over NFS would take care of this; for now, though, I wanted to use the --preload-binary option.

eiichi

On Mon, Apr 29, 2013 at 10:15 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> FWIW, to avoid using the --prefix option, you can set your PATH /
> LD_LIBRARY_PATH to point to the Open MPI installation on all nodes.
>
> Many organizations opt to have NFS-shared home directories, so that when
> you modify your "main" shell startup file (e.g., .bashrc) to point PATH and
> LD_LIBRARY_PATH to your Open MPI installation, it effectively modifies it
> for all nodes in the cluster.
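A brief sketch of Jeff's suggestion, assuming an NFS-shared home directory and a bash login shell (the file name and exact lines are illustrative; with a shared $HOME, one edit is picked up by every node):

# Append to the shared ~/.bashrc so all nodes point at the same install.
echo 'export PATH=/myname/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/myname/lib:$LD_LIBRARY_PATH' >> ~/.bashrc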