Re: [OMPI users] Error bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory

2016-11-20 Thread Gilles Gouaillardet
Sebastian,

The error message is pretty self-explanatory:
/usr/mpi/gcc/openmpi-1.8.8/bin/orted is missing on your compute nodes.

It seems you are using /usr/mpi/gcc/openmpi-1.8.8/bin/mpirun on your
frontend node (i.e. the node on which mpirun is invoked),
but Open MPI was not updated on some of the nodes listed in your nodes8
machinefile.

You likely want to contact your sysadmin and figure this out.
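
To see which nodes are affected, something like the following could be run
from the frontend. This is only a sketch: it assumes passwordless ssh to
every host in the machinefile, and a machinefile with one host per line
(optionally followed by fields such as slots=N). The function names
list_hosts and check_orted are made up for illustration; the orted path is
the one from the error message.

```shell
#!/usr/bin/env bash
# Sketch: report which hosts in a machinefile lack the orted binary.
# Path taken from the error message; adjust if your install prefix differs.
ORTED=/usr/mpi/gcc/openmpi-1.8.8/bin/orted

# Print the unique host names from a machinefile (hypothetical helper).
# Blank lines and "#" comment lines are skipped; only the first field
# of each line is kept, so "node01 slots=16" yields "node01".
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1" | sort -u
}

# Check each host over ssh (hypothetical helper).
# Assumes passwordless ssh; BatchMode makes ssh fail instead of prompting.
check_orted() {
    local host
    for host in $(list_hosts "$1"); do
        if ssh -o BatchMode=yes "$host" test -x "$ORTED" </dev/null; then
            echo "$host: ok"
        else
            echo "$host: missing $ORTED (or host unreachable)"
        fi
    done
}
```

After sourcing the script, `check_orted nodes8` would print one line per
host; any host reported as missing still needs the newer Open MPI installed.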

Cheers,

Gilles

On Sat, Nov 19, 2016 at 4:22 PM, Sebastian Antunez N. wrote:
> Hello Guys
>
> I have an HPC cluster on which I updated OFED, the firmware, etc.
>
> After rebooting, running mpirun -machinefile nodes8 -n 128
> /home/HPL/run_hpl/xhpl shows the following error:
>
> bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
> bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
> bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
> --
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
>
>
>
> Before the update I had version 1.6.4, and the cluster showed no errors
> when I ran mpirun.
>
> I changed the environment variables, but the error persists.
>
> Could you comment on how to resolve the issue?
>
> Regards
>
> Sebastian Antunez
>
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Error bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory

2016-11-20 Thread Sebastian Antunez N.
Hello,

Thanks for your comment.

Only the frontend was updated directly, via install.sh, from OFED 2.4.3 to
OFED 3.1.1.0, which contains Open MPI 1.8.8.

The compute nodes still have the older OFED 2.4 with Open MPI 1.6.4.

My question: is it possible to update OFED directly on the compute nodes by
running the OFED install.sh there, or is it recommended to add the rolls and
update the nodes that way?

Regards.

Sebastian
