Hello
Thanks for your comment.
Only the frontend was updated directly, via install.sh, from OFED 2.4.3 to
OFED 3.1.1.0, which includes Open MPI 1.8.8.
The compute nodes still have an older version, OFED 2.4 with Open MPI 1.6.4.
My question: is it possible to update OFED directly on the compute nodes by
running the OFED install.sh there, or is it recommended to add the rolls and
update the nodes?
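
For reference, the mismatch can be confirmed node by node. This is only a
minimal sketch, assuming passwordless ssh from the frontend to every host
listed in nodes8. The names list_hosts, check_orted and RUN_ON are made up
for this example and are not part of Open MPI or OFED; the orted path is the
one from the error message.

```shell
# Unique host names from an Open MPI machinefile: strip comments and blank
# lines, keep the first field (ignoring any "slots=N" suffix).
list_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }' | sort -u
}

# Report, for each host, whether the orted binary exists and is executable.
# RUN_ON is a hypothetical hook (default: ssh) so the check can be dry-run
# with a different remote command.
check_orted() {
    machinefile=$1
    orted=${2:-/usr/mpi/gcc/openmpi-1.8.8/bin/orted}
    for host in $(list_hosts "$machinefile"); do
        if ${RUN_ON:-ssh} "$host" test -x "$orted" </dev/null; then
            echo "$host: ok"
        else
            echo "$host: MISSING $orted"
        fi
    done
}
```

Calling `check_orted nodes8` on the frontend prints one line per host, so the
nodes that still need the newer Open MPI stand out.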
Regards.
Sebastian
On 20 Nov 2016 03:15, "Gilles Gouaillardet" wrote:
> Sebastian,
>
> The error message is pretty self-explanatory:
> /usr/mpi/gcc/openmpi-1.8.8/bin/orted is missing on your compute nodes.
>
> It seems you are using /usr/mpi/gcc/openmpi-1.8.8/bin/mpirun on your
> frontend node (i.e. the node on which mpirun is invoked),
> but Open MPI was not updated on some of the nodes listed in your nodes8
> machinefile.
>
> You likely want to contact your sysadmin and figure this out.
>
> Cheers,
>
> Gilles
>
> On Sat, Nov 19, 2016 at 4:22 PM, Sebastian Antunez N.
> wrote:
> > Hello Guys
> >
> > I have an HPC cluster and I updated OFED, the firmware, etc.
> >
> > After the reboot, running mpirun -machinefile nodes8 -n 128
> > /home/HPL/run_hpl/xhpl shows the following error:
> >
> > bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
> > bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
> > bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory
> >
> > --------------------------------------------------------------------------
> > ORTE was unable to reliably start one or more daemons.
> > This usually is caused by:
> >
> > * not finding the required libraries and/or binaries on
> > one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> > settings, or configure OMPI with --enable-orterun-prefix-by-default
> >
> > * lack of authority to execute on one or more specified nodes.
> > Please verify your allocation and authorities.
> >
> > * the inability to write startup files into /tmp
> > (--tmpdir/orte_tmpdir_base).
> > Please check with your sys admin to determine the correct location to use.
> >
> > * compilation of the orted with dynamic libraries when static are required
> > (e.g., on Cray). Please check your configure cmd line and consider using
> > one of the contrib/platform definitions for your system type.
> >
> > * an inability to create a connection back to mpirun due to a
> > lack of common network interfaces and/or no route found between
> > them. Please check network connectivity (including firewalls
> > and network routing requirements).
> >
> >
> >
> > Before the update I had version 1.6.4 and the cluster showed no errors
> > when I ran mpirun.
> >
> > I changed the environment variables, but the error persists.
> >
> > Could you please comment on how to resolve this issue?
> >
> > Regards
> >
> > Sebastian Antunez
> >
> >
> >
> > ___
> > users mailing list
> > users@lists.open-mpi.org
> > https://rfd.newmexicoconsortium.org/mailman/listinfo/users