Cray CS400, RedHat 6.5, PBS Pro (but OpenMPI is built --without-tm), OpenMPI 1.8.8, ssh
-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, July 28, 2016 4:07 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: EXTERNAL: Re: [OMPI users] Question on run-time error "ORTE was unable to reliably start"

What kind of system was this on? ssh, slurm, ...?

> On Jul 28, 2016, at 1:55 PM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:
>
> I am running cases that are starting just fine and running for a few hours,
> then they die with a message that seems like a startup type of failure.
> Message shown below. The message appears in standard output from rank 0
> process. I'm assuming there is a failing card or port or something.
>
> What diagnostic flags can I add to mpirun to help shed light on the problem?
>
> What kinds of problems could cause this kind of message, which looks start-up
> related, after the job has already been running many hours?
>
> Ed
>
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on one or more
>   nodes. Please check your PATH and LD_LIBRARY_PATH settings, or
>   configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are
>   required (e.g., on Cray). Please check your configure cmd line and
>   consider using one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a lack of
>   common network interfaces and/or no route found between them. Please
>   check network connectivity (including firewalls and network routing
>   requirements).
> --------------------------------------------------------------------------
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users