Cray CS400, RedHat 6.5, PBS Pro (but OpenMPI is built --without-tm),
OpenMPI 1.8.8, ssh
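
For anyone trying to reproduce this on a similar ssh-launched setup, here is a rough sketch of how the extra launch diagnostics could be switched on. It only assembles the mpirun command line. The helper name diagnostic_mpirun, the host file hosts.txt and the executable ./solver are made up for illustration. The options themselves (--debug-daemons, --leave-session-attached, and the plm_base_verbose / oob_base_verbose MCA parameters) are, as far as I know, available in the 1.8 series, but please confirm against mpirun --help and ompi_info on your own install.

```python
import shlex


def diagnostic_mpirun(nprocs, hostfile, program, verbosity=5):
    """Build an mpirun argument list with launch diagnostics enabled.

    All argument values are placeholders; nothing here is taken from
    the original job script.
    """
    return [
        "mpirun",
        "-np", str(nprocs),
        "--hostfile", hostfile,
        # Ask the orted daemons to report what they are doing.
        "--debug-daemons",
        # Keep the daemons' output attached to mpirun so their
        # messages are not lost when the launching session closes.
        "--leave-session-attached",
        # Verbose output from the process launcher (ssh in this setup).
        "--mca", "plm_base_verbose", str(verbosity),
        # Verbose output from the daemon-to-mpirun communication channel.
        "--mca", "oob_base_verbose", str(verbosity),
        program,
    ]


if __name__ == "__main__":
    # Print a shell-safe command line that can be pasted into a job script.
    cmd = diagnostic_mpirun(64, "hosts.txt", "./solver")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The verbosity level of 5 is an arbitrary starting point; raising it produces more output, which can get large on a run that lasts several hours.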

-----Original Message-----
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Ralph Castain
Sent: Thursday, July 28, 2016 4:07 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: EXTERNAL: Re: [OMPI users] Question on run-time error "ORTE was unable to reliably start"

What kind of system was this on? ssh, slurm, ...?


> On Jul 28, 2016, at 1:55 PM, Blosch, Edwin L <edwin.l.blo...@lmco.com> wrote:
> 
> I am running cases that are starting just fine and running for a few hours, 
> then they die with a message that seems like a startup type of failure.  
> Message shown below.  The message appears in standard output from rank 0 
> process.  I'm assuming there is a failing card or port or something.
> 
> What diagnostic flags can I add to mpirun to help shed light on the problem?
> 
> What kinds of problems could cause this kind of message, which looks start-up 
> related, after the job has already been running many hours?
> 
> Ed
> 
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on one or more
>   nodes. Please check your PATH and LD_LIBRARY_PATH settings, or
>   configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
> 
> * compilation of the orted with dynamic libraries when static are
>   required (e.g., on Cray). Please check your configure cmd line and
>   consider using one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a lack of
>   common network interfaces and/or no route found between them. Please
>   check network connectivity (including firewalls and network routing
>   requirements).
> --------------------------------------------------------------------------
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users