Also check to ensure you are using the same version of OMPI on all nodes - this 
message usually means that a different version was used on at least one node.
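
A quick way to check that, assuming passwordless ssh and that "node1 node2" stand in for your actual hostnames:

```shell
# Print the Open MPI version reported by each node; every line should match.
for host in node1 node2; do
  printf '%s: ' "$host"
  ssh "$host" mpirun --version | head -n 1
done
```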

> On Dec 23, 2016, at 1:58 AM, gil...@rist.or.jp wrote:
> 
>  Serguei,
> 
>  
> this looks like a very different issue: orted cannot be started remotely.
> 
>  
> that typically occurs if orted cannot find some dependencies
> 
> (the Open MPI libs and/or the compiler runtime)
> 
>  
> for example, running `ssh <other node> orted` from a node should not fail 
> because of unresolved dependencies.
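> 
> for instance (hostnames are placeholders, adjust to your cluster):
> 
> ```shell
> # orted started with no arguments should print an ORTE usage/abort
> # message, NOT "error while loading shared libraries ...":
> ssh node2 orted
> 
> # if it does fail, check which shared libraries are unresolved at the
> # same install path on the remote node:
> ssh node2 ldd "$(which orted)"
> ```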
> 
> a simple trick is to replace
> 
> mpirun ...
> 
> with
> 
> `which mpirun` ...
> 
>  
> a better option (as long as you do not plan to relocate Open MPI install dir) 
> is to configure with
> 
> --enable-mpirun-prefix-by-default
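> 
> i.e. something like (the prefix path is just an example):
> 
> ```shell
> # Build Open MPI so remotely started orted daemons inherit the install
> # prefix automatically (no PATH/LD_LIBRARY_PATH tweaks on each node).
> ./configure --prefix=/opt/openmpi --enable-mpirun-prefix-by-default
> make -j 4 && make install
> ```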
> 
>  
> Cheers,
> 
>  
> Gilles
> 
> ----- Original Message -----
> 
> Hi All !
> As there have been no positive changes with the "UDSM + IPoIB" problem since 
> my previous post, 
> we installed IPoIB on the cluster, and the "No OpenFabrics connection..." 
> error no longer appears.
> But now Open MPI reports another problem:
> 
> In app ERROR OUTPUT stream:
> 
> [node2:14142] [[37935,0],0] ORTE_ERROR_LOG: Data unpack had inadequate space 
> in file base/plm_base_launch_support.c at line 1035
> 
> In app OUTPUT stream:
> 
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> 
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
> 
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
> 
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
> 
> *  compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
> 
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --------------------------------------------------------------------------
> 
> When I run the task on a single node, everything works properly.
> But when I specify "run on 2 nodes", the problem appears.
> 
> I tried pinging the IPoIB addresses: all hosts resolve properly, 
> and ping requests and replies travel over IB without any problems.
> So all nodes (including the head node) can see each other via IPoIB.
> But the MPI app still fails.
> 
> The same test task works perfectly on all nodes when run with the Ethernet 
> transport instead of InfiniBand.
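> 
> Transport selection is done with the usual Open MPI MCA parameters; the 
> interface names and process count below are placeholders, not necessarily 
> our exact flags:
> 
> ```shell
> # Force the TCP BTL over the Ethernet interface (the working case):
> mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -np 4 ./mpi_app
> 
> # Force the TCP BTL over the IPoIB interface instead:
> mpirun --mca btl tcp,self --mca btl_tcp_if_include ib0 -np 4 ./mpi_app
> ```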
> 
> P.S. We use Torque resource manager to enqueue MPI tasks.
> 
> Best regards,
> Sergei.
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
