It looks like you don't have a working IB connection between "master" and "node001".
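
A quick way to confirm that (assuming the standard OFED diagnostic tools are installed on both nodes) is to check the port state and fabric membership directly, and to verify that your Open MPI build actually contains the openib BTL:

# On each node: the port state should be "Active" and physical state "LinkUp"
ibstat

# From either node: both hosts should appear on the same fabric
ibhosts

# Confirm the openib BTL was compiled into this Open MPI installation
ompi_info | grep openib

If ibstat shows the port down on either machine, or ibhosts only lists one of them, the openib BTL has nothing to run over and you get exactly the "unreachable" error shown below.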

On Mar 21, 2014, at 12:43 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:

> Hello All:
> 
> I know there will be some one who can help me in solving this problem.
> 
> I can compile my helloworld.c program using mpicc, and I have confirmed that 
> the same program runs correctly on another working cluster, so I believe the 
> local paths are set up correctly and the program itself definitely works.
> 
> If I execute mpirun from my master node, using only the master node, 
> helloworld executes correctly:
> 
> mpirun -n 1 -host master --mca btl sm,openib,self ./helloworldmpi
> hello world from process 0 of 1
> If I execute mpirun from my master node, using only the worker node, 
> helloworld executes correctly:
> 
> mpirun -n 1 -host node001 --mca btl sm,openib,self ./helloworldmpi
> hello world from process 0 of 1
> Now, my problem is that if I try to run helloworld on both nodes, I get an 
> error:
> 
> mpirun -n 2 -host master,node001 --mca btl openib,self ./helloworldmpi
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
> 
>   Process 1 ([[5228,1],0]) is on host: hsaeed
>   Process 2 ([[5228,1],1]) is on host: node001
>   BTLs attempted: self
> 
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
> 
>   PML add procs failed
>   --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> Abort before MPI_INIT completed successfully; not able to guarantee that all 
> other processes were killed!
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 7037 on
> node xxxx exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> Abort before MPI_INIT completed successfully; not able to guarantee that all 
> other processes were killed!
> 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error 
> messages
> 1 more process has sent help message help-mpi-runtime
> 
> I tried using
> mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> mpirun -n 2 -host master,node001 --mca btl openib,tcp,self ./helloworldmpi
> etc.
> 
> But no combination of BTL flags works.
> 
> 
> Can someone reply with an idea?
> 
> Thanks in advance.
> 
> Regards,
> -- 
> Hamid Saeed
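
For what it's worth: if the IB link between the two machines really is down, you can still run over Ethernet by restricting Open MPI to the TCP BTL and pointing it at a specific interface. A minimal sketch (the interface names eth0/virbr0 are placeholders; substitute whatever "ip addr" shows on your nodes):

mpirun -n 2 -host master,node001 --mca btl tcp,self --mca btl_tcp_if_include eth0 ./helloworldmpi

# or, equivalently, exclude loopback and virtual bridges instead of naming the interface
mpirun -n 2 -host master,node001 --mca btl tcp,self --mca btl_tcp_if_exclude lo,virbr0 ./helloworldmpi

If the TCP run also reports "unreachable", check firewalls between the nodes and that "master" and "node001" resolve to addresses on the interface you selected.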
