Looks like you don't have an IB (InfiniBand) connection between "master" and "node001".
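A couple of quick checks you could run to confirm that (just a sketch, assuming the standard infiniband-diags utilities are installed on both nodes and that the hosts can also reach each other over ordinary Ethernet/TCP):

    # On each node: the HCA port should report State: Active / LinkUp
    ibstat

    # From one node: the subnet manager should list both hosts on the fabric
    ibhosts

    # Force TCP only -- if this runs, the problem is isolated to the
    # openib BTL / IB fabric rather than to Open MPI or the program
    mpirun -n 2 -host master,node001 --mca btl tcp,self ./helloworldmpi

    # Ask the BTLs to report what they are (or are not) initializing
    mpirun -n 2 -host master,node001 --mca btl openib,self --mca btl_base_verbose 30 ./helloworldmpi

If ibstat shows the port down (or ibstat isn't even installed on node001), that would explain why the openib BTL can't reach the other process.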
On Mar 21, 2014, at 12:43 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:

> Hello All:
>
> I know there will be someone who can help me solve this problem.
>
> I can compile my helloworld.c program using mpicc, and I have confirmed that
> the program runs correctly on another working cluster, so I think the local
> paths are set up correctly and the program definitely works.
>
> If I execute mpirun from my master node, using only the master node,
> helloworld executes correctly:
>
>     mpirun -n 1 -host master --mca btl sm,openib,self ./helloworldmpi
>     hello world from process 0 of 1
>
> If I execute mpirun from my master node, using only the worker node,
> helloworld executes correctly:
>
>     mpirun -n 1 -host node001 --mca btl sm,openib,self ./helloworldmpi
>     hello world from process 0 of 1
>
> Now, my problem is that if I try to run helloworld on both nodes, I get an
> error:
>
>     mpirun -n 2 -host master,node001 --mca btl openib,self ./helloworldmpi
>
> --------------------------------------------------------------------------
> At least one pair of MPI processes are unable to reach each other for
> MPI communications.  This means that no Open MPI device has indicated
> that it can be used to communicate between these processes.  This is
> an error; Open MPI requires that all MPI processes be able to reach
> each other.  This error can sometimes be the result of forgetting to
> specify the "self" BTL.
>
>   Process 1 ([[5228,1],0]) is on host: hsaeed
>   Process 2 ([[5228,1],1]) is on host: node001
>   BTLs attempted: self
>
> Your MPI job is now going to abort; sorry.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems.  This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
>   PML add procs failed
>   --> Returned "Unreachable" (-12) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> Abort before MPI_INIT completed successfully; not able to guarantee that all
> other processes were killed!
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 7037 on
> node xxxx exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> Abort before MPI_INIT completed successfully; not able to guarantee that all
> other processes were killed!
> 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
> Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error
> messages
> 1 more process has sent help message help-mpi-runtime
>
> I also tried
>
>     mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
>     mpirun -n 2 -host master,node001 --mca btl openib,tcp,self ./helloworldmpi
>
> etc., but none of these BTL combinations work.
>
> Can someone reply with an idea?
>
> Thanks in advance.
>
> Regards,
> --
> _______________________________________________
> Hamid Saeed
> _______________________________________________
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
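For reference, the helloworldmpi source is not shown above; a minimal equivalent of the kind of program being run (my own sketch, not the poster's actual code, built with "mpicc helloworld.c -o helloworldmpi") would be:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        /* The reported failure happens here: MPI_Init aborts because the
         * selected BTLs cannot connect the two processes across nodes. */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        printf("hello world from process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }

Since the program itself is this simple and runs fine on each node in isolation, the error is almost certainly in the inter-node transport (the openib BTL / IB fabric), not in the code.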