---------- Forwarded message ----------
From: Jeff Squyres (jsquyres) <jsquy...@cisco.com>
Date: Fri, Mar 21, 2014 at 3:05 PM
Subject: Re: problem for multiple clusters using mpirun
To: Hamid Saeed <e.hamidsa...@gmail.com>
Please reply on the mailing list; more people can reply that way, and the answers to your questions become google-able for people with similar questions.

On Mar 21, 2014, at 10:03 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:

> Hello Jeff,
>
> Sorry to bother you again.
>
> I think I have a TCP connection; as far as I know, my cluster is not configured for InfiniBand (IB).
>
> But even for TCP connections, these lines do not work:
>
> mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> mpirun -n 2 -host master,node001 ./helloworldmpi
>
> They print an error like
>
> [btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to xx.xxx.x.xxx failed: Connection refused (111)
>
> and the program hangs until I press Ctrl+C.
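A note on that error: "Connection refused (111)" from connect() means the remote host actively rejected the TCP connection, which usually points to a firewall rule or to nothing listening on the attempted port. One quick way to check raw TCP reachability between the nodes, independent of MPI, is a netcat loopback test. This is only a sketch; it assumes nc is installed on both hosts and that port 12345 is free and representative:

    # on node001: listen on an arbitrary unprivileged port
    # (some netcat variants need "nc -l -p 12345" instead)
    nc -l 12345

    # on master: connect to that port and type a line of text;
    # it should appear in the nc session on node001
    nc node001 12345

If this test also fails with "Connection refused", a firewall (e.g. iptables) between the nodes is the likely cause. Open MPI's TCP BTL opens connections on random unprivileged ports, so the nodes must be able to reach each other on arbitrary high ports, not just on the ssh port.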
> On Fri, Mar 21, 2014 at 2:47 PM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:
>
> > Hello,
> >
> > Thanks for the answer.
> >
> > Can you kindly explain what an IB connection is?
> >
> > Thanks, regards
> >
> > On Fri, Mar 21, 2014 at 2:44 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
> >
> > > Was Ralph's answer not enough? I think he hit the nail on the head...
> > >
> > > On Mar 21, 2014, at 9:29 AM, Hamid Saeed <e.hamidsa...@gmail.com> wrote:
> > >
> > > > Hello:
> > > >
> > > > I have learned about MPI from your posts on different web portals. I hope you can help me solve this problem too.
> > > >
> > > > * I can compile my helloworld.c program using mpicc, and I have confirmed that it runs correctly on another working cluster, so I think the local paths are set up correctly and the program itself works.
> > > >
> > > > * If I execute mpirun from my master node, and using only the master node, helloworld executes correctly:
> > > >
> > > > mpirun -n 1 -host master --mca btl sm,openib,self ./helloworldmpi
> > > > hello world from process 0 of 1
> > > >
> > > > * If I execute mpirun from my master node, using only the worker node, helloworld executes correctly:
> > > >
> > > > mpirun -n 1 -host node001 --mca btl sm,openib,self ./helloworldmpi
> > > > hello world from process 0 of 1
> > > >
> > > > Now, my problem is that if I try to run helloworld on both nodes, I get an error:
> > > >
> > > > mpirun -n 2 -host master,node001 --mca btl openib,self ./helloworldmpi
> > > > --------------------------------------------------------------------------
> > > > At least one pair of MPI processes are unable to reach each other for
> > > > MPI communications. This means that no Open MPI device has indicated
> > > > that it can be used to communicate between these processes. This is
> > > > an error; Open MPI requires that all MPI processes be able to reach
> > > > each other. This error can sometimes be the result of forgetting to
> > > > specify the "self" BTL.
> > > >
> > > > Process 1 ([[5228,1],0]) is on host: hsaeed
> > > > Process 2 ([[5228,1],1]) is on host: node001
> > > > BTLs attempted: self
> > > >
> > > > Your MPI job is now going to abort; sorry.
> > > > --------------------------------------------------------------------------
> > > > --------------------------------------------------------------------------
> > > > It looks like MPI_INIT failed for some reason; your parallel process is
> > > > likely to abort. There are many reasons that a parallel process can
> > > > fail during MPI_INIT; some of which are due to configuration or environment
> > > > problems. This failure appears to be an internal failure; here's some
> > > > additional information (which may only be relevant to an Open MPI
> > > > developer):
> > > >
> > > > PML add procs failed
> > > > --> Returned "Unreachable" (-12) instead of "Success" (0)
> > > > --------------------------------------------------------------------------
> > > > *** The MPI_Init() function was called before MPI_INIT was invoked.
> > > > *** This is disallowed by the MPI standard.
> > > > *** Your MPI job will now abort.
> > > > Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > > > --------------------------------------------------------------------------
> > > > mpirun has exited due to process rank 0 with PID 7037 on
> > > > node xxxx exiting without calling "finalize". This may
> > > > have caused other processes in the application to be
> > > > terminated by signals sent by mpirun (as reported here).
> > > > --------------------------------------------------------------------------
> > > > *** The MPI_Init() function was called before MPI_INIT was invoked.
> > > > *** This is disallowed by the MPI standard.
> > > > *** Your MPI job will now abort.
> > > > Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> > > > 1 more process has sent help message help-mca-bml-r2.txt / unreachable proc
> > > > Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> > > > 1 more process has sent help message help-mpi-runtime
> > > >
> > > > I tried using
> > > >
> > > > mpirun -n 2 -host master,node001 --mca btl tcp,sm,self ./helloworldmpi
> > > > mpirun -n 2 -host master,node001 --mca btl openib,tcp,self ./helloworldmpi
> > > >
> > > > etc., but no flag combination works.
> > > >
> > > > Can someone reply with an idea?
> > > >
> > > > Thanks in advance.
> > > >
> > > > Regards,
> > > > --
> > > > _______________________________________________
> > > > Hamid Saeed
> > > > _______________________________________________
> > >
> > > --
> > > Jeff Squyres
> > > jsquy...@cisco.com
> > > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

--
_______________________________________________
Hamid Saeed
CoSynth GmbH & Co. KG
Escherweg 2 - 26121 Oldenburg - Germany
Tel +49 441 9722 738 | Fax -278
http://www.cosynth.com
_______________________________________________
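For reference, the helloworldmpi program being launched above amounts to the classic MPI hello world. The actual source was not posted in the thread, so the following is only a minimal sketch of a program that would produce the "hello world from process 0 of 1" output shown:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        /* Start the MPI runtime; must be called before any other MPI call. */
        MPI_Init(&argc, &argv);

        /* Rank of this process and total process count in MPI_COMM_WORLD. */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("hello world from process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }

Built with something like "mpicc helloworldmpi.c -o helloworldmpi" and launched with the mpirun commands shown above; with -n 2 across two hosts, each rank prints its own line once the BTL-level connectivity problem is resolved.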