I have discovered slightly more information:

When I replace node 'B' from the new cluster with node 'C' from the old cluster, I get similar behavior but with an error message:

mpirun -H A,A,A,A,A,A,A ring    (works from either node)
mpirun -H C,C,C ring            (works from either node)
mpirun -H A,C ring              (fails from either node:)

Process 0 sending 10 to 1, tag 201 (3 processes in ring)
[C:23465] *** An error occurred in MPI_Recv
[C:23465] *** on communicator MPI_COMM_WORLD
[C:23465] *** MPI_ERRORS_ARE_FATAL (your job will now abort)
Process 0 sent to 1
----------------------------------

Running this on either node A or C produces the same result.

Node C runs Open MPI 1.4.1 and is an ordinary dual core on FC10, not an i5 2400 like the others. All the binaries are compiled on FC10 with gcc 4.3.2.

--- On Tue, 12/7/11, Randolph Pullen <randolph_pul...@yahoo.com.au> wrote:
From: Randolph Pullen <randolph_pul...@yahoo.com.au>
Subject: Re: [OMPI users] Mpirun only works when n< 3
To: "Open MPI Users" <us...@open-mpi.org>, "Jeff Squyres" <jsquy...@cisco.com>
Received: Tuesday, 12 July, 2011, 1:31 AM

There are no firewalls by default. I can ssh between both nodes without a password, so I assumed that all is good with the comms.

I can also get both nodes to participate in the ring program at the same time. It's just that I am limited to only 2 processes if they are split between the nodes, i.e.:

mpirun -H A,B ring              (works)
mpirun -H A,A,A,A,A,A,A ring    (works)
mpirun -H B,B,B,B ring          (works)
mpirun -H A,B,A ring            (hangs)

--- On Tue, 12/7/11, Jeff Squyres <jsquy...@cisco.com> wrote:

From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] Mpirun only works when n< 3
To: randolph_pul...@yahoo.com.au, "Open MPI Users" <us...@open-mpi.org>
Received: Tuesday, 12 July, 2011, 12:21 AM

Have you disabled firewalls between your compute nodes?

On Jul 11, 2011, at 9:34 AM, Randolph Pullen wrote:

> This appears to be similar to the problem described in:
> https://svn.open-mpi.org/trac/ompi/ticket/2043
> However, those fixes do not work for me.
>
> I am running on an
> - i5 sandy bridge under Ubuntu 10.10 8 G RAM
> - Kernel 2.6.32.14 with OpenVZ tweaks
> - OpenMPI V 1.4.1
>
> I am trying to migrate existing software to a new cluster (A,B)
>
> Symptoms:
> I can run the ring demo on a single machine, either A or B, with any number of
> processes.
> But when I combine the 2 machines I am limited to 2 processes; any more and
> MPI hangs. It gets as far as:
>
> Process 0 sending 10 to 1, tag 201 (3 processes in ring)
> Process 0 sent to 1
>
> and there it stays...
>
> Any help greatly appreciated.
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/