Gretchen,

Could you please send stack traces of the processes when it hangs (using padb or gdb)? Does the same problem persist at small scale (2 or 3 nodes)? What is the minimal setup that reproduces the problem?
--
YK

>
> ---------- Forwarded message ----------
> From: Gretchen <umassastroh...@gmail.com>
> Date: Mon, Mar 28, 2011 at 8:35 PM
> Subject: Re: [OMPI users] gadget2 infiniband openmpi hang
> To: us...@open-mpi.org
>
> The gadget code hangs at the same spot (i.e. number of steps completed AND
> same section of code) when I run with --mca btl_openib_cpc_include rdmacm
> (code is doing MPI_Sendrecv).
> Thanks,
> Gretchen
>
> Date: Thu, 17 Mar 2011 12:45:32 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] gadget2 infiniband openmpi hang
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <c03801dd-a057-4544-a365-f24836879...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> Are you able to run if you use --mca btl_openib_cpc_include rdmacm ?
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users