Thank you, Jeff.  I re-ran IMB (a 64-core run, distributed across a number of 
nodes) under different MCA parameter settings.  Here are the results using 
Open MPI 1.6.5 (the full command line I used is sketched after the list):

1. --mca btl openib,sm,self --mca btl_openib_receive_queues 
X,9216,256,128,32:X,65536,256,128,32
        IMB did not hang.  Consumed 9263 sec (aggregate) of CPU time and 8986 MB 
of memory.

2. --mca btl openib,sm,self --mca btl_openib_receive_queues 
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
        IMB hung on the Bcast benchmark in a 64-process run, at a message size 
of 64 bytes.

3. --mca btl openib,sm,self
        IMB did not hang.  Consumed 9360 sec (aggregate) of CPU time and 9360 MB 
of memory.

4. --mca btl openib,tcp,self
        IMB did not hang.  Consumed 41911 sec (aggregate) of CPU time and 9239 MB 
of memory.
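
For reference, the full command line for case 1 looked roughly like the 
following (the hostfile name and benchmark path are placeholders for my actual 
setup):

    mpirun -np 64 --hostfile ./hosts \
        --mca btl openib,sm,self \
        --mca btl_openib_receive_queues X,9216,256,128,32:X,65536,256,128,32 \
        ./IMB-MPI1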

I did not try Open MPI 1.8.1 for these runs, since I am restricted to 1.6.5 at 
this time, but I am building 1.8.1 now to test with.  BTW, the release notes 
refer to 1.8.2, but the site only has 1.8.1 available for download.

I am still concerned, however, about my earlier runs hanging.  First, I was 
unable to determine why IMB was hanging, so any advice or guidance would be 
greatly appreciated.  I tried running strace on an MPI process, but it produced 
no helpful information.
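
For reference, this is roughly what I ran against one of the apparently stuck 
ranks (the PID is a placeholder), though the output was not informative:

    strace -f -p 12345    # follow the hung IMB rank's system calls

Would attaching gdb and dumping backtraces (e.g. "gdb -p 12345" followed by 
"thread apply all bt") be a reasonable next step for pinpointing where it is 
stuck?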

Second, my motivation for using XRC was to reduce memory consumption relative 
to the RC QPs.  I would like to get this working, unless someone can explain 
the drawbacks of using XRC instead of RC QPs.  Thanks!
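
In case it is useful context, my understanding of the receive-queue 
specification (paraphrased from the Open MPI FAQ; please correct me if I have 
this wrong) is that each colon-separated entry has the form

    X,<buffer size in bytes>,<number of buffers>[,<low watermark>[,<max pending sends>]]

and that XRC (X) queues cannot be mixed with per-peer (P) or shared-receive-
queue (S) entries in the same value, which is why every entry in my settings 
above uses X.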

--john


-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres)
Sent: Wednesday, April 23, 2014 11:19 AM
To: Open MPI Users
Subject: Re: [OMPI users] IMB Sendrecv hangs with OpenMPI 1.6.5 and XRC

A few suggestions:

- Try using Open MPI 1.8.1.  It's the newest release, and has many improvements 
since the 1.6.x series.

- Try using "--mca btl openib,sm,self" (in both v1.6.x and v1.8.x).  This 
allows Open MPI to use shared memory to communicate between processes on the 
same server, which can be a significant performance improvement over TCP or 
even IB.



On Apr 23, 2014, at 11:10 AM, "Sasso, John (GE Power & Water, Non-GE)" 
<john1.sa...@ge.com> wrote:

> I am running IMB (Intel MPI Benchmarks), the MPI-1 benchmarks, which was 
> built with Intel 12.1 compiler suite and OpenMPI 1.6.5 (and running w/ OMPI 
> 1.6.5).  I decided to use the following for the mca parameters:
>  
> --mca btl openib,tcp,self --mca btl_openib_receive_queues 
> X,9216,256,128,32:X,65536,256,128,32
>  
> where before, I always used "--mca btl openib,tcp,self".  This is for 
> performance analysis.  On the SendRecv benchmark at 32 processes, IMB hangs.  
> I then tried:
>  
> --mca btl_openib_receive_queues 
> X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32
>  
> and IMB also hangs on the SendRecv benchmark, though at 64 processes.
>  
> No errors have been recorded, not even in any system log files but 'top' 
> shows the MPI tasks running.  How can I go about troubleshooting this hang, 
> as well as figuring out what (if any) MCA XRC-related parameters in 
> btl_openib_receive_queues I have to specify to get IMB running properly?   I 
> did verify the IB cards are ConnectX.
>  
> --john
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
