All,

I'm using OpenMPI 1.4.3 and have been running a particular case on 120, 240, 
480, and 960 processes.  My time-per-work metric reports 60, 30, 15, 15.  If I 
do the same run with MVAPICH 1.2, I get 60, 30, 15, 8.  Scaling under OpenMPI 
1.4.3 is nearly ideal through 480 processes (each doubling halves the time), 
but something slows down badly in the step from 480 to 960.

This case has also been troublesome at 960 processes, reliability-wise.  
Initially, the OpenMPI runs would reach a certain point in the application 
with some unusual communication patterns, and then die with messages like:

[c4n01][[14679,1],5][connect/btl_openib_connect_oob.c:464:qp_create_one] error 
creating qp errno says Cannot allocate memory

I then added this parameter:
'--mca btl_openib_receive_queues 
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32'

and it runs...  but, as I said above, it runs 2x slower than MVAPICH at 960 
processes.  All of this is very repeatable.
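For reference, this is roughly how I'm launching it (the application name 
here is a placeholder for the real binary; everything else is as used):

```shell
# Hypothetical launch line; ./my_app stands in for the actual application.
# The receive_queues value makes every queue an XRC ("X") queue, which
# should cut per-connection QP memory at this process count.
mpirun -np 960 \
    --mca btl_openib_receive_queues \
    X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 \
    ./my_app
```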

How can I determine the source of the problem here?

Thanks for any advice,

Ed




