Hi,

On May 19, 2011, at 9:37 AM, Robert Horton wrote:

> On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
>> Hi,
>> 
>> Try the following QP parameters that only use shared receive queues.
>> 
>> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
>> 
> 
> Thanks for that. If I run the job over 2 x 48 cores it now works and the
> performance seems reasonable (I need to do some more tuning), but when I
> go up to 4 x 48 cores I'm getting the same problem:
> 
> [compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
> [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> 
> Any thoughts?

How much memory does each node have?  Does this happen at startup?

Try adding:

-mca btl_openib_cpc_include rdmacm
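
For example (a sketch only; the process count, hostfile, and application
name below are placeholders for your actual setup), the full command line
would look something like:

  # hypothetical invocation: substitute your own -np, hostfile, and binary
  mpirun -np 192 --hostfile myhosts \
      -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32 \
      -mca btl_openib_cpc_include rdmacm \
      ./your_app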

I'm not sure whether your version of OFED supports XRC, but it may also 
help.  I **think** other tweaks are needed to get that going, but I'm not 
familiar with the details.
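
Purely as an untested sketch (these are just the S-type queue values from
above with the type switched to X, and they probably need tuning), I believe
XRC is selected with X-type entries in the receive queue specification:

  # untested sketch: XRC (X-type) receive queues; parameters are guesses
  mpirun -np 192 --hostfile myhosts \
      -mca btl_openib_receive_queues X,12288,128,64,32:X,65536,128,64,32 \
      ./your_app

As I understand it, XRC lets processes on a node share receive QPs, which
should reduce the QP memory pressure you're hitting.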

Hope that helps,

Samuel K. Gutierrez
Los Alamos National Laboratory


> 
> Thanks,
> Rob
> -- 
> Robert Horton
> System Administrator (Research Support) - School of Mathematical Sciences
> Queen Mary, University of London
> r.hor...@qmul.ac.uk  -  +44 (0) 20 7882 7345
> 



