Today's Topics:

   1. Re: btl_openib_connect_oob.c:459:qp_create_one: error creating
      qp (Jeff Squyres)
   2. Re: [OMPI users] btl_openib_connect_oob.c:459:qp_create_one:
      error creating qp (Jeff Squyres)
------------------------------

Message: 2
Date: Wed, 1 Jul 2009 08:56:50 -0400
From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] btl_openib_connect_oob.c:459:qp_create_one: error creating qp
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <ddc91a3f-aed7-4244-8fa4-a00d4a345...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

On Jul 1, 2009, at 8:01 AM, Jeff Squyres (jsquyres) wrote:
Thanks for the reply,
>> > > [n100501][[40339,1],6][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:459:qp_create_one]
>> > > error creating qp errno says Cannot allocate memory
> What kind of communication pattern does the application use? Does it
> use all-to-all?
I narrowed the location of the error down a bit. The application
calculates gravitational interactions between particles based on a tree
algorithm. The error is thrown in a loop over all levels, i.e., the
number of tasks. Inside the loop each task potentially communicates via
a single call to MPI_Sendrecv, something like:
for (level = 0; level < nTasks; level++) {
    sendTask = ThisTask;
    recvTask = ThisTask ^ level;  /* XOR pairing: a distinct partner per level */
    if (need_to_exchange_data()) {
        MPI_Sendrecv(buf1, count1, MPI_BYTE, recvTask, tag,
                     buf2, count2, MPI_BYTE, recvTask, tag,
                     MPI_COMM_WORLD, &status);
    }
}
Message sizes can be anything between 5 KB and a couple of MB.
Typically, the error appears around level >= 1030 (out of 2048).
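[Editorial aside, not from the original thread: because ThisTask ^ level
enumerates every other rank exactly once as level runs over 0..nTasks-1,
this loop drives the job toward full all-to-all connectivity, and the
lazily created QPs pile up with it. A minimal standalone sketch of the
arithmetic (rank and level values are illustrative):

#include <stdio.h>

/* Count the distinct peers one rank has exchanged data with by the
   time the XOR loop reaches maxLevel. Each level yields a different
   partner, so the count -- and the QP usage -- grows linearly. */
int main(void)
{
    int nTasks = 2048, me = 6, maxLevel = 1030, peers = 0;
    for (int level = 1; level < maxLevel; level++) {
        int partner = me ^ level;
        if (partner < nTasks)  /* always true: nTasks is a power of two */
            peers++;
    }
    printf("rank %d: %d distinct peers by level %d\n", me, peers, maxLevel);
    return 0;
}

At four QPs per connected peer (the default quoted below), the 1029
peers reached by level 1030 come to roughly 4100 QPs per process, a
plausible point for an HCA to run out of QP resources.]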
>Open MPI makes OpenFabrics verbs (i.e., IB in your
>case) connections lazily, so you might not see these problems until
>OMPI is trying to make connections -- well past MPI_INIT -- and then
>failing when it runs out of HCA QP resources.
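[Editorial aside, not part of the original message: one way to surface
these lazy-connection failures at startup rather than deep into the run
is to ask Open MPI to wire up all connections during MPI_INIT. To the
best of my knowledge the 1.3-series parameter for this is
mpi_preconnect_mpi, and ./app below stands in for the actual executable:

$ mpirun -np 2048 -mca mpi_preconnect_mpi 1 ./app

This trades a slower startup for hitting any QP limit immediately and
deterministically.]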
>> > > The cluster uses InfiniBand connections. I am aware only of the
>> > > following parameter changes (systemwide):
>> > > btl_openib_ib_min_rnr_timer = 25
>> > > btl_openib_ib_timeout = 20
>> > >
>> > > $> ulimit -l
>> > > unlimited
>> > >
>> > >
>> > > I attached:
>> > > 1) $> ompi_info --all > ompi_info.log
>> > > 2) stderr from the PBS: stderr.log
>Open MPI v1.3 is actually quite flexible in how it creates and uses
>OpenFabrics QPs. By default, it likely creates 4 QPs (of varying
>buffer sizes) between each pair of MPI processes that connect to each
>other. You can tune this down to be 3, 2, or even 1 QP to reduce the
>number of QPs that are being opened (and therefore, presumably, not
>exhaust HCA QP resources).
>Alternatively / additionally, you may wish to enable XRC if you have
>recent enough Mellanox HCAs. This should also save on QP resources.
>You can set both of these things via the btl_openib_receive_queues
>MCA parameter. It takes a colon-delimited list of receive queues
>(which directly implies how many QPs to create). There are 3 kinds of
>entries: per-peer QPs, shared receive queues, and XRC receive queues.
>Here's a description of each:
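[The per-queue descriptions were trimmed from this digest. As an
editorial illustration of the two options described above (hedged: the
queue specs are carried over from the defaults shown below, the XRC
variant assumes an XRC-capable HCA and OFED stack, and ./app stands in
for the actual executable):

$ mpirun -np 2048 -mca btl_openib_receive_queues \
    P,128,256,192,128:S,65536,256,128,32 ./app

creates two QPs per connected peer instead of the default four, and

$ mpirun -np 2048 -mca btl_openib_receive_queues \
    X,128,256,192,128:X,65536,256,128,32 ./app

switches to XRC receive queues (X entries cannot be mixed with P or S
entries).]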
I played around with the number of queues, number of buffers, and buffer
size, but nothing really helped. The default is:
$ ompi_info --param btl openib --parsable | grep receive_queues
mca:btl:openib:param:btl_openib_receive_queues:value:
P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32
I thought that running with
$ mpirun -np 2048 -mca btl_openib_receive_queues
P,128,3000:S,2048,3000:S,12288,3000:S,65536,3000
would do the trick, but it doesn't.
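[Editorial aside: a misspelled MCA parameter name is silently ignored
in this series, so it is worth confirming what a run actually used.
Assuming the mpi_show_mca_params parameter is available in this 1.3
build, something like

$ mpirun -np 2 -mca mpi_show_mca_params all ./app

dumps every effective MCA value at startup, so the receive_queues
setting can be checked against what was intended.]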
Any other ideas?
> Hope this helps!
Yes, at least I understand the problem now. ;-)
Cheers,
Jose