Today's Topics:

   1. Re: btl_openib_connect_oob.c:459:qp_create_one: error creating
      qp (Jeff Squyres)
   2. Re: [OMPI users]
      btl_openib_connect_oob.c:459:qp_create_one: error creating qp
      (Jeff Squyres)



------------------------------

Message: 2
Date: Wed, 1 Jul 2009 08:56:50 -0400
From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] btl_openib_connect_oob.c:459:qp_create_one: error creating qp
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <ddc91a3f-aed7-4244-8fa4-a00d4a345...@cisco.com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

On Jul 1, 2009, at 8:01 AM, Jeff Squyres (jsquyres) wrote:


Thanks for the reply,



>> > >[n100501][[40339,1],6][../../../../../ompi/mca/btl/openib/connect/
>> > > btl_openib_connect_oob.c:459:qp_create_one]
>> > > error creating qp errno says Cannot allocate memory


> What kind of communication pattern does the application use?  Does it
> use all-to-all?
I narrowed the location of the error down a bit. The application calculates gravitational interaction between particles based on a tree algorithm. The error is thrown in a loop over all levels, i.e., the number of tasks. Inside the loop, each task potentially communicates via a single call to MPI_Sendrecv, something like:

for (level = 0; level < nTasks; level++) {
    sendTask = ThisTask;
    recvTask = ThisTask ^ level;

    if (need_to_exchange_data()) {
        MPI_Sendrecv(buf1, count1, MPI_BYTE, recvTask, tag,
                     buf2, count2, MPI_BYTE, recvTask, tag,
                     MPI_COMM_WORLD, &status);
    }
}

Message sizes can be anything between 5 KB and a couple of MB.
Typically, the error appears around level >= 1030 (out of 2048).
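The XOR pairing in the loop above is symmetric: at each level, a task's partner's partner is the task itself, so the two MPI_Sendrecv calls match up. A minimal plain-Python sketch (no MPI; the function name is hypothetical) illustrating this property:

```python
# Sketch of the XOR pairing used in the loop above (plain Python, no MPI).
# At each level, task t is paired with t ^ level; the pairing is symmetric,
# so both partners select each other in the same iteration.

def partners(n_tasks, level):
    """Return the partner of each task at a given level."""
    return [task ^ level for task in range(n_tasks)]

n_tasks = 8
for level in range(1, n_tasks):
    p = partners(n_tasks, level)
    # Symmetry: my partner's partner is me, so the Sendrecv posts match.
    assert all(p[p[task]] == task for task in range(n_tasks))
```

Note that at level 0 each task is paired with itself (recvTask == ThisTask), which MPI_Sendrecv handles as a self-exchange. Over all levels, every task eventually talks to every other task, so with lazy connection setup the number of open connections keeps growing as the loop proceeds.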

>Open MPI makes OpenFabrics verbs (i.e., IB in your
>case) connections lazily, so you might not see these problems until
>OMPI is trying to make connections -- well past MPI_INIT -- and then
>failing when it runs out of HCA QP resources.

>> > > The cluster uses InfiniBand connections. I am aware only of the
>> > > following parameter changes (systemwide):
>> > > btl_openib_ib_min_rnr_timer = 25
>> > > btl_openib_ib_timeout = 20
>> > >
>> > > $> ulimit -l
>> > > unlimited
>> > >
>> > >
>> > > I attached:
>> > > 1) $> ompi_info --all > ompi_info.log
>> > > 2) stderr from the PBS: stderr.log
> >

>Open MPI v1.3 is actually quite flexible in how it creates and uses
>OpenFabrics QPs.  By default, it likely creates 4 QPs (of varying
>buffer sizes) between each pair of MPI processes that connect to each
>other.  You can tune this down to be 3, 2, or even 1 QP to reduce the
>number of QPs that are being opened (and therefore, presumably, not
>exhaust HCA QP resources).
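To get a feel for the numbers: with the default of 4 QPs per connected pair, a process that eventually connects to every peer in a 2048-process job needs (N-1) * 4 QPs. A back-of-the-envelope sketch (the processes-per-node figure is an illustrative assumption, not from the thread):

```python
# Back-of-the-envelope QP accounting. Only the 4-QPs-per-pair default comes
# from the thread; the node size below is a hypothetical assumption.

def qps_per_process(n_procs, qps_per_pair=4):
    # A process fully connected to all peers opens qps_per_pair QPs per peer.
    return (n_procs - 1) * qps_per_pair

n_procs = 2048
procs_per_node = 8                   # hypothetical node size
per_proc = qps_per_process(n_procs)  # 2047 * 4 = 8188
per_hca = per_proc * procs_per_node  # QPs demanded of one HCA: 65504
print(per_proc, per_hca)
```

If the HCA's QP limit is in the tens of thousands (limits vary by hardware and firmware), a nearly fully connected pattern like the XOR loop can plausibly exhaust it mid-run, which matches the failure appearing only at high levels.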

>Alternatively / additionally, you may wish to enable XRC if you have
>recent enough Mellanox HCAs.  This should also save on QP resources.

>You can set both of these things via the mca_btl_openib_receive_queues
>MCA parameter.  It takes a colon-delimited list of receive queues
>(which directly imply how many QP's to create).  There are 3 kinds of
>entries: per-peer QPs, shared receive queues, and XRC receive queues.
>Here's a description of each:

I played around with the number of queues, number of buffers, and buffer size, but nothing really helped. The default is:

$ ompi_info --param btl openib --parsable | grep receive_queues

mca:btl:openib:param:btl_openib_receive_queues:value:
P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32
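The default value above encodes one per-peer queue (P) and three shared receive queues (S), each entry starting with a type letter followed by comma-separated numeric fields (the first field is the buffer size). A small sketch parsing that colon-delimited format (the parser and field interpretation beyond the first two fields are simplified assumptions):

```python
# Parse a btl_openib_receive_queues-style value. Each colon-separated entry
# starts with a type letter (P = per-peer, S = shared, X = XRC) followed by
# comma-separated numeric fields; the first numeric field is the buffer size.
# This is an illustrative sketch, not Open MPI's actual parser.

def parse_receive_queues(value):
    queues = []
    for entry in value.split(":"):
        fields = entry.split(",")
        queues.append((fields[0], [int(f) for f in fields[1:]]))
    return queues

default = ("P,128,256,192,128:S,2048,256,128,32:"
           "S,12288,256,128,32:S,65536,256,128,32")
qs = parse_receive_queues(default)
print(len(qs))                                   # 4 queue specs
print(sum(1 for kind, _ in qs if kind == "P"))   # 1 per-peer entry
```

Since each entry implies QPs to create, shortening this list (or replacing P/S entries with XRC entries on capable hardware) is what reduces per-connection QP consumption.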

I thought that running with
$ mpirun -np 2048 -mca btl_openib_receive_queues \
  P,128,3000:S,2048,3000:S,12288,3000:S,65536,3000

would do the trick, but it doesn't.

Any other idea?

> Hope this helps!
Yes, at least I understand the problem now ;-)


Cheers,
Jose
