Hi,

I did not know about shared receive queues.

It does not run out of memory. ;-)

But the latency is not very good.


** Test 1

--mca btl_openib_max_send_size 4096 \
--mca btl_openib_eager_limit 4096 \
--mca btl_openib_rndv_eager_limit 4096 \
--mca btl_openib_receive_queues S,4096,2048,1024,32 \

I get a latency of about 1.5 milliseconds.

  => https://gist.github.com/3799889
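
For context, these flags go on the mpiexec command line, roughly like this
(the executable name and process count below are placeholders, not the
exact ones I used):

  mpiexec -n 4096 \
    --mca btl_openib_max_send_size 4096 \
    --mca btl_openib_eager_limit 4096 \
    --mca btl_openib_rndv_eager_limit 4096 \
    --mca btl_openib_receive_queues S,4096,2048,1024,32 \
    ./my_program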


** Test 2

--mca btl_openib_receive_queues S,65536,256,128,32 \

I get around 1.5 milliseconds too.

  => https://gist.github.com/3799940


With my virtual message router, I am confident I can get down to around
270 microseconds.


Just out of curiosity, does Open-MPI make heavy use of negative values
internally for MPI tags?

If the negative tags are internal to Open-MPI, then my code will never
touch those private values, right?
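
To make the question concrete, the packing scheme from my earlier message
(quoted below) looks roughly like this. It is only a sketch with made-up
names and a placeholder check, not my actual code:

  #include <mpi.h>
  #include <stdio.h>

  /* Tag layout quoted below: bits 0-7 = tag, bits 8-19 = true source,
   * bits 20-31 = true destination.  A destination >= 2048 sets bit 31,
   * so the packed value no longer fits in a non-negative int and will
   * exceed any MPI_TAG_UB; even smaller destinations already exceed
   * Open-MPI 1.5.4's upper bound of 17438272. */
  static long long pack_routing_tag(long long tag, long long source,
                                    long long destination)
  {
      return tag | (source << 8) | (destination << 20);
  }

  static void unpack_routing_tag(long long packed, int *tag, int *source,
                                 int *destination)
  {
      *tag         = (int)( packed        & 0xff);
      *source      = (int)((packed >> 8)  & 0xfff);
      *destination = (int)((packed >> 20) & 0xfff);
  }

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      /* MPI_Comm_get_attr() hands back the address of the attribute
       * value; it must be dereferenced to read MPI_TAG_UB itself. */
      int *ub_ptr = NULL;
      int flag = 0;
      MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &ub_ptr, &flag);

      long long packed = pack_routing_tag(200, 1000, 4000);
      if (flag && packed > (long long)*ub_ptr)
          printf("packed tag %lld exceeds MPI_TAG_UB %d\n", packed, *ub_ptr);

      int t, s, d;
      unpack_routing_tag(packed, &t, &s, &d);
      printf("tag=%d source=%d destination=%d packed=%lld\n", t, s, d, packed);

      MPI_Finalize();
      return 0;
  }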

Sébastien

On 28/09/12 08:59 AM, Jeff Squyres wrote:
> On Sep 27, 2012, at 7:22 PM, Sébastien Boisvert wrote:
> 
>> Without the virtual message router, I get messages like these:
>>
>> [cp2558][[30209,1],0][connect/btl_openib_connect_oob.c:490:qp_create_one] 
>> error creating qp errno says Cannot allocate memory
> 
> You're running out of registered memory.  Check out these FAQ items:
> 
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> http://www.open-mpi.org/faq/?category=openfabrics#ib-receive-queues
> 
> The second one tells you how to change your receive queue types; Open MPI 
> defaults to 1 per-peer receive queue and several shared receive queues.  You 
> might want to change to all shared receive queues.
> 
>> The real message tag, the real source and the real destination are stored
>> in the MPI tag. I know, this is ugly, but it works. I can not store this
>> information in the message buffer because the buffer can be NULL.
>>
>> bits 0 to 7: tag (8 bits, values from 0 to 255, 256 possible values)
>> bits 8 to 19: true source (12 bits, values from 0 to 4095, 4096 possible 
>> values)
>> bits 20 to 31: true destination (12 bits, values from 0 to 4095, 4096 
>> possible values)
>>
>> Without the virtual router, my code is compliant with the fact that 
>> MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB,...) is at least 32767 (my tags 
>> are <= 255).
>>
>> When I try jobs with 4096 processes with the virtual message router, I get 
>> the error:
>>
>>    MPI_ERR_TAG: invalid tag.
>>
>> Without the virtual message router I get:
>>
>> [cp2558][[30209,1],0][connect/btl_openib_connect_oob.c:490:qp_create_one] 
>> error creating qp errno says Cannot allocate memory
>>
>> With Open-MPI 1.5.4, the upper bound is 17438272 (at least in our build). 
>> That explains MPI_ERR_TAG.
> 
> +1 on what Hristo said -- remember that you get a pointer to an MPI_Aint.  So 
> you need to dereference it to get the value back.
> 
>> My 2 questions:
>>
>> 1. Is there a better way to store routing information ?
> 
> Seems fine to me.  Just stay <=INT_MAX and you should be fine.
> 
>> 2. Can I create my own communicator and set its MPI_TAG_UB to whatever I 
>> want ?
> 
> As Hristo said, no.  It's a limit in Open MPI.
> 
