Hi, I did not know about shared queues.
It does not run out of memory. ;-)  But the latency is not very good.

** Test 1

  --mca btl_openib_max_send_size 4096 \
  --mca btl_openib_eager_limit 4096 \
  --mca btl_openib_rndv_eager_limit 4096 \
  --mca btl_openib_receive_queues S,4096,2048,1024,32 \

I get 1.5 milliseconds.

=> https://gist.github.com/3799889

** Test 2

  --mca btl_openib_receive_queues S,65536,256,128,32 \

I get around 1.5 milliseconds too.

=> https://gist.github.com/3799940

With my virtual router I am sure I can get something around 270 microseconds.

Just out of curiosity, does Open-MPI make heavy use of negative values internally for MPI tags?

If the negative tags are internal to Open-MPI, my code will not touch these private values, right?

Sébastien

On 28/09/12 08:59 AM, Jeff Squyres wrote:
> On Sep 27, 2012, at 7:22 PM, Sébastien Boisvert wrote:
>
>> Without the virtual message router, I get messages like these:
>>
>> [cp2558][[30209,1],0][connect/btl_openib_connect_oob.c:490:qp_create_one] error creating qp errno says Cannot allocate memory
>
> You're running out of registered memory.  Check out these FAQ items:
>
>     http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>     http://www.open-mpi.org/faq/?category=openfabrics#ib-receive-queues
>
> The second one tells you how to change your receive queue types; Open MPI defaults to 1 per-peer receive queue and several shared receive queues.  You might want to change to all shared receive queues.
>
>> The real message tag, the real source and the real destination are stored in the MPI tag.  I know, this is ugly, but it works.  I cannot store this information in the message buffer because the buffer can be NULL.
>>
>> bits 0 to 7: tag (8 bits, values from 0 to 255, 256 possible values)
>> bits 8 to 19: true source (12 bits, values from 0 to 4095, 4096 possible values)
>> bits 20 to 31: true destination (12 bits, values from 0 to 4095, 4096 possible values)
>>
>> Without the virtual router, my code only relies on the guarantee that MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, ...) is at least 32767 (my tags are <= 255).
>>
>> When I try jobs with 4096 processes with the virtual message router, I get the error:
>>
>>   MPI_ERR_TAG: invalid tag
>>
>> Without the virtual message router I get:
>>
>> [cp2558][[30209,1],0][connect/btl_openib_connect_oob.c:490:qp_create_one] error creating qp errno says Cannot allocate memory
>>
>> With Open-MPI 1.5.4, the upper bound is 17438272 (at least in our build).  That explains MPI_ERR_TAG.
>
> +1 on what Hristo said -- remember that you get a pointer to an MPI_Aint.  So you need to dereference it to get the value back.
>
>> My 2 questions:
>>
>> 1. Is there a better way to store routing information?
>
> Seems fine to me.  Just stay <= INT_MAX and you should be fine.
>
>> 2. Can I create my own communicator and set its MPI_TAG_UB to whatever I want?
>
> As Hristo said, no.  It's a limit in Open MPI.
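For reference, a minimal sketch of the bit-packing scheme quoted above (bits 0-7 tag, bits 8-19 true source, bits 20-31 true destination); the pack_routing_tag / unpack_routing_tag helper names are hypothetical, not from the original code, and uint32_t is used because a destination >= 2048 would set bit 31 of a signed int:

  #include <assert.h>
  #include <stdint.h>

  /* Pack the real tag, true source and true destination into one value,
   * following the layout described in the quoted message. */
  static uint32_t pack_routing_tag(uint32_t tag, uint32_t source, uint32_t destination)
  {
      assert(tag < 256u);
      assert(source < 4096u);
      assert(destination < 4096u);
      return tag | (source << 8) | (destination << 20);
  }

  /* Recover the three fields from a packed value. */
  static void unpack_routing_tag(uint32_t packed,
                                 uint32_t *tag, uint32_t *source, uint32_t *destination)
  {
      *tag         = packed & 0xffu;
      *source      = (packed >> 8) & 0xfffu;
      *destination = (packed >> 20) & 0xfffu;
  }

With the reported upper bound of 17438272, any true destination above roughly 16 already pushes the packed value past MPI_TAG_UB, which is consistent with the MPI_ERR_TAG failure seen at 4096 processes.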
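And a small self-contained sketch of the MPI_TAG_UB query mentioned in the reply: the essential point is that MPI_Comm_get_attr hands back a pointer to the attribute value rather than the value itself, so it must be dereferenced (in the C binding the predefined MPI_TAG_UB attribute points to an int):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      /* MPI_Comm_get_attr returns a pointer to the attribute value,
       * so the result must be dereferenced to read the bound. */
      void *value = NULL;
      int flag = 0;
      MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &value, &flag);

      if (flag) {
          int tag_ub = *(int *) value;
          printf("MPI_TAG_UB = %d\n", tag_ub);
      }

      MPI_Finalize();
      return 0;
  }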