We register the memory with the NIC for both read and write access, and that may be the source of the slowdown. We recently added internal support that lets the point-to-point layer specify the access flags, but the openib btl does not yet make use of it. I plan to make the necessary changes before the 2.0.0 release and should have them complete later this week. I can send you a note when they are ready if you would like to try it and see whether it addresses the problem.
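For reference, a minimal standalone sketch of the two registration modes, using the libibverbs API directly rather than the actual btl code; the device selection and buffer setup here are placeholder boilerplate, and the read-only case is what the new access-flag support would make possible:

#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devices = ibv_get_device_list(&num_devices);
    if (devices == NULL || num_devices == 0) {
        fprintf(stderr, "no IB devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devices[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    size_t len = 4096;
    void *buf = malloc(len);

    /* What the openib btl requests today: read and write access. */
    struct ibv_mr *mr_rw = ibv_reg_mr(pd, buf, len,
                                      IBV_ACCESS_LOCAL_WRITE |
                                      IBV_ACCESS_REMOTE_READ |
                                      IBV_ACCESS_REMOTE_WRITE);

    /* A send-only buffer: local read access is implied by ibv_reg_mr,
     * so no flags are needed. This is the mode the point-to-point
     * layer could request once the btl uses the new support. */
    struct ibv_mr *mr_ro = ibv_reg_mr(pd, buf, len, 0);

    printf("read-write mr: %p, read-only mr: %p\n",
           (void *)mr_rw, (void *)mr_ro);

    if (mr_ro) ibv_dereg_mr(mr_ro);
    if (mr_rw) ibv_dereg_mr(mr_rw);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devices);
    return 0;
}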
-Nathan

On Tue, Sep 29, 2015 at 10:51:38AM +0200, Marcin Krotkiewski wrote:
> Thanks, Dave.
> 
> I have verified the memory locality and the IB card locality; all is fine.
> 
> Quite accidentally I have found that there is a huge penalty if I mmap the
> shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields good results,
> although I must look at this further. I'll report when I am certain, in
> case somebody finds this useful.
> 
> Is this an OS feature, or is Open MPI somehow working differently? I don't
> suspect you guys write to the send buffer, right? Even if you did, there
> would be a segfault. So I guess the overhead could come from the OS
> preventing any writes through that pointer?
> 
> Marcin
> 
> 
> On 09/28/2015 09:44 PM, Dave Goodell (dgoodell) wrote:
> >On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski
> ><marcin.krotkiew...@gmail.com> wrote:
> >>Hello, everyone
> >>
> >>I am struggling a bit with IB performance when sending data from a POSIX
> >>shared memory region (/dev/shm). The memory is shared among many MPI
> >>processes within the same compute node. Essentially, I see rather erratic
> >>performance, but it seems that my code is roughly twice as slow as when
> >>using a usual, malloced send buffer.
> >
> >It may have to do with NUMA effects and the way you're allocating/touching
> >your shared memory vs. your private (malloced) memory. If you have a
> >multi-NUMA-domain system (i.e., any 2+ socket server, and even some
> >single-socket servers) then you are likely to run into this sort of issue.
> >The PCI bus on which your IB HCA communicates is almost certainly closer to
> >one NUMA domain than the others, and performance will usually be worse if
> >you are sending/receiving from/to a "remote" NUMA domain.
> >
> >"lstopo" and other tools can sometimes help you get a handle on the
> >situation, though I don't know if it knows how to show memory affinity. I
> >think you can find memory affinity for a process via
> >"/proc/<pid>/numa_maps". There's lots of info about NUMA affinity here:
> >https://queue.acm.org/detail.cfm?id=2513149
> >
> >-Dave
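For anyone wanting to reproduce the effect Marcin describes, a minimal sketch of the two mappings he compares; the shared-memory name and size are arbitrary placeholders, the suggested mechanism in the comment is a guess, and whether the PROT_READ case is actually slow will depend on the system:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 1 << 20;

    /* "/repro_shm" is an arbitrary placeholder segment name. */
    int fd = shm_open("/repro_shm", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, (off_t)len);

    /* The slow case Marcin reports: a read-only view of the segment.
     * One plausible (untested) explanation: the HCA driver cannot pin
     * these pages with write access, so sends fall back to a copy path. */
    void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    /* The fast case: the same segment mapped read-write. */
    void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    printf("PROT_READ mapping: %p, PROT_READ|PROT_WRITE mapping: %p\n",
           ro, rw);

    munmap(rw, len);
    munmap(ro, len);
    close(fd);
    shm_unlink("/repro_shm");
    return 0;
}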