There was a bug in that patch that affected IB systems. Updated patch: https://github.com/hjelmn/ompi/commit/c53df23c0bcf8d1c531e04d22b96c8c19f9b3fd1.patch
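For context, the patch is about which access flags the openib btl requests when it registers a send buffer with the HCA. A rough sketch of the distinction at the verbs level follows; it is an illustration only, not the btl code, and the helper name and the pd/buf/len parameters are made up (the protection domain would come from ibv_alloc_pd() elsewhere):

    /* Illustration only -- not the openib btl code.  "pd" is a protection
     * domain from ibv_alloc_pd(); "buf"/"len" describe the send buffer.  The
     * real btl may also request remote-access flags for RDMA; this sketch
     * only shows the local read vs. read-write distinction. */
    #include <stddef.h>
    #include <infiniband/verbs.h>

    struct ibv_mr *register_send_buffer(struct ibv_pd *pd, void *buf,
                                        size_t len, int need_local_write)
    {
        /* Asking for IBV_ACCESS_LOCAL_WRITE means the pages must be pinned
         * writable, which a mapping created with PROT_READ only cannot
         * satisfy.  Passing 0 requests local read access only, which is all
         * a pure send buffer needs. */
        int access = need_local_write ? IBV_ACCESS_LOCAL_WRITE : 0;

        return ibv_reg_mr(pd, buf, len, access);
    }

Requesting local write access means the pages have to be pinned writable, which a buffer mapped with PROT_READ only cannot provide; with the access-flag support described below, the btl can ask for just the access it actually needs.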
-Nathan

On Tue, Sep 29, 2015 at 03:35:21PM -0600, Nathan Hjelm wrote:
> I have a branch with the changes available at:
>
> https://github.com/hjelmn/ompi.git
>
> in the mpool_update branch. If you prefer, you can apply this patch to
> either a 2.x or a master tarball:
>
> https://github.com/hjelmn/ompi/commit/8839dbfae85ba8f443b2857f9bbefdc36c4ebc1a.patch
>
> Let me know if this resolves the performance issues.
>
> -Nathan
>
> On Tue, Sep 29, 2015 at 09:57:54PM +0200, marcin.krotkiewski wrote:
> > I've now run a few more tests, and I think I can say with reasonable
> > confidence that the read-only mmap is the problem. Let me know if you
> > have a possible fix - I will gladly test it.
> >
> > Marcin
> >
> > On 09/29/2015 04:59 PM, Nathan Hjelm wrote:
> > > We register the memory with the NIC for both read and write access.
> > > This may be the source of the slowdown. We recently added internal
> > > support to allow the point-to-point layer to specify the access flags,
> > > but the openib btl does not yet make use of the new support. I plan to
> > > make the necessary changes before the 2.0.0 release. I should have
> > > them complete later this week. I can send you a note when they are
> > > ready if you would like to try it and see if it addresses the problem.
> > >
> > > -Nathan
> > >
> > > On Tue, Sep 29, 2015 at 10:51:38AM +0200, Marcin Krotkiewski wrote:
> > > > Thanks, Dave.
> > > >
> > > > I have verified the memory locality and IB card locality; all's fine.
> > > >
> > > > Quite accidentally I have found that there is a huge penalty if I
> > > > mmap the shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields
> > > > good results, although I must look at this further. I'll report when
> > > > I am certain, in case somebody finds this useful.
> > > >
> > > > Is this an OS feature, or is Open MPI somehow working differently? I
> > > > don't suspect you guys write to the send buffer, right? Even if you
> > > > did, there would be a segfault. So I guess it could be the OS
> > > > preventing any writes to that pointer that introduced the overhead?
> > > >
> > > > Marcin
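As an aside: to reproduce the mapping difference Marcin describes outside of MPI, the two cases differ only in the prot argument passed to mmap(). A standalone sketch follows; the shm name, the 1 MiB size, and the minimal error handling are made up for illustration (on older glibc, link with -lrt for shm_open()):

    /* Standalone sketch of the two mappings being compared.  The name
     * "/myshm" and the size are illustrative values only. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1 << 20;
        int fd = shm_open("/myshm", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)len) != 0) {
            perror("shm_open/ftruncate");
            return 1;
        }

        /* Read-only mapping: the case that showed the large penalty. */
        void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        /* Read-write mapping of the same object: the fast case. */
        void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        if (ro == MAP_FAILED || rw == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* ... hand either "ro" or "rw" to MPI_Send() and compare ... */

        munmap(ro, len);
        munmap(rw, len);
        close(fd);
        shm_unlink("/myshm");
        return 0;
    }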
> > > > On 09/28/2015 09:44 PM, Dave Goodell (dgoodell) wrote:
> > > > > On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski
> > > > > <marcin.krotkiew...@gmail.com> wrote:
> > > > > > Hello, everyone
> > > > > >
> > > > > > I am struggling a bit with IB performance when sending data from
> > > > > > a POSIX shared memory region (/dev/shm). The memory is shared
> > > > > > among many MPI processes within the same compute node.
> > > > > > Essentially, the performance is somewhat erratic, but it seems
> > > > > > that my code is roughly twice as slow as when using a usual,
> > > > > > malloced send buffer.
> > > > >
> > > > > It may have to do with NUMA effects and the way you're
> > > > > allocating/touching your shared memory vs. your private (malloced)
> > > > > memory. If you have a multi-NUMA-domain system (i.e., any 2+ socket
> > > > > server, and even some single-socket servers) then you are likely to
> > > > > run into this sort of issue. The PCI bus on which your IB HCA
> > > > > communicates is almost certainly closer to one NUMA domain than the
> > > > > others, and performance will usually be worse if you are
> > > > > sending/receiving from/to a "remote" NUMA domain.
> > > > >
> > > > > "lstopo" and other tools can sometimes help you get a handle on the
> > > > > situation, though I don't know whether it can show memory affinity.
> > > > > I think you can find memory affinity for a process via
> > > > > "/proc/<pid>/numa_maps". There's lots of info about NUMA affinity
> > > > > here:
> > > > >
> > > > > https://queue.acm.org/detail.cfm?id=2513149
> > > > >
> > > > > -Dave
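Besides reading /proc/<pid>/numa_maps, page placement can also be queried programmatically. A small sketch using move_pages(2) via libnuma (the file name and the 4-page buffer are just stand-ins for the real send buffer; build with "cc numacheck.c -lnuma"):

    /* Sketch of a programmatic placement check, complementary to reading
     * /proc/<pid>/numa_maps. */
    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        long page = sysconf(_SC_PAGESIZE);
        size_t len = 4 * (size_t)page;
        char *buf = aligned_alloc((size_t)page, len);

        memset(buf, 0, len);              /* touch so the pages get allocated */

        void *pages[4];
        int   status[4];
        for (int i = 0; i < 4; i++)
            pages[i] = buf + (size_t)i * (size_t)page;

        /* With nodes == NULL, move_pages() moves nothing and just reports
         * the NUMA node each page currently resides on. */
        if (move_pages(0, 4, pages, NULL, status, 0) != 0) {
            perror("move_pages");
            return 1;
        }

        for (int i = 0; i < 4; i++)
            printf("page %d -> NUMA node %d\n", i, status[i]);

        free(buf);
        return 0;
    }

If the shm region's pages turn out to sit on a different node than the one the IB HCA is attached to, that would fit Dave's explanation.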