Just really quick, off the top of my head: mmapping relies on the virtual memory subsystem, whereas IB RDMA operations rely on physical memory being pinned (unswappable). For a large message transfer, the OpenIB BTL will register the user buffer, which pins the pages and makes them unswappable. If the data being transferred is small, it is copied in and out of internal bounce buffers, and you shouldn't have issues.
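One way to test this hypothesis at the application level is to keep the file-backed mapping away from the network layer entirely: receive into ordinary heap memory and copy into the mmapped area afterwards. The following is only a minimal sketch of that idea, not a confirmed fix. The names `fake_collective_recv` and `save_via_bounce_buffer` are hypothetical, and the stand-in receive function just fills the buffer with a byte pattern where a real program would call MPI_Gather (or similar) on the heap buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for the receiving side of a collective such as
 * MPI_Gather: it fills `buf` with a known byte pattern. In a real program
 * this would be the MPI call, and `buf` is the memory the BTL may pin. */
static void fake_collective_recv(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)(i & 0xff);
}

/* Receive `len` bytes (len > 0) into heap memory, then copy them into a
 * file-backed shared mapping of `path`. Returns 0 on success, -1 on error. */
static int save_via_bounce_buffer(const char *path, size_t len)
{
    int rc = -1;
    unsigned char *bounce = NULL;
    void *area = MAP_FAILED;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Size the file first so the whole mapping is backed by it. */
    if (ftruncate(fd, (off_t)len) != 0)
        goto out;

    area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (area == MAP_FAILED)
        goto out;

    bounce = malloc(len);
    if (bounce == NULL)
        goto out;

    /* The receive only ever touches the heap buffer, never `area`. */
    fake_collective_recv(bounce, len);

    /* Plain CPU copy into the file-backed pages, then flush to the file. */
    memcpy(area, bounce, len);
    if (msync(area, len, MS_SYNC) != 0)
        goto out;
    rc = 0;

out:
    free(bounce);
    if (area != MAP_FAILED)
        munmap(area, len);
    close(fd);
    return rc;
}
```

If the saved file becomes consistent once the collective targets the heap buffer instead of the mapping, that would point toward the interaction between memory registration and the file-backed pages. The cost is one extra copy and a temporary buffer as large as the received data.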

1. If you try to just bcast a few kilobytes of data using this technique, do you run into issues?

2. How large is the data in the collective (input and output), and is in_place used? I'm guessing it's large enough that the BTL tries to work with the user buffer.

Josh

On Mon, Nov 10, 2014 at 9:29 AM, Emmanuel Thomé <emmanuel.th...@gmail.com> wrote:

> Hi,
>
> I'm stumbling on a problem related to the openib btl in
> openmpi-1.[78].*, and the (I think legitimate) use of file-backed
> mmaped areas for receiving data through MPI collective calls.
>
> A test case is attached. I've tried to make it reasonably small,
> although I recognize that it's not extra thin. The test case is a
> trimmed down version of what I witness in the context of a rather
> large program, so there is no claim of relevance of the test case
> itself. It's here just to trigger the desired misbehaviour. The test
> case contains some detailed information on what is done, and the
> experiments I did.
>
> In a nutshell, the problem is as follows.
>
>  - I do a computation, which involves MPI_Reduce_scatter and MPI_Allgather.
>  - I save the result to a file (collective operation).
>
> *If* I save the file using something such as:
>     fd = open("blah", ...
>     area = mmap(..., fd, )
>     MPI_Gather(..., area, ...)
> *AND* the MPI_Reduce_scatter is done with an alternative
> implementation (which I believe is correct)
> *AND* communication is done through the openib btl,
>
> then the file which gets saved is inconsistent with what is obtained
> with the normal MPI_Reduce_scatter (although memory areas do coincide
> before the save).
>
> I tried to dig a bit in the openib internals, but all I've been able
> to witness was beyond my expertise (an RDMA read not transferring the
> expected data, but I'm too uncomfortable with this layer to say
> anything I'm sure about).
>
> Tests have been done with several openmpi versions including 1.8.3, on
> a debian wheezy (7.5) + OFED 2.3 cluster.
>
> It would be great if someone could tell me if he is able to reproduce
> the bug, or tell me whether something which is done in this test case
> is illegal in any respect. I'd be glad to provide further information
> which could be of any help.
>
> Best regards,
>
> E. Thomé.