Just really quick off the top of my head: mmapping relies on the virtual
memory subsystem, whereas IB RDMA operations rely on physical memory being
pinned (unswappable). For a large message transfer, the OpenIB BTL will
register the user buffer, which will pin the pages and make them
unswappable. If the data being transferred is small, you'll copy in/out
through internal bounce buffers and you shouldn't have issues.
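
One way to check whether the mmapped receive buffer is what triggers the
problem is to keep the mapping out of the MPI call: receive into ordinary
heap memory, then copy into the file-backed mapping yourself. What follows
is only a minimal sketch of that copy step, not a confirmed fix. The helper
name save_via_staging and its arguments are hypothetical, and the MPI call
itself is left out so the snippet stands alone.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes from an ordinary heap buffer
 * (standing in for the receive buffer handed to the collective) into a
 * file-backed mapping of `path`. Returns 0 on success, -1 on error. */
static int save_via_staging(const char *path, const void *staged, size_t len)
{
    if (len == 0)
        return -1;                      /* mmap rejects zero-length maps */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The file must be at least `len` bytes long before it is mapped. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *area = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping stays valid */
    if (area == MAP_FAILED)
        return -1;

    memcpy(area, staged, len);          /* plain CPU copy, no RDMA involved */

    int rc = msync(area, len, MS_SYNC); /* flush dirty pages to the file */
    if (munmap(area, len) != 0)
        rc = -1;
    return rc;
}
```

In the test case, this would mean passing a malloc'ed buffer to MPI_Gather
and then calling save_via_staging("blah", buf, nbytes) on the root. If the
saved file then matches, that points at the registration of the mmapped
pages rather than at the collective itself.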

1. If you try to just bcast a few kilobytes of data using this technique, do
you run into issues?

2. How large is the data in the collective (input and output), and is
in_place used? I'm guessing it's large enough that the BTL tries to work
with the user buffer.

Josh

On Mon, Nov 10, 2014 at 9:29 AM, Emmanuel Thomé <emmanuel.th...@gmail.com>
wrote:

> Hi,
>
> I'm stumbling on a problem related to the openib btl in
> openmpi-1.[78].*, and the (I think legitimate) use of file-backed
> mmaped areas for receiving data through MPI collective calls.
>
> A test case is attached. I've tried to make it reasonably small,
> although I recognize that it's not extra thin. The test case is a
> trimmed down version of what I witness in the context of a rather
> large program, so there is no claim of relevance of the test case
> itself. It's here just to trigger the desired misbehaviour. The test
> case contains some detailed information on what is done, and the
> experiments I did.
>
> In a nutshell, the problem is as follows.
>
>  - I do a computation, which involves MPI_Reduce_scatter and MPI_Allgather.
>  - I save the result to a file (collective operation).
>
> *If* I save the file using something such as:
>  fd = open("blah", ...
>  area = mmap(..., fd, )
>  MPI_Gather(..., area, ...)
> *AND* the MPI_Reduce_scatter is done with an alternative
> implementation (which I believe is correct)
> *AND* communication is done through the openib btl,
>
> then the file which gets saved is inconsistent with what is obtained
> with the normal MPI_Reduce_scatter (although memory areas do coincide
> before the save).
>
> I tried to dig a bit in the openib internals, but all I've been able
> to witness was beyond my expertise (an RDMA read not transferring the
> expected data, but I'm too uncomfortable with this layer to say
> anything I'm sure about).
>
> Tests have been done with several openmpi versions including 1.8.3, on
> a debian wheezy (7.5) + OFED 2.3 cluster.
>
> It would be great if someone could tell me if he is able to reproduce
> the bug, or tell me whether something which is done in this test case
> is illegal in any respect. I'd be glad to provide further information
> which could be of any help.
>
> Best regards,
>
> E. Thomé.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/11/25730.php
>
