Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-11 Thread Emmanuel Thomé
Thanks a lot for your analysis. This seems consistent with what I can obtain by playing around with my different test cases. It seems that munmap() does *not* unregister the memory chunk from the cache. I suppose this is the reason for the bug. In fact using mmap(..., MAP_ANONYMOUS | MAP_PRIVATE)
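
For readers unfamiliar with the two kinds of mapping contrasted in this message, here is a minimal C sketch. It sets up a file-backed mapping and an anonymous private mapping through the same code path. It does not involve MPI or the openib btl and does not reproduce the bug. The helper name `map_roundtrip` and the test pattern are hypothetical, not taken from the attached test case.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes, file-backed when path != NULL,
 * anonymous otherwise; write a pattern, read it back, then munmap().
 * Returns 0 on success, -1 on any failure. */
int map_roundtrip(const char *path, size_t len)
{
    int fd = -1;
    int flags = MAP_ANONYMOUS | MAP_PRIVATE;   /* anonymous variant */

    if (path != NULL) {
        /* File-backed variant: the file must be as large as the mapping. */
        fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return -1;
        if (ftruncate(fd, (off_t)len) != 0) {
            close(fd);
            unlink(path);
            return -1;
        }
        flags = MAP_SHARED;
    }

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (fd >= 0)
        close(fd);              /* the mapping stays valid after close() */
    if (buf == MAP_FAILED) {
        if (path != NULL)
            unlink(path);
        return -1;
    }

    memset(buf, 0xA5, len);
    int ok = (buf[0] == 0xA5 && buf[len - 1] == 0xA5);

    int rc = munmap(buf, len);  /* releases the virtual address range */
    if (path != NULL)
        unlink(path);
    return (ok && rc == 0) ? 0 : -1;
}
```

Calling `map_roundtrip(NULL, 4096)` exercises the anonymous case, and passing a writable file path exercises the file-backed case. In an MPI test, the mapped buffer would be the receive buffer.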

Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-11 Thread Joshua Ladd
I was able to reproduce your issue and I think I understand the problem a bit better at least. This demonstrates exactly what I was pointing to: It looks like when the test switches over from eager RDMA (I'll explain in a second), to doing a rendezvous protocol working entirely in user buffer spac

Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-11 Thread Emmanuel Thomé
Hi again, I've been able to simplify my test case significantly. It now runs with 2 nodes, and only a single MPI_Send / MPI_Recv pair is used. The pattern is as follows:
- ranks 0 and 1 both own a local buffer.
- each fills it with (deterministically known) data.
- rank 0 collects th
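
The "deterministically known data" step in this pattern can be sketched in plain C as below. Since the attached test case is not shown here, the fill formula and the names `expected_value`, `fill_buffer`, and `count_mismatches` are assumptions for illustration only. The idea is that a receiver can recompute what the sender's buffer must contain and count corrupted entries after the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern: the value depends only on the owning rank and the
 * index, so any process can recompute it without communication. */
static uint32_t expected_value(int rank, size_t i)
{
    return (uint32_t)rank * 2654435761u + (uint32_t)i;
}

/* Fill buf[0..n) with the pattern belonging to `rank`. */
void fill_buffer(uint32_t *buf, size_t n, int rank)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = expected_value(rank, i);
}

/* Count the entries of buf[0..n) that differ from the pattern of `rank`.
 * A result of 0 means the data arrived intact. */
size_t count_mismatches(const uint32_t *buf, size_t n, int rank)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] != expected_value(rank, i))
            bad++;
    return bad;
}
```

In an MPI setting, rank 1 would call `fill_buffer` before `MPI_Send`, and rank 0 would call `count_mismatches(recv_buf, n, 1)` after `MPI_Recv` returns.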

Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-10 Thread Emmanuel Thomé
Thanks for your answer.
On Mon, Nov 10, 2014 at 4:31 PM, Joshua Ladd wrote:
> Just really quick off the top of my head, mmaping relies on the virtual memory subsystem, whereas IB RDMA operations rely on physical memory being pinned (unswappable.)
Yes. Does that mean that the result of comput

Re: [OMPI users] File-backed mmaped I/O and openib btl.

2014-11-10 Thread Joshua Ladd
Just really quick off the top of my head, mmaping relies on the virtual memory subsystem, whereas IB RDMA operations rely on physical memory being pinned (unswappable.) For a large message transfer, the OpenIB BTL will register the user buffer, which will pin the pages and make them unswappable. If

[OMPI users] File-backed mmaped I/O and openib btl.

2014-11-10 Thread Emmanuel Thomé
Hi, I'm stumbling on a problem related to the openib btl in openmpi-1.[78].*, and the (I think legitimate) use of file-backed mmaped areas for receiving data through MPI collective calls. A test case is attached. I've tried to make it reasonably small, although I recognize that it's not extra thi