Greetings Martin. Such approaches have been discussed in the past. Indeed, I'm pretty sure that I've heard of some non-commodity systems / network stacks that do this kind of thing.
Such approaches have not evolved in the commodity Linux space, however. This kind of support would need better hooks than what exist today: new hooks would be needed to integrate the memory allocator (e.g., all the allocation methods in glibc) with the underlying network stack(s). For example:

- a hook invoked when memory is attached to the process
- a hook invoked when memory is detached from the process
- the ability for multiple hooks to co-exist in the same userspace process simultaneously

Ultimately, memory attach/detach is controlled by the kernel, so my $0.02 is that a complete solution would need to have some kind of kernel component. In the past, Linus has been (probably rightfully) resistant to adding such solutions for the general case, because these problems are really fairly specific to OS-bypass network stacks (i.e., the drivers/infiniband area in the kernel). His response when this topic came up back in 2009 was basically "fix your own network stack."

That being said, if someone would like to advance work in this area, particularly toward a solution in the drivers/infiniband section of the Linux kernel, I think that would be great.

> On Jun 16, 2016, at 3:59 PM, Audet, Martin <martin.au...@cnrc-nrc.gc.ca> wrote:
>
> Hi,
>
> After reading a little of the FAQ on the methods Open MPI uses to deal with memory registration (or pinning) on InfiniBand adapters, it seems that we could avoid all the overhead and complexity of memory registration/deregistration, registration-cache access and update, and memory management (ummunotify), while also allowing better overlap of communication with computation (we could let the communication hardware do its job independently, without resorting to registration/transfer/deregistration pipelines), by simply having all user process memory registered all the time.
> Of course, a configuration like that is not appropriate in a general setting (e.g., a desktop environment), as it would make swapping almost impossible.
>
> But in the context of an HPC node, where processes are not supposed to swap and the OS does not overcommit memory, not being able to swap doesn't appear to be a problem.
>
> Moreover, since the maximal total memory used per process is often predefined at application start as a resource specified to the queuing system, the OS could easily keep a defined amount of extra memory for its own needs instead of swapping out user process memory.
>
> I guess that specialized (non-Linux) compute-node OSes do this.
>
> But is it possible, and does it make sense, with Linux?
>
> Thanks,
>
> Martin Audet
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2016/06/29470.php

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/