I have similar concerns as Todd stated below. With an mmap-based approach, we are treating shared memory objects like files. This brings in all filesystem related considerations like ACL and lifecycle mgmt.
Stepping back a little, the shared-memory work isn't really specific to Arrow. A few questions related to this: 1) Has the topic been discussed in the context of protobuf (or other IPC protocols) before? Seems Cap'n Proto (https://capnproto.org/) has zero-copy shared memory. I haven't read implementation detail though. 2) If the shared-memory work benefits a wide range of protocols, should it be a generalized and standalone library? Thanks, Zhe On Tue, Mar 15, 2016 at 8:30 PM Todd Lipcon <t...@cloudera.com> wrote: > Having thought about this quite a bit in the past, I think the mechanics of > how to share memory are by far the easiest part. The much harder part is > the resource management and ownership. Questions like: > > - if you are using an mmapped file in /dev/shm/, how do you make sure it > gets cleaned up if the process crashes? > - how do you allocate memory to it? there's nothing ensuring that /dev/shm > doesn't swap out if you try to put too much in there, and then your > in-memory super-fast access will basically collapse under swap thrashing > - how do you do lifecycle management across the two processes? If, say, > Kudu wants to pass a block of data to some Python program, how does it know > when the Python program is done reading it and it should be deleted? What > if the python program crashed in the middle - when can Kudu release it? > - how do you do security? If both sides of the connection don't trust each > other, and use length prefixes and offsets, you have to be constantly > validating and re-validating everything you read. > > Another big factor is that shared memory is not, in my experience, > immediately faster than just copying data over a unix domain socket. In > particular, the first time you read an mmapped file, you'll end up paying > minor page fault overhead on every page. This can be improved with > HugePages, but huge page mmaps are not supported yet in current Linux (work > going on currently to address this). So you're left with hugetlbfs, which > involves static allocations and much more pain. > > All the above is a long way to say: let's make sure we do the write > prototyping and up-front design before jumping into code. > > -Todd > > > > On Tue, Mar 15, 2016 at 5:54 PM, Jacques Nadeau <jacq...@apache.org> > wrote: > > > @Corey > > The POC Steven and Wes are working on is based on MappedBuffer but I'm > > looking at using netty's fork of tcnative to use shared memory directly. > > > > @Yiannis > > We need to have both RPC and a shared memory mechanisms (what I'm > inclined > > to call IPC but is a specific kind of IPC). The idea is we negotiate via > > RPC and then if we determine shared locality, we work over shared memory > > (preferably for both data and control). So the system interacting with > > HBase in your example would be the one responsible for placing collocated > > execution to take advantage of IPC. > > > > How do others feel of my redefinition of IPC to mean the same memory > space > > communication (either via shared memory or rdma) versus RPC as socket > based > > communication? > > > > > > On Tue, Mar 15, 2016 at 5:38 PM, Corey Nolet <cjno...@gmail.com> wrote: > > > > > I was seeing Netty's unsafe classes being used here, not mapped byte > > > buffer not sure if that statement is completely correct but I'll have > to > > > dog through the code again to figure that out. > > > > > > The more I was looking at unsafe, it makes sense why that would be > > > used.apparently it's also supposed to be included on Java 9 as a first > > > class API > > > On Mar 15, 2016 7:03 PM, "Wes McKinney" <w...@cloudera.com> wrote: > > > > > > > My understanding is that you can use java.nio.MappedByteBuffer to > work > > > > with memory-mapped files as one way to share memory pages between > Java > > > > (and non-Java) processes without copying. > > > > > > > > I am hoping that we can reach a POC of zero-copy Arrow memory sharing > > > > Java-to-Java and Java-to-C++ in the near future. Indeed this will > have > > > > huge implications once we get it working end to end (for example, > > > > receiving memory from a Java process in Python without a heavy ser-de > > > > step -- it's what we've always dreamed of) and with the metadata and > > > > shared memory control flow standardized. > > > > > > > > - Wes > > > > > > > > On Wed, Mar 9, 2016 at 9:25 PM, Corey J Nolet <cjno...@gmail.com> > > wrote: > > > > > If I understand correctly, Arrow is using Netty underneath which is > > > > using Sun's Unsafe API in order to allocate direct byte buffers off > > heap. > > > > It is using Netty to communicate between "client" and "server", > > > information > > > > about memory addresses for data that is being requested. > > > > > > > > > > I've never attempted to use the Unsafe API to access off heap > memory > > > > that has been allocated in one JVM from another JVM but I'm assuming > > this > > > > must be the case in order to claim that the memory is being accessed > > > > directly without being copied, correct? > > > > > > > > > > The implication here is huge. If the memory is being directly > shared > > > > across processes by them being allowed to directly reach into the > > direct > > > > byte buffers, that's true shared memory. Otherwise, if there's copies > > > going > > > > on, it's less appealing. > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > Sent from my iPad > > > > > > > > > > > > > -- > Todd Lipcon > Software Engineer, Cloudera >