Re: Understanding "shared" memory implications

2016-03-19 Thread Wes McKinney
> From: Wes McKinney [mailto:w...@cloudera.com] > Sent: Thursday, March 17, 2016 6:03 AM > To: dev@arrow.apache.org > Subject: Re: Understanding "shared" memory implications > > On Wed, Mar 16, 2016 at 2:33 PM, Jacques Nadeau wrote: >> >> For Arrow, let's

RE: Understanding "shared" memory implications

2016-03-19 Thread Zheng, Kai
riday, March 18, 2016 6:51 AM To: dev@arrow.apache.org Subject: Re: Understanding "shared" memory implications hi Kai, This sounds like it might merit a separate thread to discuss the growth of Arrow as a modular ecosystem of libraries in different programming languages and relat

Re: Understanding "shared" memory implications

2016-03-19 Thread Corey Nolet
I've been under the impression that exposing memory to be shared directly and not copied WAS, in fact, the responsibility of Arrow. In fact, I read this in [1] and this is what turned me on to Arrow in the first place. [1] http://www.datanami.com/2016/02/17/arrow-aims-to-defrag-big-in-memory-analytics

Re: Understanding "shared" memory implications

2016-03-19 Thread Wes McKinney
On Wed, Mar 16, 2016 at 2:33 PM, Jacques Nadeau wrote: > > For Arrow, let's make sure that we do our best to accomplish both (1) and > (2). They seem like entirely compatible goals. > > For my part on the C++ side, I plan to proceed with a hub-and-spoke model. A minimal small core library with "l

Re: Understanding "shared" memory implications

2016-03-19 Thread Wes McKinney
It has always been the expectation that no system would be required to use a particular piece of Arrow software to "use Arrow" (hence the importance of having a well-defined specification for memory and metadata). However, we should also not expect all systems to create their own implementations of

Re: Understanding "shared" memory implications

2016-03-19 Thread Leif Walsh
Seems to me IPC/LPC/RPC focuses on the wrong distinction. I think the right one is between async message-passing (over a socket), where the receiver decides when to handle the message, and synchronous/direct memory manipulation (shared mmap, rdma), where the "client" manipulates the "server's" (rat

Re: Understanding "shared" memory implications

2016-03-19 Thread Jacques Nadeau
I think it is okay for a project to be different things to different people. I think it is really important as a library that we have enough supporting examples that people can get started quickly. In some sense I'm modeling this after what Julian did with Calcite. For example he provides a defau

Re: Understanding "shared" memory implications

2016-03-19 Thread Jacques Nadeau
>>You’re hardly the biggest fan of the bundled default execution implementation. At your bidding, we’ve been trying for almost 2 years to get that stuff out of core. Great point. As you stated, I think there are at least two lessons with Calcite: 1. Make sure to have an easy to use out of the box

RE: Understanding "shared" memory implications

2016-03-19 Thread Zheng, Kai
16 6:03 AM To: dev@arrow.apache.org Subject: Re: Understanding "shared" memory implications On Wed, Mar 16, 2016 at 2:33 PM, Jacques Nadeau wrote: > > For Arrow, let's make sure that we do our best to accomplish both (1) > and (2). They seem like entirely compatible goals

Re: Understanding "shared" memory implications

2016-03-19 Thread Julian Hyde
Calcite is a salutary example of what happens if you *don’t* figure out early enough what is core and what is not. You’re hardly the biggest fan of the bundled default execution implementation. At your bidding, we’ve been trying for almost 2 years to get that stuff out of core. Arrow is, at its

Re: Understanding "shared" memory implications

2016-03-19 Thread Zhe Zhang
I have similar concerns as Todd stated below. With an mmap-based approach, we are treating shared memory objects like files. This brings in all filesystem related considerations like ACL and lifecycle mgmt. Stepping back a little, the shared-memory work isn't really specific to Arrow. A few questi

Re: Understanding "shared" memory implications

2016-03-19 Thread Reynold Xin
I always thought Arrow was just an in-memory format, and that it is the responsibility of whoever wants to use it to carry those responsibilities out, because depending on workloads, different frameworks might pick very different applications. Otherwise it seems to be doing too much and having t

Re: Understanding "shared" memory implications

2016-03-19 Thread Julian Hyde
This is all very interesting stuff, but just so we’re clear: it is not Arrow’s responsibility to provide an RPC/IPC/LPC mechanism, nor facilities for resource management. If we DID decide to make this Arrow’s responsibility it would overlap with other components which specialize in such stuff.

Re: Understanding "shared" memory implications

2016-03-19 Thread Jacques Nadeau
@Todd: agree entirely on prototyping design. My goal is to throw out some ideas and some POC code and then we can explore from there. My main thoughts have initially been around lifecycle management. I've done some work previously where a consistently sized shared buffer using mmap has improved perfo

Re: Understanding "shared" memory implications

2016-03-18 Thread Ted Dunning
On Tue, Mar 15, 2016 at 5:54 PM, Jacques Nadeau wrote: > How do others feel of my redefinition of IPC to mean the same memory space > communication (either via shared memory or rdma) versus RPC as socket based > communication? > IPC already has a strong definition which is close to what you wan

Re: Understanding "shared" memory implications

2016-03-15 Thread Raymond Tay
approaches… Thnx Raymond Tay -Original Message- From: Jacques Nadeau Reply-To: "dev@arrow.apache.org" Date: Wednesday, 16 March 2016 at 8:54 AM To: "dev@arrow.apache.org" Subject: Re: Understanding "shared" memory implications >@Corey >The POC Steve

Re: Understanding "shared" memory implications

2016-03-15 Thread Todd Lipcon
Having thought about this quite a bit in the past, I think the mechanics of how to share memory are by far the easiest part. The much harder part is the resource management and ownership. Questions like: - if you are using an mmapped file in /dev/shm/, how do you make sure it gets cleaned up if th

Re: Understanding "shared" memory implications

2016-03-15 Thread Jacques Nadeau
@Corey The POC Steven and Wes are working on is based on MappedBuffer but I'm looking at using Netty's fork of tcnative to use shared memory directly. @Yiannis We need to have both RPC and a shared memory mechanism (what I'm inclined to call IPC but is a specific kind of IPC). The idea is we nego

Re: Understanding "shared" memory implications

2016-03-15 Thread Corey Nolet
I was seeing Netty's unsafe classes being used here, not a mapped byte buffer. I'm not sure if that statement is completely correct; I'll have to dig through the code again to figure that out. The more I looked at unsafe, the more it makes sense why that would be used. Apparently it's also supposed to be

Re: Understanding "shared" memory implications

2016-03-15 Thread Yiannis Gkoufas
Hi Wes, can you please clarify something I don't understand? The next versions of arrow will include the shared memory control flow as well? So then, what is needed for HBase (for instance) to be integrated is the adapter to the arrow format? If yes, then who will be responsible for keeping the da

Re: Understanding "shared" memory implications

2016-03-15 Thread Wes McKinney
My understanding is that you can use java.nio.MappedByteBuffer to work with memory-mapped files as one way to share memory pages between Java (and non-Java) processes without copying. I am hoping that we can reach a POC of zero-copy Arrow memory sharing Java-to-Java and Java-to-C++ in the near fut
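
A minimal sketch of the memory-mapped approach described above, not the POC code discussed in the thread. It maps the same file twice through java.nio and reads back through the second mapping what was written through the first. The class name SharedMappingSketch, the helper names map and roundTrip, the 64-byte region size, and the temporary file are all assumptions made for illustration. Both mappings live in one process here to keep the example self-contained; a second process (Java or otherwise) would map the same path instead. Whether writes made through one mapping are visible through another is system-dependent according to the Java documentation, so treat this as an illustration of the mechanics only.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SharedMappingSketch {
    // Size of the shared region in bytes (arbitrary value for this sketch).
    static final int REGION_BYTES = 64;

    // Maps the first REGION_BYTES of the file read-write.
    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, REGION_BYTES);
        }
    }

    // Writes a value through one mapping and reads it back through a
    // second, independent mapping of the same file, without copying the
    // data through an intermediate byte array.
    static long roundTrip(Path file, long value) throws IOException {
        MappedByteBuffer writer = map(file);
        MappedByteBuffer reader = map(file);
        writer.putLong(0, value);
        return reader.getLong(0);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shared-mapping-sketch", ".bin");
        // Best-effort cleanup; lifecycle management of shared regions is
        // the hard part and is not addressed by this sketch.
        file.toFile().deleteOnExit();
        System.out.println(roundTrip(file, 42L));
    }
}
```

Both buffers use the default big-endian byte order, so they agree with each other; a non-Java reader would have to agree on the byte order and layout as well, which is what a well-defined memory format is for.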