On 25 February 2016 at 11:35, Todd Lipcon <t...@cloudera.com> wrote:

> One thing to keep in mind is that shared memory is not a performance panacea.
>
> We did some experimentation (actually, an intern on our team did -- credit where credit is due) with shared memory transport between the Kudu C++ client and server. What we found was that, for single-batch transfers, shared memory was no faster than unix domain sockets (which were measurably but not a lot faster than TCP sockets). The issue is that, when a new process mmaps a shared memory segment, it still has to take minor page faults to set up page table entries on every page it hits.
>
> Shared memory with huge pages is one potential solution, but relies on either a fixed hugepage allocation (pain to configure) or hoping that background THP compaction is keeping up. Additionally, tmpfs-based shared memory doesn't support huge pages in current kernel versions AFAIK, so you're stuck with sysv shared memory, which is a bit more painful to use from Java.
>
> If you're able to reuse a shared memory region across multiple data transfers, the improvements are substantial, though, so it's not a dead end. It's just most useful for the case where you're streaming a bunch of *batches* of data (eg 8M each) between two processes, and can have some way of coordinating between them.
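For what it's worth, the "reuse a region" case is cheap to experiment with from the JVM side: map the segment once, pre-fault it, and amortise that cost across every batch that flows through it. A rough, untested sketch -- the /dev/shm path, size, and class name are made up for illustration, and MappedByteBuffer.load() is only a best-effort pre-fault:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class SharedRegionSketch {
        // Hypothetical tmpfs-backed file both processes agree on out of band.
        private static final Path SEGMENT = Paths.get("/dev/shm/arrow-demo-segment");
        private static final long SEGMENT_SIZE = 64L * 1024 * 1024;  // reused for many ~8MB batches

        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(SEGMENT,
                    StandardOpenOption.CREATE, StandardOpenOption.READ,
                    StandardOpenOption.WRITE)) {
                // Map the region once so the same mapping is reused for many batches.
                MappedByteBuffer region = ch.map(FileChannel.MapMode.READ_WRITE, 0, SEGMENT_SIZE);
                // Touch every page now (best effort) so later reads and writes
                // don't take minor faults on the hot path.
                region.load();
                // ... write and read many batches through `region`, coordinating
                // offsets with the peer process separately.
            }
        }
    }

None of that helps the single-shot case you describe, of course -- the win only shows up once the same mapping is reused.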
It seems like Arrow would benefit from a complementary effort to define a (simple) streaming memory transfer protocol between processes. Although Wes mentioned RPC earlier, I'd hope that's really a placeholder for "fast inter-process message passing", since the overheads of RPC as usually realised would likely be very significant, even just for control messages. This may just mean a common way to access a ring buffer with a defined in-memory layout and a way of signalling "buffer no longer empty". Without doing this in a standardised way, we'd run the risk of N^2 communication protocols between N different analytics systems.

Such a thing could also potentially be used for RDMA-based signalling that avoids the CPU, as per this paper from MSR: https://www.usenix.org/conference/nsdi14/technical-sessions/dragojevic
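To make the ring-buffer idea slightly more concrete, here's a rough, untested sketch of the kind of layout I mean, written against a MappedByteBuffer (all names are made up). A real protocol would also need proper cross-process memory ordering (plain absolute reads and writes like these make no visibility guarantees), handling for records that wrap past the end of the buffer, and a cheap wakeup primitive (futex, eventfd, or a byte on a unix socket) so the consumer doesn't have to poll:

    import java.nio.MappedByteBuffer;

    // Sketch of a single-producer/single-consumer ring buffer in a shared
    // mapping. Bytes 0-7 hold the producer position, bytes 8-15 the consumer
    // position (both monotonically increasing); records are length-prefixed.
    final class RingBufferSketch {
        private static final int HEADER = 16;
        private final MappedByteBuffer buf;
        private final int capacity;  // size of the data area after the header

        RingBufferSketch(MappedByteBuffer sharedRegion) {
            this.buf = sharedRegion;
            this.capacity = sharedRegion.capacity() - HEADER;
        }

        // Producer: append one record if there is room, else ask the caller to retry.
        boolean tryWrite(byte[] record) {
            long writePos = buf.getLong(0);
            long readPos = buf.getLong(8);
            int needed = 4 + record.length;  // length prefix + payload
            if (writePos + needed - readPos > capacity) {
                return false;  // full
            }
            int base = HEADER + (int) (writePos % capacity);
            buf.putInt(base, record.length);  // ignores mid-record wraparound, for brevity
            for (int i = 0; i < record.length; i++) {
                buf.put(base + 4 + i, record[i]);
            }
            buf.putLong(0, writePos + needed);  // the "buffer no longer empty" signal
            return true;
        }

        // Consumer: take the next record, or null if the buffer is empty.
        byte[] tryRead() {
            long writePos = buf.getLong(0);
            long readPos = buf.getLong(8);
            if (readPos == writePos) {
                return null;  // empty
            }
            int base = HEADER + (int) (readPos % capacity);
            int len = buf.getInt(base);
            byte[] record = new byte[len];
            for (int i = 0; i < len; i++) {
                record[i] = buf.get(base + 4 + i);
            }
            buf.putLong(8, readPos + 4 + len);  // hand the space back to the producer
            return record;
        }
    }

The interesting part isn't the code, it's agreeing on the header layout and the signalling convention -- that's what would let N systems interoperate over one shared protocol instead of N^2 bespoke ones, and it's the part that could later be swapped out for RDMA writes.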
> Another difficulty is security -- with a format like Arrow that embeds internal pointers, any process following the pointers needs to either trust the other processes with write access to the segment, or needs to carefully verify any pointers before following them. Otherwise, it's pretty easy to crash the reader if you're a malicious writer. In the case of using arrow from a shared daemon (eg impalad) to access an untrusted UDF (eg python running in a setuid task container), this is a real concern IMO.
>
> -Todd
>
> On Thu, Feb 25, 2016 at 8:37 AM, Wes McKinney <w...@cloudera.com> wrote:
> > hi Leif -- you've articulated almost exactly my vision for pandas interoperability with Spark via Arrow. There are some questions to sort out, like shared memory / mmap management and security / sandboxing questions, but in general moving toward a model where RPCs contain shared memory offsets to Arrow data rather than shipping large chunks of serialized data through stdin/stdout would be a huge leap forward.
> >
> > On Wed, Feb 24, 2016 at 4:26 PM, Leif Walsh <leif.wa...@gmail.com> wrote:
> >> Here's the image that popped into my mind when I heard about this project; at best it's a motivating example, at worst it's a distraction:
> >>
> >> 1. Spark reads parquet from wherever into an arrow structure in shared memory.
> >> 2. Spark executor calls into the Python half of pyspark with a handle to this memory.
> >> 3. Pandas computes some summary over this data in place, produces some smaller result set, also in memory, also in arrow format.
> >> 4. Spark driver collects results (without ser/de overheads, just straight byte buffer reads across the network), concatenates them in memory.
> >> 5. Spark driver hands that memory to local pandas, which does more computation over that in place.
> >>
> >> This is what got me excited: it kills a lot of the spark overheads related to data (and for a kicker, also enables full pandas compatibility along with statsmodels/scikit/etc in step 3). I suppose if this is not a vision the arrow devs have, I'd be interested to hear how. Not looking for any promises yet though.
> >>
> >> On Wed, Feb 24, 2016 at 14:52 Corey Nolet <cjno...@apache.org> wrote:
> >> > So far, how does the integration with the Spark project look? Do you envision cached Spark partitions allocating Arrows? I could imagine this would be absolutely huge for being able to ask questions of real-time data sets across applications.
> >> >
> >> > On Wed, Feb 24, 2016 at 2:49 PM, Zhe Zhang <z...@apache.org> wrote:
> >> > > Thanks for the insights Jacques. Interesting to learn the thoughts on zero-copy sharing.
> >> > >
> >> > > mmap allows sharing address spaces via the filesystem interface. That has some security concerns, but with the immutability restrictions (as you clarified here), it sounds feasible. What other options do you have in mind?
> >> > >
> >> > > On Wed, Feb 24, 2016 at 11:30 AM Corey Nolet <cjno...@gmail.com> wrote:
> >> > > > Agreed,
> >> > > >
> >> > > > I thought the whole purpose was to share the memory space (using possibly unsafe operations like ByteBuffers) so that it could be directly shared without copy. My interest in this is to have it enable fully in-memory computation. Not just "processing" as in Spark, but a fully in-memory datastore that one application can expose and share with others (e.g. an in-memory structure is constructed from a series of parquet files, somehow, then Spark pulls it in, does some computations, exposes a data set, etc...).
> >> > > >
> >> > > > If you are leaving the allocation of the memory to the applications, and underneath the memory is being allocated using direct bytebuffers, I can't see exactly why the problem is fundamentally hard -- especially if the applications themselves are worried about exposing their own memory spaces.
> >> > > >
> >> > > > On Wed, Feb 24, 2016 at 2:17 PM, Andrew Brust <andrew.br...@bluebadgeinsights.com> wrote:
> >> > > > > Hmm...that's not exactly how Jacques described things to me when he briefed me on Arrow ahead of the announcement.
> >> > > > >
> >> > > > > -----Original Message-----
> >> > > > > From: Zhe Zhang [mailto:z...@apache.org]
> >> > > > > Sent: Wednesday, February 24, 2016 2:08 PM
> >> > > > > To: dev@arrow.apache.org
> >> > > > > Subject: Re: Question about mutability
> >> > > > >
> >> > > > > I don't think one application/process's memory space will be made available to other applications/processes. It's fundamentally hard for processes to share their address spaces.
> >> > > > >
> >> > > > > IIUC, with Arrow, when application A shares data with application B, the data is still duplicated in the memory spaces of A and B. It's just that data serialization/deserialization are much faster with Arrow (compared with Protobuf).
> >> > > > >
> >> > > > > On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet <cjno...@gmail.com> wrote:
> >> > > > > > Forgive me if this question seems ill-informed. I just started looking at Arrow yesterday. I looked around the github a tad.
> >> > > > > >
> >> > > > > > Are you expecting the memory space held by one application to be mutable by that application and made available to all applications trying to read the memory space?
> >>
> >> --
> >> --
> >> Cheers,
> >> Leif
>
> --
> Todd Lipcon
> Software Engineer, Cloudera

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679