On Thu, Feb 25, 2016 at 11:48 AM, Henry Robinson <he...@cloudera.com> wrote:
> It seems like Arrow would benefit from a complementary effort to define a
> (simple) streaming memory transfer protocol between processes. Although Wes
> mentioned RPC earlier, I'd hope that's really a placeholder for "fast
> inter-process message passing", since the overheads of RPC as usually
> realised would likely be very significant, even just for control messages.
>
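The "ring buffer plus a not-empty signal" idea Henry raises can be sketched in a few lines. This is a toy, in-process illustration under invented names (`RingChannel` is not from the thread); a real cross-process version would place the ring in shared memory and wake the reader with a futex/eventfd-style primitive rather than a Python condition variable.

```python
# Toy sketch of a transfer channel with a "buffer no longer empty" signal.
# A deque stands in for a fixed-size ring in shared memory; threading
# stands in for two cooperating processes. Names here are invented.
import threading
from collections import deque

class RingChannel:
    def __init__(self, capacity: int):
        self.buf = deque(maxlen=capacity)      # stand-in for a fixed ring
        self.not_empty = threading.Condition()

    def put(self, batch: bytes) -> None:
        with self.not_empty:
            self.buf.append(batch)
            self.not_empty.notify()            # "buffer no longer empty"

    def get(self) -> bytes:
        with self.not_empty:
            while not self.buf:
                self.not_empty.wait()
            return self.buf.popleft()

ch = RingChannel(capacity=8)
producer = threading.Thread(target=lambda: ch.put(b"batch-0"))
producer.start()
print(ch.get())   # b'batch-0'
producer.join()
```

With a standardized layout for the ring and the signal, any two systems speaking it could exchange batches without the N^2 protocol problem Henry mentions below.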
I'm a little skeptical of this being completely generalizable, at least in the context of storage systems. For example, in Kudu, our scanner responses aren't just tabular data; they also contain various bits of control info that are very Kudu-specific. Our scanner requests are super Kudu-specific, with stuff like predicate pushdown, tablet IDs, etc.

For the use case of external UDFs or UDTFs, I think there's a good opportunity to standardize, though - that interface is a much more "pure functional interface" where we can reasonably expect everyone to conform to the same protocol.

Regarding RPC overhead, so long as the batches are a reasonable size, I think it can be well amortized. Kudu's RPC system can do 300k RPCs/sec or more on localhost, so if each of those RPCs corresponds to an 8MB data buffer, that's 2.4TB/sec of bandwidth (about 100x more than RAM bandwidth). Put another way, with 8MB batches and 24GB/sec of RAM bandwidth, you only need 3k round trips/second to saturate the bus. Even a super slow RPC control plane should be able to handle that in its sleep.

> This may just mean a common way to access a ring buffer with a defined
> in-memory layout and a way of signalling "buffer no longer empty". Without
> doing this in a standardised way, we'd run the risk of N^2 communication
> protocols between N different analytics systems.
>
> Such a thing could also potentially be used for RDMA-based signalling that
> avoids the CPU, as per this paper from MSR:
> https://www.usenix.org/conference/nsdi14/technical-sessions/dragojevic

Agreed that RDMA is a very interesting thing to consider, especially as next-gen networking systems start to become more common. I'll check out that paper.
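Todd's back-of-the-envelope numbers check out; a quick sanity check of the arithmetic (using decimal units, 1 MB = 10**6 bytes, which matches the figures in the email):

```python
# Sanity-check the amortization arithmetic from the message above.
MB, GB, TB = 10**6, 10**9, 10**12

rpcs_per_sec = 300_000      # Kudu localhost RPC rate cited above
batch_bytes = 8 * MB        # one 8 MB data buffer per RPC
ram_bandwidth = 24 * GB     # RAM bandwidth assumed in the email

# If every RPC carried a full batch, total throughput would be:
rpc_throughput = rpcs_per_sec * batch_bytes
print(rpc_throughput / TB, "TB/sec")                   # 2.4 TB/sec

# Round trips per second needed just to saturate RAM bandwidth:
print(ram_bandwidth / batch_bytes, "round trips/sec")  # 3000.0
```

The two orders of magnitude of headroom between 3k round trips/sec and 300k RPCs/sec is what makes the control-plane overhead negligible at this batch size.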
-Todd

>> Another difficulty is security -- with a format like Arrow that embeds
>> internal pointers, any process following the pointers needs to either
>> trust the other processes with write access to the segment, or needs
>> to carefully verify any pointers before following them. Otherwise,
>> it's pretty easy to crash the reader if you're a malicious writer. In
>> the case of using arrow from a shared daemon (eg impalad) to access an
>> untrusted UDF (eg python running in a setuid task container), this is
>> a real concern IMO.
>>
>> -Todd
>>
>> On Thu, Feb 25, 2016 at 8:37 AM, Wes McKinney <w...@cloudera.com> wrote:
>> > hi Leif -- you've articulated almost exactly my vision for pandas
>> > interoperability with Spark via Arrow. There are some questions to sort
>> > out, like shared memory / mmap management and security / sandboxing
>> > questions, but in general moving toward a model where RPCs contain
>> > shared memory offsets to Arrow data rather than shipping large chunks
>> > of serialized data through stdin/stdout would be a huge leap forward.
>> >
>> > On Wed, Feb 24, 2016 at 4:26 PM, Leif Walsh <leif.wa...@gmail.com> wrote:
>> >
>> >> Here's the image that popped into my mind when I heard about this
>> >> project; at best it's a motivating example, at worst it's a distraction:
>> >>
>> >> 1. Spark reads parquet from wherever into an arrow structure in shared
>> >> memory.
>> >> 2. Spark executor calls into the Python half of pyspark with a handle
>> >> to this memory.
>> >> 3. Pandas computes some summary over this data in place, produces some
>> >> smaller result set, also in memory, also in arrow format.
>> >> 4. Spark driver collects results (without ser/de overheads, just
>> >> straight byte buffer reads across the network), concatenates them in
>> >> memory.
>> >> 5. Spark driver hands that memory to local pandas, which does more
>> >> computation over that in place.
>> >>
>> >> This is what got me excited; it kills a lot of the Spark overheads
>> >> related to data (and as a kicker, also enables full pandas
>> >> compatibility along with statsmodels/scikit/etc in step 3). I suppose
>> >> if this is not a vision the arrow devs have, I'd be interested in how.
>> >> Not looking for any promises yet though.
>> >>
>> >> On Wed, Feb 24, 2016 at 14:52 Corey Nolet <cjno...@apache.org> wrote:
>> >>
>> >> > So far, how does the integration with the Spark project look? Do you
>> >> > envision cached Spark partitions allocating Arrows? I could imagine
>> >> > this would be absolutely huge for being able to ask questions of
>> >> > real-time data sets across applications.
>> >> >
>> >> > On Wed, Feb 24, 2016 at 2:49 PM, Zhe Zhang <z...@apache.org> wrote:
>> >> >
>> >> > > Thanks for the insights, Jacques. Interesting to learn the thoughts
>> >> > > on zero-copy sharing.
>> >> > >
>> >> > > mmap allows sharing address spaces via the filesystem interface.
>> >> > > That has some security concerns, but with the immutability
>> >> > > restrictions (as you clarified here) it sounds feasible. What other
>> >> > > options do you have in mind?
>> >> > >
>> >> > > On Wed, Feb 24, 2016 at 11:30 AM Corey Nolet <cjno...@gmail.com> wrote:
>> >> > >
>> >> > > > Agreed,
>> >> > > >
>> >> > > > I thought the whole purpose was to share the memory space (using
>> >> > > > possibly unsafe operations like ByteBuffers) so that it could be
>> >> > > > directly shared without copy. My interest in this is to have it
>> >> > > > enable fully in-memory computation. Not just "processing" as in
>> >> > > > Spark, but as a fully in-memory datastore that one application
>> >> > > > can expose and share with others (e.g. an in-memory structure is
>> >> > > > constructed from a series of parquet files, somehow, then Spark
>> >> > > > pulls it in, does some computations, exposes a data set, etc...).
>> >> > > >
>> >> > > > If you are leaving the allocation of the memory to the
>> >> > > > applications, and underneath the memory is being allocated using
>> >> > > > direct ByteBuffers, I can't see exactly why the problem is
>> >> > > > fundamentally hard - especially if the applications themselves
>> >> > > > are worried about exposing their own memory spaces.
>> >> > > >
>> >> > > > On Wed, Feb 24, 2016 at 2:17 PM, Andrew Brust <
>> >> > > > andrew.br...@bluebadgeinsights.com> wrote:
>> >> > > >
>> >> > > > > Hmm... that's not exactly how Jacques described things to me
>> >> > > > > when he briefed me on Arrow ahead of the announcement.
>> >> > > > >
>> >> > > > > -----Original Message-----
>> >> > > > > From: Zhe Zhang [mailto:z...@apache.org]
>> >> > > > > Sent: Wednesday, February 24, 2016 2:08 PM
>> >> > > > > To: dev@arrow.apache.org
>> >> > > > > Subject: Re: Question about mutability
>> >> > > > >
>> >> > > > > I don't think one application/process's memory space will be
>> >> > > > > made available to other applications/processes. It's
>> >> > > > > fundamentally hard for processes to share their address spaces.
>> >> > > > >
>> >> > > > > IIUC, with Arrow, when application A shares data with
>> >> > > > > application B, the data is still duplicated in the memory
>> >> > > > > spaces of A and B. It's just that data serialization/
>> >> > > > > deserialization are much faster with Arrow (compared with
>> >> > > > > Protobuf).
>> >> > > > >
>> >> > > > > On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet <cjno...@gmail.com> wrote:
>> >> > > > >
>> >> > > > > > Forgive me if this question seems ill-informed. I just
>> >> > > > > > started looking at Arrow yesterday. I looked around the
>> >> > > > > > GitHub a tad.
>> >> > > > > >
>> >> > > > > > Are you expecting the memory space held by one application
>> >> > > > > > to be mutable by that application and made available to all
>> >> > > > > > applications trying to read the memory space?

>> >> --
>> >> Cheers,
>> >> Leif

>> --
>> Todd Lipcon
>> Software Engineer, Cloudera

> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679

--
Todd Lipcon
Software Engineer, Cloudera
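Todd's security point above - that a reader of untrusted shared memory must verify embedded pointers before following them - can be illustrated with a tiny bounds check. The "header" layout here (a little-endian offset/length pair) is invented for the example and is not Arrow's real memory format:

```python
# Hypothetical illustration of validating an embedded offset before
# dereferencing it. The header layout is invented, NOT Arrow's format.
import struct

def read_vector(segment: bytes) -> bytes:
    """Read a byte vector whose (offset, length) header sits at the
    start of an untrusted shared-memory segment."""
    if len(segment) < 8:
        raise ValueError("segment too small for header")
    offset, length = struct.unpack_from("<II", segment, 0)
    # A malicious writer can put anything in these fields, so bounds-check
    # before following them -- otherwise the writer can crash the reader.
    if offset < 8 or offset + length > len(segment):
        raise ValueError("offset/length out of bounds")
    return segment[offset:offset + length]

# Well-formed segment: 8-byte header (offset=8, length=5), then payload.
seg = struct.pack("<II", 8, 5) + b"hello"
print(read_vector(seg))   # b'hello'

# Malicious segment: claims 100 bytes that aren't there; the check
# rejects it instead of reading out of bounds.
bad = struct.pack("<II", 8, 100)
```

Without the check, `bad` would make the reader index past the end of the segment, which is exactly the crash-the-reader scenario described in the quoted message.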
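The mmap-through-the-filesystem sharing Zhe describes above can be sketched as follows: a producer writes an immutable buffer to a file, and a consumer maps it read-only, getting a view of the same bytes without a copy. This is a generic illustration of the idea, not Arrow's actual shared-memory machinery:

```python
# Sketch of zero-copy sharing via the filesystem interface: write once,
# treat as immutable, map read-only. Illustration only -- not Arrow IPC.
import mmap
import os
import tempfile

# "Producer": write the buffer once, then never mutate it.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x07\x00\x00\x00" * 1024)   # stand-in for columnar data

# "Consumer": map the file read-only (ACCESS_READ). The OS shares the
# page cache between mappers, so no per-process copy of the data is made,
# and the mapping cannot be used to mutate the shared bytes.
with open(path, "rb") as f:
    buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_value = bytes(buf[:4])
    buf.close()

os.unlink(path)
print(first_value)   # b'\x07\x00\x00\x00'
```

The immutability restriction discussed in the thread is what makes this safe to reason about: once written, the mapped region is a stable snapshot rather than a window into another process's mutating heap.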