So far, how does the integration with the Spark project look? Do you envision cached Spark partitions allocating Arrows? I could imagine this would be absolutely huge for being able to ask questions of real-time data sets across applications.
On Wed, Feb 24, 2016 at 2:49 PM, Zhe Zhang <z...@apache.org> wrote: > Thanks for the insights Jacques. Interesting to learn the thoughts on > zero-copy sharing. > > mmap allows sharing address spaces via filesystem interface. That has some > security concerns but with the immutability restrictions (as you clarified > here), it sounds feasible. What other options do you have in mind? > > On Wed, Feb 24, 2016 at 11:30 AM Corey Nolet <cjno...@gmail.com> wrote: > > > Agreed, > > > > I thought the whole purpose was to share the memory space (using possibly > > unsafe operations like ByteBuffers) so that it could be directly shared > > without copy. My interest in this is to have it enable fully in-memory > > computation. Not just "processing" as in Spark, but as a fully in-memory > > datastore that one application can expose and share with others (e.g. In > > memory structure is constructed from a series of parquet files, somehow, > > then Spark pulls it in, does some computations, exposes a data set, > > etc...). > > > > If you are leaving the allocation of the memory to the applications and > > underneath the memory is being allocated using direct bytebuffers, I > can't > > see exactly why the problem is fundamentally hard- especially if the > > applications themselves are worried about exposing their own memory > spaces. > > > > > > On Wed, Feb 24, 2016 at 2:17 PM, Andrew Brust < > > andrew.br...@bluebadgeinsights.com> wrote: > > > > > Hmm...that's not exactly how Jaques described things to me when he > > briefed > > > me on Arrow ahead of the announcement. > > > > > > -----Original Message----- > > > From: Zhe Zhang [mailto:z...@apache.org] > > > Sent: Wednesday, February 24, 2016 2:08 PM > > > To: dev@arrow.apache.org > > > Subject: Re: Question about mutability > > > > > > I don't think one application/process's memory space will be made > > > available to other applications/processes. It's fundamentally hard for > > > processes to share their address spaces. > > > > > > IIUC, with Arrow, when application A shares data with application B, > the > > > data is still duplicated in the memory spaces of A and B. It's just > that > > > data serialization/deserialization are much faster with Arrow (compared > > > with Protobuf). > > > > > > On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet <cjno...@gmail.com> > wrote: > > > > > > > Forgive me if this question seems ill-informed. I just started > looking > > > > at Arrow yesterday. I looked around the github a tad. > > > > > > > > Are you expecting the memory space held by one application to be > > > > mutable by that application and made available to all applications > > > > trying to read the memory space? > > > > > > > > > >