Thanks for bringing this up, Wes.
On 25 February 2017 at 14:18, Wes McKinney wrote:
> Dear Apache Kudu and Apache Impala (incubating) communities,
>
> (I'm not sure the best way to have a cross-list discussion, so I
> apologize if this does not work well)
>
> On the recent Apache Parquet sync ca
On 25 February 2016 at 11:57, Todd Lipcon wrote:
> On Thu, Feb 25, 2016 at 11:48 AM, Henry Robinson
> wrote:
> > It seems like Arrow would benefit from a complementary effort to define a
> > (simple) streaming memory transfer protocol between processes. Although
> Wes
>
> >> > > > > Sent: Wednesday, February 24, 2016 2:08 PM
> >> > > > > To: dev@arrow.apache.org
> >> > > > > Subject: Re: Question about mutability
> >> > > > >
> >> > > > > I don't think one application/process's memory space will be made
> >> > > > > available to other applications/processes. It's fundamentally hard for
> >> > > > > processes to share their address spaces.
> >> > > > >
> >> > > > > IIUC, with Arrow, when application A shares data with application B, the
> >> > > > > data is still duplicated in the memory spaces of A and B. It's just that
> >> > > > > data serialization/deserialization are much faster with Arrow (compared
> >> > > > > with Protobuf).
> >> > > > >
> >> > > > > On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet wrote:
> >> > > > >
> >> > > > > > Forgive me if this question seems ill-informed. I just started looking
> >> > > > > > at Arrow yesterday. I looked around the github a tad.
> >> > > > > >
> >> > > > > > Are you expecting the memory space held by one application to be
> >> > > > > > mutable by that application and made available to all applications
> >> > > > > > trying to read the memory space?
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> --
> >> Cheers,
> >> Leif
> >>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
Think of Parquet as a format well-suited to writing very large datasets to
disk, whereas Arrow is a format most suited to efficient storage in memory. You
might read Parquet files from disk, and then materialize them in memory in
Arrow's format.
Both formats are designed around the idiosyncras