On Tue, Apr 7, 2020 at 2:40 AM Antoine Pitrou wrote:
>
> Le 06/04/2020 à 19:22, Todd Lipcon a écrit :
> >
> > The spec should also probably cover thread-safety: if the consumer gets
> an
> > ArrowArray, is it safe to pass off the children to multiple threads and
On Mon, Apr 6, 2020 at 9:57 AM Antoine Pitrou wrote:
>
> Hello Todd,
>
> Le 06/04/2020 à 18:18, Todd Lipcon a écrit :
> >
> > I had a couple questions / items that should be clarified in the spec.
> Wes
> > suggested I raise them here on dev@:
> >
> >
deallocated child
as well.
In other words, I think the spec should be explicit that either:
(a) every allocated structure should "stand alone" and be individually
releasable (and thus moveable)
(b) a produced struct must have the same lifetime as all children.
Consumers should not rel
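
To make option (a) concrete, here is a minimal sketch in C. The struct layout follows the published Arrow C data interface. Everything else is a hypothetical illustration, not Arrow library code: the names init_array, demo_release, move_array and g_released are made up for this example. It assumes that a "move" is a bitwise copy followed by setting the source's release pointer to NULL, and that a parent's release callback skips children that have already been moved out. It says nothing about thread-safety.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Struct layout as published in the Arrow C data interface. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical counter: how many release callbacks have run. */
static int g_released = 0;

/* Hypothetical producer callback. A child whose release pointer is NULL
   has been moved elsewhere, so only its struct storage is freed here. */
static void demo_release(struct ArrowArray* array) {
  for (int64_t i = 0; i < array->n_children; ++i) {
    struct ArrowArray* child = array->children[i];
    if (child->release != NULL) {
      child->release(child);
    }
    free(child);
  }
  free(array->children);
  ++g_released;
  array->release = NULL; /* mark this struct as released */
}

/* Hypothetical producer: an empty array with n_children empty children. */
static void init_array(struct ArrowArray* out, int64_t n_children) {
  memset(out, 0, sizeof(*out));
  out->n_children = n_children;
  if (n_children > 0) {
    out->children = malloc((size_t)n_children * sizeof(*out->children));
    assert(out->children != NULL);
    for (int64_t i = 0; i < n_children; ++i) {
      out->children[i] = malloc(sizeof(struct ArrowArray));
      assert(out->children[i] != NULL);
      init_array(out->children[i], 0);
    }
  }
  out->release = demo_release;
}

/* Move: bitwise copy, then mark the source as released. */
static void move_array(struct ArrowArray* src, struct ArrowArray* dst) {
  assert(src->release != NULL);
  memcpy(dst, src, sizeof(*dst));
  src->release = NULL;
}

/* Moves child 0 out, releases the parent, then releases the moved child.
   Returns the total number of release callbacks that ran. */
int demo_move_child_then_release_parent(void) {
  struct ArrowArray parent;
  struct ArrowArray moved_child;
  g_released = 0;
  init_array(&parent, 2);
  move_array(parent.children[0], &moved_child);
  parent.release(&parent);           /* releases child 1 and the parent */
  moved_child.release(&moved_child); /* the moved child outlives its parent */
  return g_released;
}
```

In this sketch the moved child stays valid after the parent is released, which is the "stand alone" property of option (a). Under option (b) the move would not be allowed, and the consumer would only ever release the parent.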
nar format.
> >
> > Some obvious topics that come to mind:
> >
> > - How do we identify a dataset?
> >
> > - How do we specify projections?
> >
> > - What about predicate push downs or in general parameters?
> >
> > - What underlying protocol to use? HTTP2?
> >
> > - push vs pull?
> >
> > - build a reference implementation (Suggestions?)
> >
> > Potential candidates for using this:
> >
> > - to consume data or to expose result sets: Drill, Hive, Presto, Impala,
> > Spark, RecordService...
> > - as a server: Kudu, HBase, Cassandra, …
> >
> > --
> > Julien
>
--
Todd Lipcon
Software Engineer, Cloudera
PMC are also on the Arrow PMC, I trust we would be able to
> >> collaborate to each other's mutual benefit and success.
> >>
> >> Note that Arrow does not throw C++ exceptions and similarly follows
> >> Google C++ style guide to the same extent as Kudu
; [2]
> https://github.com/cloudera/kudu/blob/7f3691a826b9d27199319409f8d721ec1d08a8ba/src/kudu/consensus/log_reader.cc#L74
> [3]
> https://github.com/cloudera/Impala/blob/a36dcfc0322e213c06d6cf8d3f330c4b06739523/be/src/common/object-pool.h
>
--
Todd Lipcon
Software Engineer, Cloudera
How do you
keep it up to date with the ASF repo, if you have patches entering the ASF
repo from some mechanism other than gerrit?
It might be possible with a cron job that force-pushes from ASF ->
Gerrit, but I haven't ever tried a workflow like that.
-Todd
>
> On Mon
hopeful that ASF Infra will set up an
> > ASF-managed Gerrit.
> >
> > On Sunday, April 24, 2016, Ted Dunning wrote:
> >
> >> Just for the record, Apex had some issues getting Gerrit reviews
> reflected
> >> in a coherent fashion into the Apache record. I p
on Apache projects that
> > Cloudera's involved with, my bias would be to try to get an instance set
> up
> > so that larger patches can be reviewed in a more detailed and
> transactional
> > way. For example: we could use gerrit.cloudera.org (like Kudu and
> >
s been allocated in one JVM from another JVM but I'm assuming
> this
> > > must be the case in order to claim that the memory is being accessed
> > > directly without being copied, correct?
> > > >
> > > > The implication here is huge. If the memory is being directly shared
> > > across processes by them being allowed to directly reach into the
> direct
> > > > byte buffers, that's true shared memory. Otherwise, if there are copies
> > going
> > > on, it's less appealing.
> > > >
> > > >
> > > > Thanks.
> > > >
> > > > Sent from my iPad
> > >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
> > > > > >
> > > > wrote:
> > > > > >
> > > > > > I added a bunch of company domains but if you aren't in one of
> > those
> > > > and
> > > > > > want an invite, just let me know and I'll add you.
> > > > > >
> > > > > > http://apachearrow.slack.com
> > > > > >
> > > > > > thanks,
> > > > > > Jacques
> > > > >
> > > >
> > >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
h.cc:125] User CPU per req: 16.916us
I0225 12:12:08.943967 14968 rpc-bench.cc:126] Sys CPU per req: 28.8567us
so that implies about 35us round trip latency. It could probably be
improved a bit with some effort, but this has never been the majority
of our request processing time. Other systems might be a bit
different, but I'd expect <100us from even a "slow" RPC system.
--
Todd Lipcon
Software Engineer, Cloudera
dgeinsights.com> wrote:
>> >> > > >
>> >> > > > > Hmm...that's not exactly how Jacques described things to me when
>> he
>> >> > > > briefed
>> >> > > > > me on Arrow ahead of the announcement.
>> >> > > > >
>> >> > > > > -----Original Message-----
>> >> > > > > From: Zhe Zhang [mailto:z...@apache.org]
>> >> > > > > Sent: Wednesday, February 24, 2016 2:08 PM
>> >> > > > > To: dev@arrow.apache.org
>> >> > > > > Subject: Re: Question about mutability
>> >> > > > >
>> >> > > > > I don't think one application/process's memory space will be
>> made
>> >> > > > > available to other applications/processes. It's fundamentally
>> hard
>> >> > for
>> >> > > > > processes to share their address spaces.
>> >> > > > >
>> >> > > > > IIUC, with Arrow, when application A shares data with
>> application
>> >> B,
>> >> > > the
>> >> > > > > data is still duplicated in the memory spaces of A and B. It's
>> just
>> >> > > that
>> >> > > > > data serialization/deserialization are much faster with Arrow
>> >> > (compared
>> >> > > > > with Protobuf).
>> >> > > > >
>> >> > > > > On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet
>> >> > > wrote:
>> >> > > > >
>> >> > > > > > Forgive me if this question seems ill-informed. I just started
>> >> > > looking
>> >> > > > > > at Arrow yesterday. I looked around the github a tad.
>> >> > > > > >
>> >> > > > > > Are you expecting the memory space held by one application to
>> be
>> >> > > > > > mutable by that application and made available to all
>> >> applications
>> >> > > > > > trying to read the memory space?
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >> --
>> >> --
>> >> Cheers,
>> >> Leif
>> >>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
--
Todd Lipcon
Software Engineer, Cloudera
v@arrow.apache.org
>> > > > > Subject: Re: Question about mutability
>> > > > >
>> > > > > I don't think one application/process's memory space will be made
>> > > > > available to other applications/processes. It's fundamentally hard
>> > for
>> > > > > processes to share their address spaces.
>> > > > >
>> > > > > IIUC, with Arrow, when application A shares data with application
>> B,
>> > > the
>> > > > > data is still duplicated in the memory spaces of A and B. It's just
>> > > that
>> > > > > data serialization/deserialization are much faster with Arrow
>> > (compared
>> > > > > with Protobuf).
>> > > > >
>> > > > > On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet
>> > > wrote:
>> > > > >
>> > > > > > Forgive me if this question seems ill-informed. I just started
>> > > looking
>> > > > > > at Arrow yesterday. I looked around the github a tad.
>> > > > > >
>> > > > > > Are you expecting the memory space held by one application to be
>> > > > > > mutable by that application and made available to all
>> applications
>> > > > > > trying to read the memory space?
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> --
>> --
>> Cheers,
>> Leif
>>
--
Todd Lipcon
Software Engineer, Cloudera
e to my mind is how does Arrow
>>> compare with Parquet.
>>>
>>> In my understanding Parquet also supports a very efficient columnar
>>> format (with support for nested structure). It is already embraced
>>> (supported) by various technologies like Impala (origin), Spark, Drill etc.
>>>
>>> The only thing I see missing in Parquet is support for SIMD based
>>> vectorized operations.
>>>
>>> Am I right or am I missing many other differences between Arrow and
>>> Parquet?
>>>
>>> Regards,
>>> Sourav
--
Todd Lipcon
Software Engineer, Cloudera