Re: C interface clarifications

2020-04-07 Thread Todd Lipcon
On Tue, Apr 7, 2020 at 2:40 AM Antoine Pitrou wrote:
> On 06/04/2020 at 19:22, Todd Lipcon wrote:
>> The spec should also probably cover thread-safety: if the consumer gets an ArrowArray, is it safe to pass off the children to multiple threads and
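To make the thread-safety question concrete, here is a minimal C sketch of one conservative consumer pattern. It assumes a hypothetical producer whose parent ArrowArray owns every child buffer, so all children share the parent's lifetime. Worker threads only read the children, and the consumer calls the parent's release callback only after every thread has been joined. The struct layout is the ArrowArray from the C data interface. `make_struct`, `release_child`, `release_parent`, `sum_child` and `parallel_sum` are made-up names for illustration. The sketch does not describe what the spec itself guarantees.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Field layout of ArrowArray as defined by the Arrow C data interface. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical child release: children own nothing here, so it only marks them released. */
static void release_child(struct ArrowArray* arr) { arr->release = NULL; }

/* Hypothetical parent release: the parent owns every buffer of every child. */
static void release_parent(struct ArrowArray* arr) {
  for (int64_t i = 0; i < arr->n_children; ++i) {
    struct ArrowArray* c = arr->children[i];
    if (c->release != NULL) c->release(c);
    free((void*)c->buffers[1]); /* int32 values buffer */
    free(c->buffers);
    free(c);
  }
  free(arr->children);
  free(arr->buffers);
  arr->release = NULL; /* mark the parent released */
}

/* Hypothetical producer: struct array with k int32 children of n values,
   child i holding the values i*n .. i*n + n - 1 and no validity bitmap. */
static void make_struct(struct ArrowArray* out, int64_t k, int64_t n) {
  struct ArrowArray** children = malloc(sizeof(*children) * k);
  for (int64_t i = 0; i < k; ++i) {
    int32_t* values = malloc(sizeof(int32_t) * n);
    for (int64_t j = 0; j < n; ++j) values[j] = (int32_t)(i * n + j);
    const void** bufs = malloc(sizeof(void*) * 2);
    bufs[0] = NULL;
    bufs[1] = values;
    children[i] = malloc(sizeof(struct ArrowArray));
    *children[i] = (struct ArrowArray){ .length = n, .n_buffers = 2,
                                        .buffers = bufs, .release = release_child };
  }
  const void** bufs = malloc(sizeof(void*));
  bufs[0] = NULL;
  *out = (struct ArrowArray){ .length = n, .n_buffers = 1, .n_children = k,
                              .buffers = bufs, .children = children,
                              .release = release_parent };
}

struct task { const struct ArrowArray* child; int64_t sum; };

/* Worker: only reads the child's immutable values buffer, never releases it. */
static void* sum_child(void* arg) {
  struct task* t = arg;
  const int32_t* values = t->child->buffers[1];
  t->sum = 0;
  for (int64_t i = 0; i < t->child->length; ++i) t->sum += values[t->child->offset + i];
  return NULL;
}

/* Read all children concurrently, join every thread, then release the parent. */
static int64_t parallel_sum(struct ArrowArray* parent) {
  int64_t k = parent->n_children;
  pthread_t* threads = malloc(sizeof(pthread_t) * k);
  struct task* tasks = malloc(sizeof(struct task) * k);
  for (int64_t i = 0; i < k; ++i) {
    tasks[i].child = parent->children[i];
    if (pthread_create(&threads[i], NULL, sum_child, &tasks[i]) != 0) abort();
  }
  int64_t total = 0;
  for (int64_t i = 0; i < k; ++i) {
    pthread_join(threads[i], NULL);
    total += tasks[i].sum;
  }
  free(threads);
  free(tasks);
  parent->release(parent); /* safe: no thread touches the data any more */
  return total;
}
```

Compile with `-pthread`. Releasing only after the joins means the answer does not depend on whether children are individually releasable. The cost is that no memory is returned until the slowest reader finishes.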

Re: C interface clarifications

2020-04-06 Thread Todd Lipcon
On Mon, Apr 6, 2020 at 9:57 AM Antoine Pitrou wrote:
> Hello Todd,
>
> On 06/04/2020 at 18:18, Todd Lipcon wrote:
>> I had a couple questions / items that should be clarified in the spec. Wes suggested I raise them here on dev@:

C interface clarifications

2020-04-06 Thread Todd Lipcon
deallocated child as well. In other words, I think the spec should be explicit that either:
(a) every allocated structure should "stand alone" and be individually releasable (and thus moveable)
(b) a produced struct must have the same lifetime as all children. Consumers should not rel
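Option (a) can be sketched in C as follows. This assumes a hypothetical producer in which every child owns its own buffers. Moving a child is a shallow copy followed by marking the source released. The parent's release callback skips any child whose `release` is already NULL and frees only the child struct it allocated. The ArrowArray layout is the one from the C data interface. `make_int32_leaf`, `make_struct`, `release_leaf`, `release_parent` and `move_array` are illustrative names, not part of any library.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Field layout of ArrowArray as defined by the Arrow C data interface. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

/* Hypothetical leaf release: frees only the buffers this struct owns. */
static void release_leaf(struct ArrowArray* arr) {
  free((void*)arr->buffers[1]); /* int32 values buffer */
  free(arr->buffers);
  arr->release = NULL;          /* mark released */
}

/* Hypothetical parent release: releases children that were not moved out,
   then frees the child structs and its own buffers. */
static void release_parent(struct ArrowArray* arr) {
  for (int64_t i = 0; i < arr->n_children; ++i) {
    struct ArrowArray* child = arr->children[i];
    if (child->release != NULL) child->release(child);
    free(child);
  }
  free(arr->children);
  free(arr->buffers);
  arr->release = NULL;
}

/* Hypothetical producer of a standalone int32 leaf holding 0 .. n-1. */
static void make_int32_leaf(struct ArrowArray* out, int64_t n) {
  int32_t* values = malloc(sizeof(int32_t) * n);
  for (int64_t i = 0; i < n; ++i) values[i] = (int32_t)i;
  const void** bufs = malloc(sizeof(void*) * 2);
  bufs[0] = NULL;   /* no validity bitmap, so null_count stays 0 */
  bufs[1] = values;
  *out = (struct ArrowArray){ .length = n, .n_buffers = 2, .buffers = bufs,
                              .release = release_leaf };
}

/* Hypothetical producer of a struct array with k independent int32 children. */
static void make_struct(struct ArrowArray* out, int64_t k, int64_t n) {
  struct ArrowArray** children = malloc(sizeof(*children) * k);
  for (int64_t i = 0; i < k; ++i) {
    children[i] = malloc(sizeof(struct ArrowArray));
    make_int32_leaf(children[i], n);
  }
  const void** bufs = malloc(sizeof(void*));
  bufs[0] = NULL;
  *out = (struct ArrowArray){ .length = n, .n_buffers = 1, .n_children = k,
                              .buffers = bufs, .children = children,
                              .release = release_parent };
}

/* Move: shallow-copy the struct, then mark the source released. */
static void move_array(struct ArrowArray* src, struct ArrowArray* dst) {
  memcpy(dst, src, sizeof(*dst));
  src->release = NULL;
}
```

With this design a consumer can keep one column and drop the rest. It moves the child out, releases the parent right away, and later releases the moved child on its own. Under option (b) this would be unsafe, because the parent's release would free the moved child's buffers.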

Re: Arrow based data access

2017-03-19 Thread Todd Lipcon
> nar format.
>
> Some obvious topics that come to mind:
>
> - How do we identify a dataset?
> - How do we specify projections?
> - What about predicate push downs or in general parameters?
> - What underlying protocol to use? HTTP2?
> - push vs pull?
> - build a reference implementation (Suggestions?)
>
> Potential candidates for using this:
>
> - to consume data or to expose result sets: Drill, Hive, Presto, Impala, Spark, RecordService...
> - as a server: Kudu, HBase, Cassandra, …
>
> --
> Julien

--
Todd Lipcon
Software Engineer, Cloudera

Re: [DISCUSS] C++ code sharing amongst Apache {Arrow, Kudu, Impala, Parquet}

2017-02-26 Thread Todd Lipcon
> PMC are also on the Arrow PMC, I trust we would be able to collaborate to each other's mutual benefit and success.
>
> Note that Arrow does not throw C++ exceptions and similarly follows the Google C++ style guide to the same extent as Kudu

Re: [C++] How careful do we want to be about exceptions?

2016-06-07 Thread Todd Lipcon
> [2] https://github.com/cloudera/kudu/blob/7f3691a826b9d27199319409f8d721ec1d08a8ba/src/kudu/consensus/log_reader.cc#L74
> [3] https://github.com/cloudera/Impala/blob/a36dcfc0322e213c06d6cf8d3f330c4b06739523/be/src/common/object-pool.h

Re: Code review tools for Arrow patches

2016-04-25 Thread Todd Lipcon
How do you keep it up to date with the ASF repo if you have patches entering the ASF repo through some mechanism other than Gerrit? It might be possible with a cron job that force-pushes from ASF -> Gerrit, but I haven't ever tried a workflow like that.

-Todd

> On Mon

Re: Code review tools for Arrow patches

2016-04-25 Thread Todd Lipcon
> hopeful that ASF Infra will set up an ASF-managed Gerrit.
>
> On Sunday, April 24, 2016, Ted Dunning wrote:
>> Just for the record, Apex had some issues getting Gerrit reviews reflected in a coherent fashion into the Apache record. I p

Re: Code review tools for Arrow patches

2016-04-15 Thread Todd Lipcon
> on Apache projects that Cloudera's involved with, my bias would be to try to get an instance set up so that larger patches can be reviewed in a more detailed and transactional way. For example: we could use gerrit.cloudera.org (like Kudu and

Re: Understanding "shared" memory implications

2016-03-15 Thread Todd Lipcon
> s been allocated in one JVM from another JVM but I'm assuming this must be the case in order to claim that the memory is being accessed directly without being copied, correct?
>
> The implication here is huge. If the memory is being directly shared across processes by them being allowed to directly reach into the direct byte buffers, that's true shared memory. Otherwise, if there are copies going on, it's less appealing.
>
> Thanks.
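Whether two processes see the same bytes or copies is an operating-system property. A minimal POSIX sketch follows, assuming Linux or macOS. `sum_written_by_child` is a hypothetical helper, and this is a generic illustration, not how Arrow itself moves data between processes. A `MAP_SHARED` mapping created before `fork()` is the same physical memory in both processes, so the parent reads the child's writes in place with no copy or deserialization. Unrelated processes would instead map a common file or a `shm_open()` object.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child process fills a shared buffer with 0 .. n-1; the parent sums it
   in place. Returns the sum, or -1 on error. */
static int64_t sum_written_by_child(size_t n) {
  size_t bytes = n * sizeof(int64_t);
  int64_t* buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (buf == MAP_FAILED) return -1;

  pid_t pid = fork();
  if (pid < 0) { munmap(buf, bytes); return -1; }
  if (pid == 0) {
    /* Child: write directly into the shared pages, then exit. */
    for (size_t i = 0; i < n; ++i) buf[i] = (int64_t)i;
    _exit(0);
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
    munmap(buf, bytes);
    return -1;
  }

  /* Parent: read the child's writes with no copy or deserialization step. */
  int64_t sum = 0;
  for (size_t i = 0; i < n; ++i) sum += buf[i];
  munmap(buf, bytes);
  return sum;
}
```

With `MAP_PRIVATE` instead of `MAP_SHARED`, the child's writes would land in copy-on-write pages and the parent would read zeros. That is the "copies going on" case described above.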

Re: I setup a slack team to have a live channel to discuss Arrow

2016-03-11 Thread Todd Lipcon
> wrote:
>> I added a bunch of company domains but if you aren't in one of those and want an invite, just let me know and I'll add you.
>>
>> http://apachearrow.slack.com
>>
>> thanks,
>> Jacques

Re: Question about mutability

2016-02-25 Thread Todd Lipcon
h.cc:125] User CPU per req: 16.916us
I0225 12:12:08.943967 14968 rpc-bench.cc:126] Sys CPU per req: 28.8567us

so that implies about 35us round trip latency. It could probably be improved a bit with some effort, but this has never been the majority of our request processing time. Other systems might be a bit different, but I'd expect <100us even from a "slow" RPC system.

Re: Question about mutability

2016-02-25 Thread Todd Lipcon
> dgeinsights.com> wrote:
>> Hmm...that's not exactly how Jacques described things to me when he briefed me on Arrow ahead of the announcement.
>>
>> -Original Message-
>> From: Zhe Zhang [mailto:z...@apache.org]
>> Sent: Wednesday, February 24, 2016 2:08 PM
>> To: dev@arrow.apache.org
>> Subject: Re: Question about mutability
>>
>> I don't think one application/process's memory space will be made available to other applications/processes. It's fundamentally hard for processes to share their address spaces.
>>
>> IIUC, with Arrow, when application A shares data with application B, the data is still duplicated in the memory spaces of A and B. It's just that data serialization/deserialization are much faster with Arrow (compared with Protobuf).
>>
>> On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet wrote:
>>> Forgive me if this question seems ill-informed. I just started looking at Arrow yesterday. I looked around the github a tad.
>>>
>>> Are you expecting the memory space held by one application to be mutable by that application and made available to all applications trying to read the memory space?
>
> --
> Cheers,
> Leif

Re: Question about mutability

2016-02-25 Thread Todd Lipcon
>> v@arrow.apache.org
>> Subject: Re: Question about mutability
>>
>> I don't think one application/process's memory space will be made available to other applications/processes. It's fundamentally hard for processes to share their address spaces.
>>
>> IIUC, with Arrow, when application A shares data with application B, the data is still duplicated in the memory spaces of A and B. It's just that data serialization/deserialization are much faster with Arrow (compared with Protobuf).
>>
>> On Wed, Feb 24, 2016 at 10:40 AM Corey Nolet wrote:
>>> Forgive me if this question seems ill-informed. I just started looking at Arrow yesterday. I looked around the github a tad.
>>>
>>> Are you expecting the memory space held by one application to be mutable by that application and made available to all applications trying to read the memory space?
>
> --
> Cheers,
> Leif

Re: Comparing with Parquet

2016-02-25 Thread Todd Lipcon
> e to my mind is how does Arrow compare with Parquet.
>
> In my understanding Parquet also supports a very efficient columnar format (with support for nested structure). It is already embraced (supported) by various technologies like Impala (origin), Spark, Drill etc.
>
> The only thing I see missing in Parquet is support for SIMD based vectorized operations.
>
> Am I right or am I missing many other differences between Arrow and Parquet?
>
> Regards,
> Sourav