Re: [VOTE] Substrait for Flight SQL

2022-08-31 Thread Jacques Nadeau
+1 (binding) On Wed, Aug 31, 2022, 5:15 PM Larry White wrote: > +1 (non-binding) > > On Wed, Aug 31, 2022 at 7:55 PM Vinicius Fraga wrote: > > > +1 (non-binding) > > > > On Wed, 31 Aug 2022, 20:51 David Li, wrote: > > > > > Hello, > > > > > > I am proposing to extend the Flight SQL specificati

Re: [VOTE] Substrait for Flight SQL

2022-09-08 Thread Jacques Nadeau
My vote continues to be +1 On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson wrote: > +1 > > Neal > > On Thu, Sep 8, 2022 at 2:15 PM Ashish wrote: > > > +1 (non-binding) > > > > On Thu, Sep 8, 2022 at 9:41 AM Gavin Ray wrote: > > > > > Oh, so that's what "non-binding" means in vote threads > > >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Thread Jacques Nadeau
Hey Chao, it would be great for you to share the code some place with commit history. (PR to the repo that Andy made or something else.) On Mon, Jan 15, 2024 at 7:38 AM Andy Grove wrote: > Hi Chao, > > I have created https://github.com/apache/arrow-datafusion-comet and you > should be able to cr

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Thread Jacques Nadeau
e of them contain internal info which we need to > remove upon open sourcing. How about we just add a summary in the PR > itself, and add everyone that has contributed to it as co-author to > the PR? > > Chao > > On Wed, Jan 17, 2024 at 11:09 AM Jacques Nadeau > wrote: >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-18 Thread Jacques Nadeau
idely used internally > already), we'd be happy to help further improve readability & > maintainability of the codebase and resolving issues raised from the > community. Will this work for you? really appreciate if you understand > our situation. > > Thanks, > Chao >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-24 Thread Jacques Nadeau
our valuable comments. > > Best, > Chao > > On Thu, Jan 18, 2024 at 5:24 PM Jacques Nadeau wrote: > > > > Yes, that was roughly what I was requesting (I was suggesting a single PR > > with many commits that would be merged with the history). > > > > It&

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-23 Thread Jacques Nadeau
In a lucky turn of events, Phillip actually turned out to be in my neck of the woods on Friday so we had a chance to sit down and discuss this. To help, I actually shared something I had been working on a few months ago independently (before this discussion started). For reference: Wes PR: https:/

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-09-07 Thread Jacques Nadeau
ed anything close to physical-plan things, > but > > I > > > > > think that's a good follow up PR. Having separate representations > for > > > > > logical/physical plans seems like a waste of effort ultimately. I > > think > >

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-20 Thread Jacques Nadeau
+1 on time variation. Please add me to the invite. Thanks On Sun, Sep 19, 2021 at 9:49 PM Benson Muite wrote: > New to this. A suggestion may be to consider two of the times, eg. 4:00 > UTC and 16:00 UTC perhaps alternating allowing geographic diversity in > joining convenience. > > On 9/20/

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

2021-09-30 Thread Jacques Nadeau
In the past I was dealing with something similar. My experience was when data was accepted at the edge, the cost of validating that the first offset is zero, the last is within the data bounds and that all others are equal or increasing was a reasonable overhead associated with validating offsets f
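The three checks described in this message (first offset is zero, last offset is within the data bounds, all others are equal or increasing) can be sketched in a few lines. This is only an illustration of the idea, not code from arrow-rs or any Arrow library; the function name `validate_offsets` and its parameters are hypothetical.

```python
from typing import Sequence


def validate_offsets(offsets: Sequence[int], data_length: int) -> None:
    """Raise ValueError if `offsets` fails the checks described in the message.

    Hypothetical helper for illustration only; not an Arrow API.
    """
    if len(offsets) == 0:
        raise ValueError("offsets must contain at least one entry")
    # Check 1: the first offset is zero.
    if offsets[0] != 0:
        raise ValueError(f"first offset must be 0, got {offsets[0]}")
    # Check 2: the last offset is within the data bounds.
    if offsets[-1] > data_length:
        raise ValueError(
            f"last offset {offsets[-1]} exceeds data length {data_length}"
        )
    # Check 3: every offset is equal to or greater than the one before it.
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            raise ValueError(f"offset at index {i} decreases")


# Three values ("ab", "", "cde") packed into a 5-byte data buffer.
validate_offsets([0, 2, 2, 5], data_length=5)
```

The scan is a single linear pass over the offsets, which is consistent with the message's point that this validation is a reasonable overhead when data is accepted at the edge.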

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

2021-10-07 Thread Jacques Nadeau
Clearly this patch was driven by an implicit set of needs but it's hard to guess at what they are. As Laurent asked, what is the main goal here? There may be many ways to solve this goal. Some thoughts in general: - The allocator is a finely tuned piece of multithreaded machinery that is used on h

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

2021-10-22 Thread Jacques Nadeau
ct control. The proposal (actually, its 3rd iteration) is > > described here at https://openjdk.java.net/jeps/393, and has been > available > > as an incubator feature for several JDK releases (Javadoc: > > > https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubat

Re: Question about Arrow Mutable/Immutable Arrays choice

2021-11-03 Thread Jacques Nadeau
Hey Alessandro, take a look at the top level docs on ValueVector: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/ValueVector.html Specifically the following: - values need to be written in order (e.g. index 0, 1, 2, 5) - null vectors start with all values as null befo

[DISCUSS][FLIGHT SQL] Intentions around JDBC and/or ODBC for Flight SQL?

2021-12-09 Thread Jacques Nadeau
Hey all, I was curious if there was anyone planning on implementing JDBC and/or ODBC wrappers on top of the Flight SQL Java [1] and Flight SQL C++ implementations [2] since they seem to be completing soon. It seems like JDBC/ODBC could quickstart integration between Flight SQL and other components

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-10 Thread Jacques Nadeau
I'm strongly in support of much of this. Thanks for bringing this up. It is long overdue. On initial read, my thoughts would be: Strongly inclined: - String view - constant view Weakly inclined - All null - rle Somewhat disinclined - Sequence change With dictionary and string view, I feel like

Re: [RESULT][VOTE] Proposed addition to Arrow Flight: Arrow Flight RPC

2021-12-25 Thread Jacques Nadeau
That's great news. Congrats and thanks to the team who worked on it. This is a great addition to Arrow! On Thu, Dec 23, 2021, 11:26 AM David Li wrote: > The integration tests and existing PRs were merged into a separate branch. > We also merged in a few build fixes during final review. Just in t

Re: [DISCUSS] Annual rotation of Arrow PMC chair

2022-01-04 Thread Jacques Nadeau
Hey Wes, thanks for bringing this up. And more importantly, thanks for working as the PMC chair this last year! I think Kouhei would be a great choice for the PMC chair. Jacques On Tue, Jan 4, 2022 at 12:44 AM Wes McKinney wrote: > hello all, > > As we discussed at the end of 2020 [1], we woul

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-03 Thread Jacques Nadeau
James, I agree that you could use JSON but that feels a bit hacky (mis-use of the paradigm). Instead, I'd really like to do something like David is suggesting: support Substrait as an alternative to a SQL string. Something like this: https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-04-18 Thread Jacques Nadeau
I'm -0.9 on Arrow Compute engine. It makes it sound like it is THE canonical Arrow one, second classing Datafusion and Gandiva. No strong feelings on other names. Naming in general is an extremely subjective process... On Thu, Mar 31, 2022, 2:33 PM Weston Pace wrote: > I'm +1 for "arrow compu

Re: [C++] output field names in Arrow Substrait

2022-04-23 Thread Jacques Nadeau
In the specification, there are both read and intermediate write rels. No one has implemented the protobuf yet for write. Both carry field names. The names of fields is an internal rel node concern just like condition is for filter. This is because many formats require names. For example, parquet

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-23 Thread Jacques Nadeau
I'm generally -0.01 against narrow decimals. My experience in practice has been that widening happens so quickly that they are little used and add unnecessary complexity. For reference, the original Arrow code actually implemented Decimal9 [1] and Decimal18 [2] but we removed both because of this e

Re: [Rust] Enable GitHub discussions for Rust projects?

2022-05-04 Thread Jacques Nadeau
No vote here but a little feedback. We've generally found Github Discussions somewhat lacking in Substrait. If other people find it good, great. I might be more inclined to just drive people to something like StackOverflow or the mailing list. We were initially quite enthusiastic but the experience

Re: ARROW-11465

2022-05-18 Thread Jacques Nadeau
I second Weston's comments. The idea of separate files is part of the de jure spec but not the de facto one. It's up to the parquet community whether the de facto spec should be "altered". Afaik, zero oss readers support use of this field. On Wed, May 18, 2022, 8:53 AM Weston Pace wrote: > I

Re: Arrow Flight connector for SQL Server

2020-05-19 Thread Jacques Nadeau
Hey Brendan, Welcome to the community. At Dremio we've exposed flight as an input and output for sql result datasets. I'll have one of our guys share some details. I think a couple questions we've been struggling with include how to standardize additional metadata operations, what should the prepa

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Jacques Nadeau
Per my comments on the pr, I also think this is preferred. I believe we will avoid the potential for validity inconsistency and simplify construction of union data in most cases. On Wed, Jun 24, 2020, 7:58 AM Wes McKinney wrote: > hi folks, > > As discussed on the recent GitHub PR [1], as a mean

Re: Renaming master branch, removing blacklist/whitelist

2020-06-24 Thread Jacques Nadeau
Hi Suvayu, thanks for sharing your experiences. Clearly we have work to do. Wrt specific name changes, I agree with Wes. If something is negative to a non-trivial portion of the population, why not use something that avoids that issue where possible? On Fri, Jun 19, 2020, 7:44 PM Suvayu Ali

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau
On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote: > I would also be interested in having a reusable serialized format for > filter- and projection-like expressions. I think trying to go so far > as full logical query plans suitable for building a SQL engine is > perhaps a bit too far but we coul

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau
For reference, the doc (from eight years ago) I meant to link in my initial message was: https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit On Sat, Jul 11, 2020, 11:24 AM Wes McKinney wrote: > On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau > wrote: >

Re: Writing very large rowgroups to Apache Parquet

2020-07-11 Thread Jacques Nadeau
I'd suggest a new write pattern. Write the columns page at a time to separate files then use a second process to concatenate the columns and append the footer. Odds are you would do better than os swapping and take memory requirements down to page size times field count. In s3 I believe you could

Re: Writing very large rowgroups to Apache Parquet

2020-07-17 Thread Jacques Nadeau
he last are expected to be at least 5mb if I read their docs correctly >> [1]) >> >> [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html >> >> >> On Saturday, July 11, 2020, Jacques Nadeau wrote: >> >> > I'd suggest a new write pattern. Wr

Re: [DISCUSS] Using direct memory size as a limit of populated off-heap buffers in Java

2020-07-23 Thread Jacques Nadeau
I'd like to simplify this discussion and start with clarity of use case. If we're talking about a Java developer using the datasets API in a java application, we should respect the Java direct memory size limits set via -XX:MaxDirectMemorySize. Doing something else would violate the principle of le

Re: language independent representation of filter expressions

2020-07-23 Thread Jacques Nadeau
atbuffer schema enum values). > > > On 2020/07/13 09:21:19, Antoine Pitrou wrote: > > On Sat, 11 Jul 2020 09:55:16 -0700 > > Jacques Nadeau wrote: > > > > > > I'm against extending use of flatbuf within Arrow. The language > support is > > >

Re: [ext] Re: language independent representation of filter expressions

2020-07-24 Thread Jacques Nadeau
lds are > typed when I think fields should just contain a field name. > > -Original Message- > From: Jacques Nadeau > Sent: Thursday, July 23, 2020 10:14 PM > To: dev > Subject: [ext] Re: language independent representation of filter > expressions > > Have you

Re: [Java] Supporting Big Endian

2020-08-14 Thread Jacques Nadeau
Hey Micah, thanks for starting the discussion. I just skimmed that thread and it isn't entirely clear that there was a conclusion that the overhead was worth it. I think everybody agrees that it would be nice to have the code work on both platforms. On the flipside, the code noise for a rare case

Re: [DISCUSS] Adding a pull-style iterator API to the C data interface

2020-08-14 Thread Jacques Nadeau
I think this unlocks a bunch of use cases. I think people are generally using Arrow in simpler, non-streaming ways right now and thus the quiet. Producing an iterator pattern is logical as you move to streams of smaller chunks (common in distributed and multi-tenant systems). On Mon, Aug 10, 2020

Re: [DISSCUSS][JAVA] Avoid set reader/writer indices in FieldVector#getFieldBuffers

2020-08-14 Thread Jacques Nadeau
Per my comments there, field buffers were introduced as part of the fieldvector addition, when we had vectors that weren't field level. This meant that getbuffers and getfieldbuffers were at different levels of the hierarchy (getbuffers being more general). I believe we no longer have the

Re: [DISCUSS] Support of higher bit-width Decimal type

2020-08-14 Thread Jacques Nadeau
Do we have a good definition of what is necessary to add a new data type? Adding a type but not pulling it through most of the code seems less than ideal since it means one part of Arrow doesn't work with another (providing a less optimal end-user experience). For example, would this work include

Re: [DISCUSS] How to extended time value range for Timestamp type?

2020-08-14 Thread Jacques Nadeau
+1, let's be cautious adding these kinds of things. On Wed, Aug 5, 2020 at 5:49 AM Wes McKinney wrote: > I also am not sure there is a good case for a new built-in type since it > introduces a good deal of complexity, particularly when there is the > extension type option. We’ve been living with

Re: Gandiva and Threads

2020-08-14 Thread Jacques Nadeau
@ravin...@dremio.com @prav...@dremio.com thoughts? On Tue, Jul 28, 2020 at 3:39 PM Wes McKinney wrote: > Perhaps Gandiva does not handle sliced arrays properly? This would be > worth investigating > > On Mon, Jul 27, 2020 at 7:43 PM Matt Youill > wrote: > > > > Managed to track down the issue

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-30 Thread Jacques Nadeau
mplementation > > working > > > > > fully locally? How many additional PRs will be needed and what do > > > > > they look like (I think there already a few more in the queue)? > > > > > > > > > > * Will it introduce performance regressi

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau
> > What do you mean? The Endianness field (a Big|Little enum) was added 4 > years ago: > https://issues.apache.org/jira/browse/ARROW-245 I didn't realize that was done, my bad. Good example of format rot from my pov.

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau
And yes, for those of you looking closely, I commented on ARROW-245 when it was committed. I just forgot about it. It looks like I had mostly the same concerns then that I do now :) Now I'm just more worried about format sprawl... On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau wrote: >

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-10 Thread Jacques Nadeau
e/arrow/pull/8147 > > On Fri, Mar 13, 2020 at 9:47 PM Fan Liya wrote: > >> Hi Jacques, >> >> Thanks a lot for your valuable comments. >> >> I agree with you that collapsing nullable and non-nullable >> implementations is a good idea, and it does not cont

Re: [DISCUSS] Rotating the PMC Chair

2020-09-29 Thread Jacques Nadeau
I'm super supportive of this, Julian. Thanks for bringing it up. Unlike some leaders, I'm even happy to guarantee a peaceful transition of power! Re now vs Feb 17: I'm totally open to either. In general, I'm a do it now kind of person so if others think a slightly longer tenure sounds good, we co

Re: [VOTE][Format] Allow for 256-bit Decimal's in the Arrow specification

2020-09-29 Thread Jacques Nadeau
+1 On Tue, Sep 29, 2020 at 11:19 AM Wes McKinney wrote: > +1 > > On Tue, Sep 29, 2020 at 4:07 AM Fan Liya wrote: > > > > +1 > > > > Best, > > Liya Fan > > > > On Tue, Sep 29, 2020 at 4:55 PM Antoine Pitrou > wrote: > > > > > > > > +1 (binding) > > > > > > I didn't look at the implementation. >

Re: October board report for Arrow

2020-10-11 Thread Jacques Nadeau
Hey all, with the focus on the PMC chair rotation discussion, we have a pretty thin report this month. I've added a few comments in the doc Wes posted. It would be great if others provided additional modifications: https://docs.google.com/document/d/1ir2PB1Yk3groGqZr14tJZr29KtGshB974rdryO2EOKU/edi

[ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-23 Thread Jacques Nadeau
I am pleased to announce that we have a new PMC chair and VP as per our newly started tradition of rotating the chair once a year. I have resigned and Wes was duly elected by the PMC and approved unanimously by the board. Please join me in congratulating Wes! Jacques

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Jacques Nadeau
I'm catching up here. A couple questions. - I don't think we should require the inclusion of the release commits in the main branch. Having leafs created right before release seems to simplify this and resolve any issues around force PRs, no? Or maybe I'm misunderstanding something? Ma

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Jacques Nadeau
> > I don’t have a problem with releasing out of branches. I think I (or > someone) proposed this in the past and there was not consensus but it seems > like a good time to revisit the issue. > Thanks for the recap. I just couldn't remember where people were at on this. I'm a big +1 for releasing

Re: [C++] Shall we modify the ORC reader?

2021-01-10 Thread Jacques Nadeau
I don't think 1 & 2 make sense. I don't think there are a lot of users reading 2gb strings or lists with 2B objects in them. Saying we just don't support that pattern seems fine for now. I also believe the string and list types have better cross-language support than the large variants. On Sun, Ja

Re: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for Arrow 1.0.0 and beyond

2019-08-04 Thread Jacques Nadeau
Looks good. +1 from me. Thanks for driving this to conclusion. On Wed, Jul 31, 2019, 12:04 PM Bryan Cutler wrote: > +1 (non-binding) > > On Wed, Jul 31, 2019 at 8:59 AM Uwe L. Korn wrote: > > > +1 from me. > > > > I really like the separate versions > > > > Uwe > > > > On Tue, Jul 30, 2019, at

Re: [Java] Arrow PR queue build up?

2019-08-10 Thread Jacques Nadeau
I think one of the issues here is that there is no upfront discussion about most of the changes that are being proposed. In most cases, a pull request just appears without. This makes the reviews much more intensive and time consuming as frequently there are questions about the validity, nature or

Re: [Format] Semantics for dictionary batches in streams

2019-08-10 Thread Jacques Nadeau
What situation are you anticipating where you're going to be restating ids mid stream? On Sat, Aug 10, 2019 at 12:13 AM Micah Kornfield wrote: > The IPC specification [1] defines behavior when isDelta on a > DictionaryBatch [2] is "true". I might have missed it in the > specification, but I couldn'

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Jacques Nadeau
This is a pretty massive change to the apis. I wonder how nasty it would be to just support both paths. Have you evaluated how complex that would be? On Wed, Aug 7, 2019 at 11:08 PM Micah Kornfield wrote: > After more investigation, it looks like Float8Benchmarks at least on my > machine are wit

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-10 Thread Jacques Nadeau
ually loop over all elements (it is quite possible I missed something > here), the ones that I did find I was able to mitigate performance > penalties as noted above. Some of the current implementation will get a > lot slower for "large arrays", but we can likely fix those later or

Re: [DISCUSS][JAVA] Make FixedSizedListVector inherit from ListVector

2019-08-11 Thread Jacques Nadeau
We tried to get away from this kind of back and forth with subclassing as much as possible. (call getObject on base class which then calls getIndex on child class which then calls something else on base class). I haven't looked through the code but let's try to avoid having complex call paths for t

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-11 Thread Jacques Nadeau
ti pattern to scalable > > analytical processing--purely subjective of course). > > > I'm open to other ideas here, as well. I don't think it is out of the > question to leave the Java implementation as 32-bit, but if we do, then I > think we should consider a differ

Re: [Format] Semantics for dictionary batches in streams

2019-08-11 Thread Jacques Nadeau
t, Aug 10, 2019 at 4:20 PM Micah Kornfield wrote: > Reading data from two different parquet files sequentially with different > dictionaries for the same column. This could be handled by re-encoding > data but that seems potentially sub-optimal. > > On Sat, Aug 10, 2019 at 12

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-22 Thread Jacques Nadeau
ion-centrism >> > > on my part, though. >> > >> > >> > A data point against this view is Spark has done work to eliminate 2GB >> > memory limits on its block sizes [1]. I don't claim to understand the >> > implications of this. Bryan might you

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-22 Thread Jacques Nadeau
> > Hi Jacques, I hope you had a good rest. I did, thanks! On Fri, Aug 23, 2019 at 9:25 AM Jacques Nadeau wrote: > I don't think we should couple this discussion with the implementation of > large list, etc since I think those two concepts are independent. > > I

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Jacques Nadeau
l instructions or if in most cases, this >> actually doesn't impact instruction count. > > > Is this something that your team will take on? > Yeah, we need to look at this I think. Do you need a rebased version of the PR or is the existing one sufficient? > Rebase would

Re: [Discuss][Java] 64-bit lengths for ValueVectors

2019-08-23 Thread Jacques Nadeau
//github.com/apache/arrow/blob/95175fe7cb8439eebe6d2f6e0495f551d6864380/java/memory/src/main/java/io/netty/buffer/ArrowBuf.java#L164 > > On Fri, Aug 23, 2019 at 4:55 AM Jacques Nadeau wrote: > >> >> >> On Fri, Aug 23, 2019, 11:49 AM Micah Kornfield >> wrote:

Re: Assigning Issues to New Users

2019-08-23 Thread Jacques Nadeau
Let's add committers as admins on jira. I don't see any downsides to that. On Fri, Aug 23, 2019, 9:42 PM Wes McKinney wrote: > hi Paddy, > > I just added andyscho to the "Contributor" role on JIRA so you can > assign them the issue now. > > You need to be a JIRA administrator on the "Arrow" proj

Re: [Discuss][Java] Refactor code for time related vectors

2019-08-26 Thread Jacques Nadeau
I think you'd have to refactor differently since one of the patterns is get(index, object holder) is consistently matched. That means you should never subclass a field vector I believe. In fact we should probably just formally finalize them. Once you do that, I'm not sure how much duplicated code

Re: [DISCUSS][Java] Should null values in VariableWidthVector/ListVector always takes 0 space?

2019-08-28 Thread Jacques Nadeau
#3 is the correct behavior and how the code was meant to be written. I don't see any problems with that pattern. This allows someone (if they so decide) to null a value without having to rewrite the data. #3 is also a consistent behavior with all other vectors. Null values can use up space but t
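A toy model may help show why this behavior lets a value be nulled without rewriting the data. The sketch below is not the Arrow Java vector implementation; the class `VarWidthColumn` and its methods are hypothetical and assume a simple layout of a validity list, an offsets list, and a byte buffer.

```python
class VarWidthColumn:
    """Toy variable-width column: validity flags, offsets, and a data buffer.

    Hypothetical illustration only; not the Arrow Java API.
    """

    def __init__(self, values):
        self.data = bytearray()
        self.offsets = [0]
        self.validity = []
        for value in values:
            if value is None:
                self.validity.append(False)
            else:
                self.data += value
                self.validity.append(True)
            self.offsets.append(len(self.data))

    def set_null(self, index):
        # Only the validity flag changes; offsets and data are left alone,
        # so the nulled slot keeps occupying its bytes in the data buffer.
        self.validity[index] = False

    def get(self, index):
        if not self.validity[index]:
            return None
        start, end = self.offsets[index], self.offsets[index + 1]
        return bytes(self.data[start:end])


col = VarWidthColumn([b"ab", None, b"cde"])
col.set_null(0)
```

After `set_null(0)`, reading index 0 yields a null while the two bytes for the old value still sit in the buffer. That is the trade-off the message describes: null values can use up space, but nothing after them has to be shifted.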

Re: [Discuss][Java] Refactor code for time related vectors

2019-08-28 Thread Jacques Nadeau
lity methods for get/set int/long methods. > 2. Push the get/set int/long methods to BaseFixedWidthVector base class. > > What do you think? > > Best, > Liya Fan > > On Tue, Aug 27, 2019 at 7:16 AM Jacques Nadeau wrote: > > > I think you'd have to refactor di

Re: [Discuss][Java] Refactor code for time related vectors

2019-08-28 Thread Jacques Nadeau
rbtrirary to others. I think a bunch of this was covered extensively on the list and in the reviews during the major refactor done last year. I'll also see if I can dig up some more background. > Thanks, > Micah > > [1] https://github.com/apache/arrow/pull/5213 > > > On We

Re: [DISCUSS] Improving Arrow columnar implementation guidelines for third parties

2019-09-17 Thread Jacques Nadeau
> > Let's take an example: > > * Dremio can execute SQL and uses Arrow as its native runtime format > * Apache Spark can execute SQL and offers UDF support with Arrow > format, i.e. so using Arrow for IO > > Both of these projects can say that they "use Apache Arrow", but the > extent to which Arro

Re: [DISCUSS] C-level in-process array protocol

2019-09-28 Thread Jacques Nadeau
I'm not clear on why we need to introduce something beyond what flatbuffers already provides. Can someone explain that to me? I'm not really a fan of introducing a second representation of the same data (as I understand it). On Thu, Sep 19, 2019 at 1:15 PM Wes McKinney wrote: > This is helpful,

Re: [DISCUSS] C-level in-process array protocol

2019-09-28 Thread Jacques Nadeau
[FieldNode]; buffers: [Buffer]; } On Sat, Sep 28, 2019 at 9:02 PM Jacques Nadeau wrote: > I'm not clear on why we need to introduce something beyond what > flatbuffers already provides. Can someone explain that to me? I'm not > really a fan of introducing a second represen

Re: [DISCUSS] C-level in-process array protocol

2019-09-29 Thread Jacques Nadeau
On Sun, Sep 29, 2019 at 12:59 AM Antoine Pitrou wrote: > > Le 29/09/2019 à 06:10, Jacques Nadeau a écrit : > > * No dependency on Flatbuffers. > > * No buffer reassembly (data is already exposed in logical Arrow format). > > * Zero-copy by design. > > * Ea

Re: [DISCUSS][Java] Reduce the range of synchronized block when releasing an ArrowBuf

2019-09-29 Thread Jacques Nadeau
For others that don't realize, the discussion of this is happening on the pull request here: https://github.com/apache/arrow/pull/5526 On Fri, Sep 27, 2019 at 4:52 AM Fan Liya wrote: > Dear all, > > When releasing an ArrowBuf, we will run the following piece of code: > > private int decrement(i

Re: [DISCUSS] C-level in-process array protocol

2019-10-01 Thread Jacques Nadeau
I disagree with this statement: - the IPC format is meant for serialization while the C data protocol is meant for in-memory communication, so different concerns apply If that is how a particular implementation presents it, that is a weakness of the implementation, not the format. The prim

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Jacques Nadeau
oPB project which is an > implementation of Protocol Buffers with small code size > > https://github.com/nanopb/nanopb > > Let me know if this makes more sense. > > I think it's important to communicate clearly about this primarily for > the benefit of the outside w

Re: [Proposal]: Expose Flight gRPC for Dremio use case (Java)

2019-10-05 Thread Jacques Nadeau
> > Is it possible for a single gRPC server to expose multiple services > through the same port (it sounds like it is)? It would be a good idea > to do similar refactoring in C++ so that Flight RPC endpoints can be > provided alongside some other non-Flight endpoints in the same gRPC > server > It

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Jacques Nadeau
party integrators? > > > - Flatbuffers aren't entirely straight-forward and I think if we do > move > > > forward with an API based on Column/Array we should consider > alternatives > > > as long as the necessary parsing code can be done in a small amount of > > code > > > (I'm personally a

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-09 Thread Jacques Nadeau
I think we need to be more direct in listing issues for the board. What have we done? What do we want them to do? In general, any large org is going to be slow to add new deep integrations into GitHub. I don't think we should expect Apache to be any different (it took several years before we could m

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-09 Thread Jacques Nadeau
sues. All that is needed is for INFRA to let us > use third party GitHub Apps and monitor any potentially destructive > actions that they may take, such as modifying unrelated repository > webhooks related to IP provenance. > > - Wes > > On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
rently CI capacity has been a "hot topic as of late": > > > https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E > > > > (I didn't know this list -- bui...@apache.org -- existed, by the way) > > > > Regards > >

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Arg... accidental send before ready. What do you think about the statement below for community health? Does it fairly capture the concerns/perspective? On Thu, Oct 10, 2019 at 10:24 AM Jacques Nadeau wrote: > Many contributors are struggling with the slowness of pre-commit CI. Arrow > has a

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Proposed report update below. LMK your thoughts. ## Description: The mission of Apache Arrow is the creation and maintenance of software related to columnar in-memory processing and data interchange ## Issues: * We are struggling with Continuous Integration scalability as the project has defin

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
Antoine, is my synopsis fair? On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney wrote: > +1 > > On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau wrote: > > > Proposed report update below. LMK your thoughts. > > > > ## Description: > > The mission of Apache Arro

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau
wrote: > > It's good with me. > > Regards > > Antoine. > > > Le 10/10/2019 à 22:51, Jacques Nadeau a écrit : > > Antoine, is my synopsis fair? > > > > On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney > wrote: > > > >>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-15 Thread Jacques Nadeau
I like it. Added some comments to the doc. Might be worth discussion here depending on your thoughts. On Tue, Oct 15, 2019 at 7:11 AM David Li wrote: > Hey Ryan, > > Thanks for the comments. > > Concrete example: I've edited the doc to provide a Python strawman. > > Sync vs async: while I don't tou

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-16 Thread Jacques Nadeau
round > for quite a while). > > Thanks, > David > > On 10/15/19, Jacques Nadeau wrote: > > I like it. Added some comments to the doc. Might be worth discussion here > > depending on your thoughts. > > > > On Tue, Oct 15, 2019 at 7:11 AM David Li wrote: > >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-20 Thread Jacques Nadeau
ant metadata field, but oneof prevents that from happening, and > overall having a clear separation between data and control messages is > cleaner. > > As for using Protobuf's Any: so far, we've refrained from exposing > Protobuf by using bytes, would we want to change that

Re: [Rust] DataFusion benchmarks

2019-10-20 Thread Jacques Nadeau
Super cool. Thanks for sharing! On Sun, Oct 20, 2019 at 10:52 AM Andy Grove wrote: > Now that the DataFusion query execution code has been re-written to use a > physical query plan with support for multi-threaded execution, I have > started running some benchmarks again. Here are the results so

Re: [DISCUSS][Java] Builders for java classes

2019-10-27 Thread Jacques Nadeau
+1 on the idea of enhancing builder interfaces. >>IntVectorBuilder addAll(int[] values); Let's make sure that anything like the above is efficient. People will judge the quality of the project on the efficiency of the methods we provide. If everybody starts using int[] to build Arrow vectors, we

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-27 Thread Jacques Nadeau
> > Knowledge of protobuf shouldn't be required to use Flight. > >>> > > > > >>> > > > Regards > >>> > > > > >>> > > > Antoine. > >>> > > > > >>> > > > > >>

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-28 Thread Jacques Nadeau
require multiple calls and coordination > with the deployment topology) in order to accomplish this? > > Best, > David > > On 11/27/19, Jacques Nadeau wrote: > > Fair enough. I'm okay with the bytes approach and the proposal looks good > > to me. > >

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-06 Thread Jacques Nadeau
-1 (binding) I'm voting -1 on this. I posted the thinking why on the PR. The high-level is that I think it needs to better address the pipelined use case as right now it fails to support that at all and has too much weight to ignore that use case. I actually would have posted it here but totally

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-12-13 Thread Jacques Nadeau
ecord batches with different schemas > > in the same stream, though with some added complexity on each side > > > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau > wrote: > >> > >> I'd vote for explicitly not supported. We should keep our primitives

Re: Planned Support for ORC Dataset?

2019-12-13 Thread Jacques Nadeau
I question the value of adding the Orc format. The format is fragmented with the main tool writing it (hive) writing a version of the format (acid v2) that can't be consumed by systems that only use the Orc libraries (since they don't support acid). If you want to consume that data, you have to dep

Re: Planned Support for ORC Dataset?

2019-12-13 Thread Jacques Nadeau
ri, Dec 13, 2019 at 11:15 AM Jacques Nadeau wrote: > I question the value of adding the Orc format. The format is fragmented > with the main tool writing it (hive) writing a version of the format (acid > v2) that can't be consumed by systems that only use the Orc libraries > (si

Re: [DISCUSS] C Data Interface, take 2

2019-12-21 Thread Jacques Nadeau
Thanks for addressing my comments. I'm actively reviewing the proposal. It is taking me more time than I would like given the time of the year but I want to make sure that you know that I'm looking at it and hope to provide additional feedback beyond that which I've provided thus far on the PR. Wil

Re: Looking to 1.0

2020-01-03 Thread Jacques Nadeau
I identified three things in the java library that I think are top of mind and should be fixed before 1.0 to avoid weird incompatibility changes in the java apis (technical debt). I've tagged them as pre-1.0 as I don't exactly see what is the right way to tag/label a target release for a ticket. ht

Re: Looking to 1.0

2020-01-04 Thread Jacques Nadeau
> Liya Fan > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau wrote: > > > I identified three things in the java library that I think are top of > mind > > and should be fixed before 1.0 to avoid weird incompatibility changes in > > the java apis (technical debt).

Re: Human-readable version of Arrow Schema?

2020-01-04 Thread Jacques Nadeau
What do people think about using the C interface representation? On Sun, Dec 29, 2019 at 12:42 PM Micah Kornfield wrote: > I opened https://github.com/google/flatbuffers/issues/5688 to try to get > some clarity. > > On Tue, Dec 24, 2019 at 12:13 PM Wes McKinney wrote: > > > On Tue, Dec 24, 2019

Re: Human-readable version of Arrow Schema?

2020-01-04 Thread Jacques Nadeau
I guess we'd still need to introduce a way to nest; it only has type representation. On Sat, Jan 4, 2020 at 2:16 PM Jacques Nadeau wrote: > What do people think about using the C interface representation? > > On Sun, Dec 29, 2019 at 12:42 PM Micah Kornfield > wrote: >

Re: Pending Java pull requests

2020-01-09 Thread Jacques Nadeau
I think there are a decent chunk that are of questionable value. We need to be more willing to simply reject requests rather than leave them in no-man's land. I'll try to do a pass through and help dispatch, etc. On Thu, Jan 9, 2020 at 5:25 AM Krisztián Szűcs wrote: > Hi, > > Roughly 40% of the
