from:"Jacques Nadeau"

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-24 Thread Jacques Nadeau

our valuable comments. > > Best, > Chao > > On Thu, Jan 18, 2024 at 5:24 PM Jacques Nadeau wrote: > > > > Yes, that was roughly what I was requesting (I was suggesting a single PR > > with many commits that would be merged with the history). > > > > It&

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-18 Thread Jacques Nadeau

idely used internally > already), we'd be happy to help further improve readability & > maintainability of the codebase and resolving issues raised from the > community. Will this work for you? really appreciate if you understand > our situation. > > Thanks, > Chao >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Thread Jacques Nadeau

e of them contain internal info which we need to > remove upon open sourcing. How about we just add a summary in the PR > itself, and add everyone that has contributed to it as co-author to > the PR? > > Chao > > On Wed, Jan 17, 2024 at 11:09 AM Jacques Nadeau > wrote: > >

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-01-17 Thread Jacques Nadeau

Hey Chao, it would be great for you to share the code some place with commit history. (PR to the repo that Andy made or something else.) On Mon, Jan 15, 2024 at 7:38 AM Andy Grove wrote: > Hi Chao, > > I have created https://github.com/apache/arrow-datafusion-comet and you > should be able to cr

Re: [VOTE] Substrait for Flight SQL

2022-09-08 Thread Jacques Nadeau

My vote continues to be +1 On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson wrote: > +1 > > Neal > > On Thu, Sep 8, 2022 at 2:15 PM Ashish wrote: > > > +1 (non-binding) > > > > On Thu, Sep 8, 2022 at 9:41 AM Gavin Ray wrote: > > > > > Oh, so that's what "non-binding" means in vote threads > > >

Re: [VOTE] Substrait for Flight SQL

2022-08-31 Thread Jacques Nadeau

+1 (binding) On Wed, Aug 31, 2022, 5:15 PM Larry White wrote: > +1 (non-binding) > > On Wed, Aug 31, 2022 at 7:55 PM Vinicius Fraga wrote: > > > +1 (non-binding) > > > > On Wed, 31 Aug 2022, 20:51 David Li, wrote: > > > > > Hello, > > > > > > I am proposing to extend the Flight SQL specificati

Re: ARROW-11465

2022-05-18 Thread Jacques Nadeau

I second Weston's comments. The idea of separate files is part of the de jure spec but not the de facto one. It's up to the Parquet community whether the de facto spec should be "altered". Afaik, zero OSS readers support use of this field. On Wed, May 18, 2022, 8:53 AM Weston Pace wrote: > I

Re: [Rust] Enable GitHub discussions for Rust projects?

2022-05-04 Thread Jacques Nadeau

No vote here but a little feedback. We've generally found GitHub Discussions somewhat lacking in Substrait. If other people find it good, great. I might be more inclined to just drive people to something like StackOverflow or the mailing list. We were initially quite enthusiastic but the experience

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-23 Thread Jacques Nadeau

I'm generally -0.01 against narrow decimals. My experience in practice has been that widening happens so quickly that they are little used and add unnecessary complexity. For reference, the original Arrow code actually implemented Decimal9 [1] and Decimal18 [2] but we removed both because of this e

Re: [C++] output field names in Arrow Substrait

2022-04-23 Thread Jacques Nadeau

In the specification, there are both read and intermediate write rels. No one has implemented the protobuf yet for write. Both carry field names. The names of fields is an internal rel node concern just like condition is for filter. This is because many formats require names. For example, parquet

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-04-18 Thread Jacques Nadeau

I'm -0.9 on Arrow Compute engine. It makes it sound like it is THE canonical Arrow one, second classing DataFusion and Gandiva. No strong feelings on other names. Naming in general is an extremely subjective process... On Thu, Mar 31, 2022, 2:33 PM Weston Pace wrote: > I'm +1 for "arrow compu

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-03 Thread Jacques Nadeau

James, I agree that you could use JSON but that feels a bit hacky (mis-use of the paradigm). Instead, I'd really like to do something like David is suggesting: support Substrait as an alternative to a SQL string. Something like this: https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95

Re: [DISCUSS] Annual rotation of Arrow PMC chair

2022-01-04 Thread Jacques Nadeau

Hey Wes, thanks for bringing this up. And more importantly, thanks for working as the PMC chair this last year! I think Kouhei would be a great choice for the PMC chair. Jacques On Tue, Jan 4, 2022 at 12:44 AM Wes McKinney wrote: > hello all, > > As we discussed at the end of 2020 [1], we woul

Re: [RESULT][VOTE] Proposed addition to Arrow Flight: Arrow Flight RPC

2021-12-25 Thread Jacques Nadeau

That's great news. Congrats and thanks to the team who worked on it. This is a great addition to Arrow! On Thu, Dec 23, 2021, 11:26 AM David Li wrote: > The integration tests and existing PRs were merged into a separate branch. > We also merged in a few build fixes during final review. Just in t

Re: [DISCUSS] Adding new columnar memory layouts to Arrow (in-memory, IPC, C ABI)

2021-12-10 Thread Jacques Nadeau

I'm strongly in support of much of this. Thanks for bringing this up. It is long overdue. On initial read, my thoughts would be: Strongly inclined: - String view - constant view Weakly inclined - All null - rle Somewhat disinclined - Sequence change With dictionary and string view, I feel like

[DISCUSS][FLIGHT SQL] Intentions around JDBC and/or ODBC for Flight SQL?

2021-12-09 Thread Jacques Nadeau

Hey all, I was curious if there was anyone planning on implementing JDBC and/or ODBC wrappers on top of the Flight SQL Java [1] and Flight SQL C++ implementations [2] since they seem to be completing soon. It seems like JDBC/ODBC could quickstart integration between Flight SQL and other components

Re: Question about Arrow Mutable/Immutable Arrays choice

2021-11-03 Thread Jacques Nadeau

Hey Alessandro, take a look at the top level docs on ValueVector: https://arrow.apache.org/docs/java/reference/org/apache/arrow/vector/ValueVector.html Specifically the following: - values need to be written in order (e.g. index 0, 1, 2, 5) - null vectors start with all values as null befo

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

2021-10-22 Thread Jacques Nadeau

ct control. The proposal (actually, its 3rd iteration) is > > described here at https://openjdk.java.net/jeps/393, and has been > available > > as an incubator feature for several JDK releases (Javadoc: > > > https://docs.oracle.com/en/java/javase/17/docs/api/jdk.incubat

Re: Re: Re: [DISCUSS][Java] Adding GC-Based reference management strategy for buffers

2021-10-07 Thread Jacques Nadeau

Clearly this patch was driven by an implicit set of needs but it's hard to guess at what they are. As Laurent asked, what is the main goal here? There may be many ways to solve this goal. Some thoughts in general: - The allocator is a finely tuned piece of multithreaded machinery that is used on h

Re: [Rust] Heads up: RUSTSEC security advisory against arrow-rs

2021-09-30 Thread Jacques Nadeau

In the past I was dealing with something similar. My experience was when data was accepted at the edge, the cost of validating that the first offset is zero, the last is within the data bounds and that all others are equal or increasing was a reasonable overhead associated with validating offsets f
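
The three checks named in this message (first offset is zero, last offset is within the data bounds, all others equal or increasing) can be sketched as follows. This is a minimal illustration in Python, not code from arrow-rs or any Arrow library; the function name `validate_offsets` and its parameters are hypothetical.

```python
from typing import Sequence


def validate_offsets(offsets: Sequence[int], data_length: int) -> bool:
    """Return True if the offsets pass the three checks from the message.

    Hypothetical helper for illustration only; it is not an Arrow API.
    """
    if len(offsets) == 0:
        return False
    # Check 1: the first offset is zero.
    if offsets[0] != 0:
        return False
    # Check 2: the last offset is within the data bounds.
    if offsets[-1] > data_length:
        return False
    # Check 3: every offset is equal to or greater than the one before it.
    return all(a <= b for a, b in zip(offsets, offsets[1:]))
```

The whole check is a single linear pass over the offsets, which is consistent with the message's point that it can be done once, where data is accepted at the edge.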

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-09-20 Thread Jacques Nadeau

+1 on time variation. Please add me to the invite. Thanks On Sun, Sep 19, 2021 at 9:49 PM Benson Muite wrote: > New to this. A suggestion may be to consider two of the times, eg. 4:00 > UTC and 16:00 UTC perhaps alternating allowing geographic diversity in > joining convenience. > > On 9/20/

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-09-07 Thread Jacques Nadeau

ed anything close to physical-plan things, > but > > I > > > > > think that's a good follow up PR. Having separate representations > for > > > > > logical/physical plans seems like a waste of effort ultimately. I > > think > >

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-23 Thread Jacques Nadeau

In a lucky turn of events, Phillip actually turned out to be in my neck of the woods on Friday so we had a chance to sit down and discuss this. To help, I actually shared something I had been working on a few months ago independently (before this discussion started). For reference: Wes PR: https:/

Re: [C++] Shall we modify the ORC reader?

2021-01-10 Thread Jacques Nadeau

I don't think 1 & 2 make sense. I don't think there are a lot of users reading 2gb strings or lists with 2B objects in them. Saying we just don't support that pattern seems fine for now. I also believe the string and list types have better cross-language support than the large variants. On Sun, Ja

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Jacques Nadeau

> > I don’t have a problem with releasing out of branches. I think I (or > someone) proposed this in the past and there was not consensus but it seems > like a good time to revisit the issue. > Thanks for the recap. I just couldn't remember where people were at on this. I'm a big +1 for releasing

Re: [Governance] [Proposal] Stop force-pushing to PRs after release?

2020-11-25 Thread Jacques Nadeau

I'm catching up here. A couple questions. - I don't think we should require the inclusion of the release commits in the main branch. Having leafs created right before release seems to simplify this and resolve any issues around force PRs, no? Or maybe I'm misunderstanding something? Ma

[ANNOUNCE] New Arrow PMC chair: Wes McKinney

2020-10-23 Thread Jacques Nadeau

I am pleased to announce that we have a new PMC chair and VP as per our newly started tradition of rotating the chair once a year. I have resigned and Wes was duly elected by the PMC and approved unanimously by the board. Please join me in congratulating Wes! Jacques

Re: October board report for Arrow

2020-10-11 Thread Jacques Nadeau

Hey all, with the focus on the PMC chair rotation discussion, we have a pretty thin report this month. I've added a few comments in the doc Wes posted. It would be great if others provided additional modifications: https://docs.google.com/document/d/1ir2PB1Yk3groGqZr14tJZr29KtGshB974rdryO2EOKU/edi

Re: [VOTE][Format] Allow for 256-bit Decimal's in the Arrow specification

2020-09-29 Thread Jacques Nadeau

+1 On Tue, Sep 29, 2020 at 11:19 AM Wes McKinney wrote: > +1 > > On Tue, Sep 29, 2020 at 4:07 AM Fan Liya wrote: > > > > +1 > > > > Best, > > Liya Fan > > > > On Tue, Sep 29, 2020 at 4:55 PM Antoine Pitrou > wrote: > > > > > > > > +1 (binding) > > > > > > I didn't look at the implementation. >

Re: [DISCUSS] Rotating the PMC Chair

2020-09-29 Thread Jacques Nadeau

I'm super supportive of this, Julian. Thanks for bringing it up. Unlike some leaders, I'm even happy to guarantee a peaceful transition of power! Re now vs Feb 17: I'm totally open to either. In general, I'm a do it now kind of person so if others think a slightly longer tenure sounds good, we co

Re: [DISCUSS][Java] Support non-nullable vectors

2020-09-10 Thread Jacques Nadeau

e/arrow/pull/8147 > > On Fri, Mar 13, 2020 at 9:47 PM Fan Liya wrote: > >> Hi Jacques, >> >> Thanks a lot for your valuable comments. >> >> I agree with you that collapsing nullable and non-nullable >> implementations is a good idea, and it does not cont

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau

And yes, for those of you looking closely, I commented on ARROW-245 when it was committed. I just forgot about it. It looks like I had mostly the same concerns then that I do now :) Now I'm just more worried about format sprawl... On Mon, Aug 31, 2020 at 1:30 PM Jacques Nadeau wrote: >

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-31 Thread Jacques Nadeau

> > What do you mean? The Endianness field (a Big|Little enum) was added 4 > years ago: > https://issues.apache.org/jira/browse/ARROW-245 I didn't realize that was done, my bad. Good example of format rot from my pov.

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-30 Thread Jacques Nadeau

mplementation > > working > > > > > fully locally? How many additional PRs will be needed and what do > > > > > they look like (I think there already a few more in the queue)? > > > > > > > > > > * Will it introduce performance regressi

Re: Gandiva and Threads

2020-08-14 Thread Jacques Nadeau

@ravin...@dremio.com @prav...@dremio.com thoughts? On Tue, Jul 28, 2020 at 3:39 PM Wes McKinney wrote: > Perhaps Gandiva does not handle sliced arrays properly? This would be > worth investigating > > On Mon, Jul 27, 2020 at 7:43 PM Matt Youill > wrote: > > > > Managed to track down the issue

Re: [DISCUSS] How to extended time value range for Timestamp type?

2020-08-14 Thread Jacques Nadeau

+1, let's be cautious adding these kinds of things. On Wed, Aug 5, 2020 at 5:49 AM Wes McKinney wrote: > I also am not sure there is a good case for a new built-in type since it > introduces a good deal of complexity, particularly when there is the > extension type option. We’ve been living with

Re: [DISCUSS] Support of higher bit-width Decimal type

2020-08-14 Thread Jacques Nadeau

Do we have a good definition of what is necessary to add a new data type? Adding a type but not pulling it through most of the code seems less than ideal since it means one part of Arrow doesn't work with another (providing a less optimal end-user experience). For example, would this work include

Re: [DISSCUSS][JAVA] Avoid set reader/writer indices in FieldVector#getFieldBuffers

2020-08-14 Thread Jacques Nadeau

Per my comments there, field buffers were introduced as part of the fieldvector addition, when we had vectors that weren't field level. This meant that getbuffers and getfieldbuffers were at different levels of the hierarchy (getbuffers being more general). I believe we no longer have the

Re: [DISCUSS] Adding a pull-style iterator API to the C data interface

2020-08-14 Thread Jacques Nadeau

I think this unlocks a bunch of use cases. I think people are generally using Arrow in simpler, non-streaming ways right now and thus the quiet. Producing an iterator pattern is logical as you move to streams of smaller chunks (common in distributed and multi-tenant systems). On Mon, Aug 10, 2020

Re: [Java] Supporting Big Endian

2020-08-14 Thread Jacques Nadeau

Hey Micah, thanks for starting the discussion. I just skimmed that thread and it isn't entirely clear that there was a conclusion that the overhead was worth it. I think everybody agrees that it would be nice to have the code work on both platforms. On the flipside, the code noise for a rare case

Re: [ext] Re: language independent representation of filter expressions

2020-07-24 Thread Jacques Nadeau

lds are > typed when I think fields should just contain a field name. > > -Original Message- > From: Jacques Nadeau > Sent: Thursday, July 23, 2020 10:14 PM > To: dev > Subject: [ext] Re: language independent representation of filter > expressions > > Have you

Re: language independent representation of filter expressions

2020-07-23 Thread Jacques Nadeau

atbuffer schema enum values). > > > On 2020/07/13 09:21:19, Antoine Pitrou wrote: > > On Sat, 11 Jul 2020 09:55:16 -0700 > > Jacques Nadeau wrote: > > > > > > I'm against extending use of flatbuf within Arrow. The language > support is > > >

Re: [DISCUSS] Using direct memory size as a limit of populated off-heap buffers in Java

2020-07-23 Thread Jacques Nadeau

I'd like to simplify this discussion and start with clarity of use case. If we're talking about a Java developer using the datasets API in a java application, we should respect the Java direct memory size limits set via -XX:MaxDirectMemorySize. Doing something else would violate the principle of le

Re: Writing very large rowgroups to Apache Parquet

2020-07-17 Thread Jacques Nadeau

he last are expected to be at least 5mb if I read their docs correctly >> [1]) >> >> [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html >> >> >> On Saturday, July 11, 2020, Jacques Nadeau wrote: >> >> > I'd suggest a new write pattern. Wr

Re: Writing very large rowgroups to Apache Parquet

2020-07-11 Thread Jacques Nadeau

I'd suggest a new write pattern. Write the columns a page at a time to separate files, then use a second process to concatenate the columns and append the footer. Odds are you would do better than OS swapping and take memory requirements down to page size times field count. In S3 I believe you could

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau

For reference, the doc (from eight years ago) I meant to link in my initial message was: https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit On Sat, Jul 11, 2020, 11:24 AM Wes McKinney wrote: > On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau > wrote: >

Re: language independent representation of filter expressions

2020-07-11 Thread Jacques Nadeau

On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote: > I would also be interested in having a reusable serialized format for > filter- and projection-like expressions. I think trying to go so far > as full logical query plans suitable for building a SQL engine is > perhaps a bit too far but we coul

Re: Renaming master branch, removing blacklist/whitelist

2020-06-24 Thread Jacques Nadeau

Hi Suvayu, thanks for sharing your experiences. Clearly we have work to do. With respect to specific name changes, I agree with Wes. If something is negative to a non-trivial portion of the population, why not use something that avoids that issue where possible. On Fri, Jun 19, 2020, 7:44 PM Suvayu Ali

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Jacques Nadeau

Per my comments on the pr, I also think this is preferred. I believe we will avoid the potential for validity inconsistency and simplify construction of union data in most cases. On Wed, Jun 24, 2020, 7:58 AM Wes McKinney wrote: > hi folks, > > As discussed on the recent GitHub PR [1], as a mean

Re: Arrow Flight connector for SQL Server

2020-05-19 Thread Jacques Nadeau

Hey Brendan, Welcome to the community. At Dremio we've exposed flight as an input and output for sql result datasets. I'll have one of our guys share some details. I think a couple questions we've been struggling with include how to standardize additional metadata operations, what should the prepa

Re: [DISCUSS][Java] Support non-nullable vectors

2020-03-11 Thread Jacques Nadeau

Generally I've found that this isn't an important optimization in the use cases we see. Memory overhead, especially with our Java shared allocation scheme, is nominal. Optimizing null checks at the word level usually is much more impactful since non null and null runs are much more common on a shorter

Re: JDBC / Flight questions

2020-01-29 Thread Jacques Nadeau

At Dremio we have two things at the moment: A JDBC driver that is built on Arrow and served as the inspiration for some of the design choices in flight [1] A preview flight connector that doesn't yet expose JDBC [2] The former is built on Avatica (part of the Apache Calcite project) so the

Re: [Format] Array/RowBatch filters

2020-01-26 Thread Jacques Nadeau

At Dremio, we use four main types of selection vector/bitmaps: Dense Format (record valid or not, no ordering) - single bit (bitmap) Sparse formats (identifies valid records as well as their order) - 2 byte (for record batches up to 2^16 records). - 4 byte (for 2^16 batches of 2^16 records); - 6
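
The difference between the dense (bitmap) form and the 2-byte sparse form described in this message can be sketched as follows. This is a toy illustration only and does not reflect Dremio's or Arrow's actual implementation; the function names are hypothetical, and least-significant-bit-first ordering within each byte is an assumption of the sketch.

```python
from array import array
from typing import List


def dense_select(bitmap: bytes, num_records: int) -> List[int]:
    """Dense form: one bit per record, set when the record is valid.

    Assumes LSB-first bit order within each byte. A bitmap cannot express
    an ordering; records always come out in their original order.
    """
    return [i for i in range(num_records) if (bitmap[i // 8] >> (i % 8)) & 1]


def dense_to_sparse(bitmap: bytes, num_records: int) -> array:
    """Sparse 2-byte form: a list of unsigned 16-bit record indices."""
    if num_records > 1 << 16:
        raise ValueError("2-byte indices cover batches of at most 2^16 records")
    # Type code "H" is an unsigned short (2 bytes on common platforms).
    return array("H", dense_select(bitmap, num_records))


def sparse_select(indices: array) -> List[int]:
    """Sparse form carries both validity and order: indices are read as stored."""
    return list(indices)
```

For example, the bitmap byte `0b00100101` marks records 0, 2 and 5 as valid, while a sparse vector holding 5, 0, 2 selects the same records but in a different order, which the dense form has no way to represent.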

Re: [DISCUSS] C Data Interface, take 2

2020-01-21 Thread Jacques Nadeau

. If you want to try to > block the C++ contributors from doing this we may be barreling toward > a governance crisis in the project. I'm stepping back from this > discussion for a time now to allow others to catch up on the > discussion and to weigh in as needed > > On Mon,

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Jacques Nadeau

> C++ project (i.e. the ArrayData data structure). We should not > advertise this as being a part of the project specification. > > - Wes > > On Mon, Jan 20, 2020 at 11:51 AM Jacques Nadeau > wrote: > > > > As I noted on the pull request, I think fundamentally th

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau

> > I think what we have determined is that the changes that are being > discussed in this thread would not render any existing serialized > Flatbuffers unreadable, unless they are malformed / unable to be > read with the current libraries. > I think we need to separate two different things: Poin

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Jacques Nadeau

to merely providing a C-header-based data > interface to the C++ project only. That was the original problem > statement and it seems in attempting to make it useful beyond C++ has > made it difficult to reach consensus. > > Thanks > Wes > > On Sat, Dec 21, 2019 at 4:38 PM

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau

> > To be clear, I agree that we need to check that our various validation > and integration suites pass properly. But once that is done and > assuming all the metadata variations are properly tested, data > variations should not pose any problem. > Unless I'm misunderstanding your proposal, that

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau

I think it is too late in the game to make this fundamental change. It would be very hard to assess whether it is a no-op or has massive implications for existing datasets. Just among Dremio customers in the 30 days we stored more than 100mm datasets that leveraged the current format. I'm supportive

Re: [Java] Large Memory Allocators (Taking a dependency on JNA?)

2020-01-19 Thread Jacques Nadeau

It seems like jna is overkill & unnecessary for simply allocating/freeing memory. A simple way to do this is either to use unsafe directly or call the existing netty unsafe facade directly. PlatformDependent.allocateMemory(long) PlatformDependent.freeMemory(long) Should be relatively straightfor

[jira] [Created] (ARROW-7549) [Java] Reorganize Flight modules to keep top level clean/organized

2020-01-10 Thread Jacques Nadeau (Jira)

Jacques Nadeau created ARROW-7549: - Summary: [Java] Reorganize Flight modules to keep top level clean/organized Key: ARROW-7549 URL: https://issues.apache.org/jira/browse/ARROW-7549 Project: Apache

Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-09 Thread Jacques Nadeau

I think we should try to be more conservative about what > issues we pre-emptively assign fix versions -- there may be a more > constructive way that we can prioritize issues and distinguish between > "optimistic" / nice-to-have issues and "must do to release" issu

Re: Timeline for next major release [was Re: Looking to 1.0]

2020-01-09 Thread Jacques Nadeau

er > wrote: > > > > > > > I agree on a 0.16.0 release. In the meantime I'll try to help out > with > > > > getting the Java side ready for 1.0. > > > > > > > > On Sat, Jan 4, 2020 at 7:21 PM Fan Liya > wrote: > > > > > > > >

[jira] [Created] (ARROW-7534) Create a new java/contrib module

2020-01-09 Thread Jacques Nadeau (Jira)

Jacques Nadeau created ARROW-7534: - Summary: Create a new java/contrib module Key: ARROW-7534 URL: https://issues.apache.org/jira/browse/ARROW-7534 Project: Apache Arrow Issue Type: Task

[jira] [Created] (ARROW-7533) [Java] Move ArrowBufPointer out of the java the memory package

2020-01-09 Thread Jacques Nadeau (Jira)

Jacques Nadeau created ARROW-7533: - Summary: [Java] Move ArrowBufPointer out of the java the memory package Key: ARROW-7533 URL: https://issues.apache.org/jira/browse/ARROW-7533 Project: Apache Arrow

Re: [DRAFT] Apache Arrow Board Report January 2020

2020-01-09 Thread Jacques Nadeau

Posted with correction. Thanks to Wes, Antoine and Todd! On Wed, Jan 8, 2020 at 10:15 AM Wes McKinney wrote: > Not sure what happened there. The two words after "grow" can be removed > > ## Description: > > The mission of Apache Arrow is the creation and maintenance of software > related > to co

Re: Pending Java pull requests

2020-01-09 Thread Jacques Nadeau

I think there are a decent chunk that are of questionable value. We need to be more willing to simply reject requests rather than leave them in no-man's land. I'll try to do a pass through and help dispatch, etc. On Thu, Jan 9, 2020 at 5:25 AM Krisztián Szűcs wrote: > Hi, > > Roughly 40% of the

Re: Human-readable version of Arrow Schema?

2020-01-04 Thread Jacques Nadeau

I guess we'd still need to introduce a way to nest; it only has type representation. On Sat, Jan 4, 2020 at 2:16 PM Jacques Nadeau wrote: > What do people think about using the C interface representation? > > On Sun, Dec 29, 2019 at 12:42 PM Micah Kornfield > wrote: >

Re: Human-readable version of Arrow Schema?

2020-01-04 Thread Jacques Nadeau

What do people think about using the C interface representation? On Sun, Dec 29, 2019 at 12:42 PM Micah Kornfield wrote: > I opened https://github.com/google/flatbuffers/issues/5688 to try to get > some clarity. > > On Tue, Dec 24, 2019 at 12:13 PM Wes McKinney wrote: > > > On Tue, Dec 24, 2019

Re: Looking to 1.0

2020-01-04 Thread Jacques Nadeau

> Liya Fan > > On Sat, Jan 4, 2020 at 7:16 AM Jacques Nadeau wrote: > > > I identified three things in the java library that I think are top of > mind > > and should be fixed before 1.0 to avoid weird incompatibility changes in > > the java apis (technical debt).

Re: Looking to 1.0

2020-01-03 Thread Jacques Nadeau

I identified three things in the java library that I think are top of mind and should be fixed before 1.0 to avoid weird incompatibility changes in the java apis (technical debt). I've tagged them as pre-1.0 as I don't exactly see what is the right way to tag/label a target release for a ticket. ht

[jira] [Created] (ARROW-7495) [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager

2020-01-03 Thread Jacques Nadeau (Jira)

Jacques Nadeau created ARROW-7495: - Summary: [Java] Remove "empty" concept from ArrowBuf, replace with custom referencemanager Key: ARROW-7495 URL: https://issues.apache.org/jira/browse/

[jira] [Created] (ARROW-7494) [Java] Remove reader index and writer index from ArrowBuf

2020-01-03 Thread Jacques Nadeau (Jira)

Jacques Nadeau created ARROW-7494: - Summary: [Java] Remove reader index and writer index from ArrowBuf Key: ARROW-7494 URL: https://issues.apache.org/jira/browse/ARROW-7494 Project: Apache Arrow

Re: [DISCUSS] C Data Interface, take 2

2019-12-21 Thread Jacques Nadeau

Thanks for addressing my comments. I'm actively reviewing the proposal. It is taking me more time than I would like given the time of the year but I want to make sure that you know that I'm looking at it and hope to provide additional feedback beyond that which I've provided thus far on the PR. Wil

Re: Planned Support for ORC Dataset?

2019-12-13 Thread Jacques Nadeau

ri, Dec 13, 2019 at 11:15 AM Jacques Nadeau wrote: > I question the value of adding the Orc format. The format is fragmented > with the main tool writing it (hive) writing a version of the format (acid > v2) that can't be consumed by systems that only use the Orc libraries > (si

Re: Planned Support for ORC Dataset?

2019-12-13 Thread Jacques Nadeau

I question the value of adding the Orc format. The format is fragmented with the main tool writing it (hive) writing a version of the format (acid v2) that can't be consumed by systems that only use the Orc libraries (since they don't support acid). If you want to consume that data, you have to dep

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-12-13 Thread Jacques Nadeau

ecord batches with different schemas > > in the same stream, though with some added complexity on each side > > > > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau > wrote: > >> > >> I'd vote for explicitly not supported. We should keep our primitives

Re: [VOTE] Adopt Arrow in-process C Data Interface specification

2019-12-06 Thread Jacques Nadeau

-1 (binding) I'm voting -1 on this. I posted the thinking why on the PR. The high-level is that I think it needs to better address the pipelined use case as right now it fails to support that at all and has too much weight to ignore that use case. I actually would have posted it here but totally

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-28 Thread Jacques Nadeau

require multiple calls and coordination > with the deployment topology) in order to accomplish this? > > Best, > David > > On 11/27/19, Jacques Nadeau wrote: > > Fair enough. I'm okay with the bytes approach and the proposal looks good > > to me. > >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-11-27 Thread Jacques Nadeau

> > Knowledge of protobuf shouldn't be required to use Flight. > >>> > > > > >>> > > > Regards > >>> > > > > >>> > > > Antoine. > >>> > > > > >>> > > > > >>

[jira] [Created] (ARROW-7198) [Java] Allow a user to provide an alternative "chunk" allocator

2019-11-17 Thread Jacques Nadeau (Jira)

Jacques Nadeau created ARROW-7198: - Summary: [Java] Allow a user to provide an alternative "chunk" allocator Key: ARROW-7198 URL: https://issues.apache.org/jira/browse/ARROW-7198 Proje

Re: [DISCUSS][Java] Builders for java classes

2019-10-27 Thread Jacques Nadeau

+1 on the idea of enhancing builder interfaces. >>IntVectorBuilder addAll(int[] values); Let's make sure that anything like the above is efficient. People will judge the quality of the project on the efficiency of the methods we provide. If everybody starts using int[] to build Arrow vectors, we

Re: [Rust] DataFusion benchmarks

2019-10-20 Thread Jacques Nadeau

Super cool. Thanks for sharing! On Sun, Oct 20, 2019 at 10:52 AM Andy Grove wrote: > Now that the DataFusion query execution code has been re-written to use a > physical query plan with support for multi-threaded execution, I have > started running some benchmarks again. Here are the results so

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-20 Thread Jacques Nadeau

ant metadata field, but oneof prevents that from happening, and > overall having a clear separation between data and control messages is > cleaner. > > As for using Protobuf's Any: so far, we've refrained from exposing > Protobuf by using bytes, would we want to change that

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-16 Thread Jacques Nadeau

round > for quite a while). > > Thanks, > David > > On 10/15/19, Jacques Nadeau wrote: > > I like it. Added some comments to the doc. Might be worth discussion here > > depending on your thoughts. > > > > On Tue, Oct 15, 2019 at 7:11 AM David Li wrote: > >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-15 Thread Jacques Nadeau

I like it. Added some comments to the doc. Might be worth discussion here depending on your thoughts. On Tue, Oct 15, 2019 at 7:11 AM David Li wrote: > Hey Ryan, > > Thanks for the comments. > > Concrete example: I've edited the doc to provide a Python strawman. > > Sync vs async: while I don't tou

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau

wrote: > > It's good with me. > > Regards > > Antoine. > > > Le 10/10/2019 à 22:51, Jacques Nadeau a écrit : > > Antoine, is my synopsis fair? > > > > On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney > wrote: > > > >>

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau

Antoine, is my synopsis fair? On Thu, Oct 10, 2019 at 12:53 PM Wes McKinney wrote: > +1 > > On Thu, Oct 10, 2019, 2:12 PM Jacques Nadeau wrote: > > > Proposed report update below. LMK your thoughts. > > > > ## Description: > > The mission of Apache Arro

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau

Proposed report update below. LMK your thoughts. ## Description: The mission of Apache Arrow is the creation and maintenance of software related to columnar in-memory processing and data interchange ## Issues: * We are struggling with Continuous Integration scalability as the project has defin

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau

Arg... accidental send before ready. What do you think about the statement below for community health? Does it fairly capture the concerns/perspective? On Thu, Oct 10, 2019 at 10:24 AM Jacques Nadeau wrote: > Many contributors are struggling with the slowness of pre-commit CI. Arrow > has a

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-10 Thread Jacques Nadeau

rently CI capacity has been a "hot topic as of late": > > > https://lists.apache.org/thread.html/af52e2a3e865c01596d46374e8b294f2740587dbd59d85e132429b6c@%3Cbuilds.apache.org%3E > > > > (I didn't know this list -- bui...@apache.org -- existed, by the way) > > > > Regards > > >

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-09 Thread Jacques Nadeau

sues. All that is needed is for INFRA to let us > use third party GitHub Apps and monitor any potentially destructive > actions that they may take, such as modifying unrelated repository > webhooks related to IP provenance. > > - Wes > > On Wed, Oct 9, 2019 at 9:33 PM Jacques Nadeau

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-09 Thread Jacques Nadeau

I think we need to be more direct in listing issues for the board. What have we done? What do we want them to do? In general, any large org is going to be slow to add new deep integrations into GitHub. I don't think we should expect Apache to be any different (it took several years before we could m

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Jacques Nadeau

party integrators? > > > - Flatbuffers aren't entirely straight-forward and I think if we do > move > > > forward with an API based on Column/Array we should consider > alternatives > > > as long as the necessary parsing code can be done in a small amount of > > code > > > (I'm personally a

Re: [Proposal]: Expose Flight gRPC for Dremio use case (Java)

2019-10-05 Thread Jacques Nadeau

> > Is it possible for a single gRPC server to expose multiple services > through the same port (it sounds like it is)? It would be a good idea > to do similar refactoring in C++ so that Flight RPC endpoints can be > provided alongside some other non-Flight endpoints in the same gRPC > server > It

Re: [DISCUSS] C-level in-process array protocol

2019-10-02 Thread Jacques Nadeau

oPB project which is an > implementation of Protocol Buffers with small code size > > https://github.com/nanopb/nanopb > > Let me know if this makes more sense. > > I think it's important to communicate clearly about this primarily for > the benefit of the outside w

Re: [DISCUSS] C-level in-process array protocol

2019-10-01 Thread Jacques Nadeau

I disagree with this statement: - the IPC format is meant for serialization while the C data protocol is meant for in-memory communication, so different concerns apply If that is how a particular implementation presents it, that is a weakness of the implementation, not the format. The prim

Re: [DISCUSS][Java] Reduce the range of synchronized block when releasing an ArrowBuf

2019-09-29 Thread Jacques Nadeau

For others that don't realize, the discussion of this is happening on the pull request here: https://github.com/apache/arrow/pull/5526 On Fri, Sep 27, 2019 at 4:52 AM Fan Liya wrote: > Dear all, > > When releasing an ArrowBuf, we will run the following piece of code: > > private int decrement(i

Re: [DISCUSS] C-level in-process array protocol

2019-09-29 Thread Jacques Nadeau

On Sun, Sep 29, 2019 at 12:59 AM Antoine Pitrou wrote: > > Le 29/09/2019 à 06:10, Jacques Nadeau a écrit : > > * No dependency on Flatbuffers. > > * No buffer reassembly (data is already exposed in logical Arrow format). > > * Zero-copy by design. > > * Ea

Re: [DISCUSS] C-level in-process array protocol

2019-09-28 Thread Jacques Nadeau

[FieldNode]; buffers: [Buffer]; } On Sat, Sep 28, 2019 at 9:02 PM Jacques Nadeau wrote: > I'm not clear on why we need to introduce something beyond what > flatbuffers already provides. Can someone explain that to me? I'm not > really a fan of introducing a second represen


1 - 100 of 408 matches
