Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Laurent Quérel
Hi Julian, My intermediate representation is indeed an API and does not define a specific physical format (which could be different from one language to another, or even not exist at all in some cases). That being said, I didn't understand your feedback and I'm sure there's something to dig into h

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Laurent Quérel
Hi Gavin, I was not aware of this initiative but indeed, these two proposals have much in common. The implementation I am working on is available here https://github.com/lquerel/otel-arrow-adapter (directory pkg/air). I would be happy to get your feedback and identify with you the possible gaps to

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Laurent Quérel
Hi Sasha, Thank you very much for this informative comment. It's interesting to see another use of a row-based API in the context of a query engine. I think that there is some thought to be given to whether or not it is possible to converge these two use cases into a single public row-based API. A

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Julian Hyde
If the 'row-oriented format' is an API rather than a physical data representation then it can be implemented via coroutines and could therefore have less scattered patterns of read/write access. By 'coroutines' I'm being rather imprecise, but I hope you get the general idea. An asynchronous API (w

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Gavin Ray
This is essentially the same idea as the proposal here I think -- row/map-based representation & conversion functions for ease of use: [RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry, increase adoption/audience and productivity. · Issue #12618 · apache/arrow (github.com)
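
For readers skimming the thread, here is a minimal sketch of what a "row/map-based representation & conversion function" could look like. It is an illustration only, not code from either proposal: the names `Row`, `Columns`, and `ToColumns` are hypothetical, and it uses plain Go slices rather than the Arrow Go library, so no Arrow builders or schemas are involved.

```go
package main

import "fmt"

// Row is a hypothetical row/map-based record: field name -> value.
type Row map[string]any

// Columns is a hypothetical columnar result: field name -> one slot per row.
// A nil slot marks a field that was absent from that row.
type Columns map[string][]any

// ToColumns pivots row-oriented records into column-oriented slices.
// Every column has len(rows) slots; missing fields stay nil.
func ToColumns(rows []Row) Columns {
	cols := Columns{}
	for i, row := range rows {
		for name, value := range row {
			if _, ok := cols[name]; !ok {
				// First time this field is seen: allocate a full-length column.
				cols[name] = make([]any, len(rows))
			}
			cols[name][i] = value
		}
	}
	return cols
}

func main() {
	rows := []Row{
		{"id": 1, "name": "a"},
		{"id": 2}, // "name" is missing in this row
	}
	cols := ToColumns(rows)
	fmt.Println(cols["id"], cols["name"])
}
```

A real implementation would also have to infer or accept a schema and pick an Arrow type for each column, which is where the mapping choices discussed in this thread come in.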

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Sasha Krassovsky
Hi everyone, I just wanted to chime in that we already do have a form of row-oriented storage inside of `arrow/compute/row/row_internal.h`. It is used to store rows inside of GroupBy and Join within Acero. We also have utilities for converting to/from columnar storage (and AVX2 implementations

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Laurent Quérel
Thank you Micah for a very clear summary of the intent behind this proposal. Indeed, I think that clarifying from the beginning that this approach aims at facilitating experimentation more than efficiency in terms of performance of the transformation phase would have helped to better understand my

[RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Micah Kornfield
Hi Laurent, I'm retitling this thread to include the specific languages you seem to be targeting in the subject line to hopefully get more eyes from maintainers in those languages. Thanks for clarifying the goals. If I can restate my understanding, the intended use-case here is to provide easy (f

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Laurent Quérel
Far be it from me to think that I know more than Jorge or Wes on this subject. Sorry if my post gives that perception, that is clearly not my intention. I'm just trying to defend the idea that when designing this kind of transformation, it might be interesting to have a library to test several mapp

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Benjamin Blodgett
He was trying to nicely say he knows way more than you, and your ideas will result in a low-performance scheme no one will use in production AI/machine learning.

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Benjamin Blodgett
I think Jorge’s opinion is that of an expert and him being humble is just being tactful. Probably listen to Jorge on performance and architecture, even over Wes, as he’s contributed more than anyone else and knows the bleeding edge of low-level performance stuff more than anyone.

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Laurent Quérel
Hi Jorge, I don't think that the level of in-depth knowledge needed is the same between using a row-oriented internal representation and "Arrow", which not only changes the organization of the data but also introduces a set of additional mapping choices and concepts. For example, assuming that the

Re: [VOTE] Release Apache Arrow 9.0.0 - RC1

2022-07-28 Thread Krisztián Szűcs
The automated verification tasks pass [1], though I'd consider including ARROW-17193: [C++] Add support for finding system Abseil [2] in the 9.0 release. [1]: https://github.com/apache/arrow/pull/13729 [2]: https://github.com/apache/arrow/pull/13731

[VOTE] Release Apache Arrow 9.0.0 - RC1

2022-07-28 Thread Krisztián Szűcs
Hi, I would like to propose the following release candidate (RC1) of Apache Arrow version 9.0.0. This is a release consisting of 501 resolved JIRA issues[1]. This release candidate is based on commit: 6b59b2f498cd03e50c88d400a83cfc360fb3d1f1 [2] The source release rc1 is hosted at [3]. The binar

Re: [RESULT][VOTE] Release Apache Arrow 8.0.1 - RC0

2022-07-28 Thread Raul Cumplido Dominguez
Hi, On the 8.0.1 release topic, there was an automated PR on conda-forge for the arrow-cpp-feedstock around the 8.0.1 release [1]. Not entirely sure if we have to do something about it. Regards, Raúl [1] https://github.com/conda-forge/arrow-cpp-feedstock/pull/805

Re: [RESULT][VOTE] Release Apache Arrow 8.0.1 - RC0

2022-07-28 Thread Matthew Topol
It would probably be a good idea to just make sure that any release notes mention that for a Go user to upgrade their dependency they need to run something like `go get -u github.com/apache/arrow/go/v6/@v6.0.2` replacing v6/v6.0.2 with their desired version combination. This will get them the p

Re: Arrow Flight usage with graph databases

2022-07-28 Thread Lee, David
I believe the GraphQL spec supports both pagination and cursors for interacting with web apps, which could be used to construct record batches.