Re: [VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC1

2022-05-12 Thread Andy Grove
Please disregard this. The verification script has not been updated to reflect changes in the project structure, so I will create RC2 shortly. On Thu, May 12, 2022 at 6:38 PM Andy Grove wrote: > Hi, > > I would like to propose a release of Apache Arrow DataFusion > Implementation, > version 8.0.0

[VOTE][RUST][DataFusion] Release Apache Arrow DataFusion 8.0.0 RC1

2022-05-12 Thread Andy Grove
Hi, I would like to propose a release of Apache Arrow DataFusion Implementation, version 8.0.0. This release candidate is based on commit: 5d52b32a7d8a2a58c7de1a35a20e1c3e08b55ca3 [1] The proposed release tarball and signatures are hosted at [2]. The changelog is located at [3]. Please download,

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-12 Thread Andrew Lamb
Also, it seems as if DuckDB [1] is heading in the same direction, adding a DataFrame API to its database engine. [1] https://github.com/duckdb/duckdb/issues/2000 On Thu, May 12, 2022 at 3:36 PM Andrew Lamb wrote: > For what it is worth, DataFusion has a DataFrame interface[1], that uses > the

Re: Arrow C-Data and DuckDB

2022-05-12 Thread David Li
Thanks all for the comments. I see Tom also put up a PR to add this to DuckDB [1]. Do we need a vote for this? If so, unless there are further comments, I think we can start one. [1]: https://github.com/duckdb/duckdb/pull/3628 On Tue, May 10, 2022, at 13:31, David Li wrote: > For discussion I've

Re: [C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-05-12 Thread Andrew Lamb
Thank you for sharing this document. Raphael Taylor-Davies is working on a similar exercise scheduling execution for DataFusion plans. The design doc[1] and initial PR [2] may be an interesting reference. In the DataFusion case we were trying to improve performance in a few ways: 1. Within a pip

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-12 Thread Andrew Lamb
For what it is worth, DataFusion has a DataFrame interface [1] that uses the same underlying `LogicalPlan` structures as the SQL interface. Unsurprisingly, it is heavily inspired by pandas. I believe this interface is more familiar to, and more popular with, DataFusion users who programmatically build

Re: [C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-05-12 Thread Li Jin
Thanks Wes and Michal. We have similar concerns about the current eager-push control flow with time series / ordered data processing and are glad that we are not the only ones thinking about this. I have read the doc and so far just left some questions to make sure I understand the proposal (admitte

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-12 Thread Wes McKinney
> Discussion about whether the community around Arrow would like to have > DataFrame-like APIs for Arrow in more languages, for example C++ We've discussed this a bit on the mailing list in the past, see https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading

Re: [RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-12 Thread Neal Richardson
> > 15. [todo:nealrichardson] update R packages > Done. > > 16. [todo:ianmcook] update vcpkg port > > 17. [done] bump versions > > 18. [done] update tags for Go modules > > 19. [done] update docs > > 20. [done] announce release > > 21. [done] remove old release candidates > > > > On Sat, May 7,

Re: Question: What should the offsets buffer be for an empty (list, binary, string) array?

2022-05-12 Thread Andrew Lamb
Another key piece of information [1] provided by Jorge on the ticket is that there is an (older) IPC test case file that has empty offset buffers for an array of zero length, which is why this issue came up. I think we have closed this issue now to our satisfaction. Thank you all for the commen