[ANNOUNCE] Apache Arrow 5.0.0 released

2021-07-28 Thread Krisztián Szűcs
The Apache Arrow community is pleased to announce the 5.0.0 release. The release includes 555 resolved issues ([1]) since the 4.0.0 release. The release is available now from our website, [2] and [3]: https://arrow.apache.org/install/ Release notes are available at: https://arrow.apache.o

Re: [VOTE][RESULT] Release Apache Arrow 5.0.0 - RC1

2021-07-28 Thread Krisztián Szűcs
Current status of the post-release tasks:
1. [x] bump versions on main branch
   [x] push development tag
2. [x] upload source
3. [x] upload binaries
4. [x] update website
5. [x] upload ruby gems
6. [x] upload js packages
8. [x] upload C# packages
10. [in-progress] update conda recipes

Re: [DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-28 Thread Eduardo Ponce
My mistake: I took the kernels' input type to be Datum, when it is in fact Scalar and ArrayData. I agree that SIMD details should not be exposed in the kernel API. ~Eduardo
On Wed, Jul 28, 2021 at 6:38 PM Wes McKinney wrote:
> On Wed, Jul 28, 2021 at 5:23 PM Eduardo Ponce wrote:

Re: [DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-28 Thread Wes McKinney
On Wed, Jul 28, 2021 at 5:23 PM Eduardo Ponce wrote:
> Hi all,
>
> I agree with supporting finer-grained parallelism in the compute operators. I think that incorporating a Datum-like span, would allow expressing parallelism not only on a per-thread basis but can also be used to represent S

Re: [DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-28 Thread Eduardo Ponce
Hi all, I agree with supporting finer-grained parallelism in the compute operators. I think that incorporating a Datum-like span would allow expressing parallelism not only on a per-thread basis, but could also be used to represent SIMD spans, where span length is directed by vector ISA, "L2" cache

Re: [DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-28 Thread Wes McKinney
On Wed, Jul 28, 2021 at 5:39 AM Antoine Pitrou wrote:
> On 28/07/2021 at 03:33, Wes McKinney wrote:
> > I don't have the solution worked out for this, but the basic gist is:
> >
> > * To be 10-100x more efficient ExecBatch slicing cannot call ArrayData::Slice for every field like it

Re: [VOTE][RESULT] Release Apache Arrow 5.0.0 - RC1

2021-07-28 Thread Sutou Kouhei
Current status of the post-release tasks:
1. [x] bump versions on main branch
   [x] push development tag
2. [x] upload source
3. [in-progress] upload binaries
4. [x] update website
5. [x] upload ruby gems
6. [x] upload js packages
8. [x] upload C# packages
10. [ ] update conda recipes
11.

[Rust][DataFusion][Discuss] GroupByHash

2021-07-28 Thread Andrew Lamb
We have been working on a proposal [1] to improve the grouping operation in DataFusion (driven by the need to correctly support grouping by nulls). I just wanted to point it out on this list in case anyone would like to comment. Thank you, Andrew [1] https://github.com/apache/arrow-datafusion/issu

Re: [VOTE][RESULT] Release Apache Arrow 5.0.0 - RC1

2021-07-28 Thread Krisztián Szűcs
Current status of the post-release tasks:
1. [x] bump versions on main branch
   [x] push development tag
2. [x] upload source
3. [in-progress] upload binaries
4. [x] update website
5. [x] upload ruby gems
6. [x] upload js packages
8. [x] upload C# packages
10. [ ] update conda recipes
11.

Re: Apache Arrow Cookbook

2021-07-28 Thread Wes McKinney
Hi Alessandro, I just merged the PR, thank you! I would still like us to move to using ipython_directive for authoring the Python examples, so that authors do not have to copy-paste console output into the recipes, but that doesn't have to be addressed right now. Thanks, Wes On Wed, Jul 28, 2021

Re: Compute Functions: Modulo

2021-07-28 Thread Ian Cook
Hi Rares, We have an open Jira issue for this at https://issues.apache.org/jira/browse/ARROW-12755, and some other related issues linked from it. Please comment there if you have suggestions for the implementation. Thank you, Ian On Wed, Jul 28, 2021 at 7:15 AM Antoine Pitrou wrote: > > > Hell

Re: Apache Arrow Cookbook

2021-07-28 Thread Alessandro Molina
Hi everybody, The Cookbook PR has been open for more than a week at this point and we have received tons of great feedback and suggestions, many of which we incorporated already. For the benefit of being able to verify the publishing workflow and the CI I'd love to ask if there is anyone who could

[VOTE][RESULT] Release Apache Arrow 5.0.0 - RC1

2021-07-28 Thread Krisztián Szűcs
The VOTE carries with 4 binding +1 and 2 non-binding +1 votes. Thanks everyone! I'm starting the post-release tasks and will keep you posted about the current status.
On Wed, Jul 28, 2021 at 2:28 PM Krisztián Szűcs wrote:
> +1 (binding)
>
> Verified on Intel macOS Big Sur.
> The verification

Re: [VOTE] Release Apache Arrow 5.0.0 - RC1

2021-07-28 Thread Krisztián Szűcs
+1 (binding) Verified on Intel macOS Big Sur. The verification tasks [1] have passed, except the integration tests, which fail due to the Go issue. [1]: https://github.com/apache/arrow/pull/10816
On Tue, Jul 27, 2021 at 6:53 PM Krisztián Szűcs wrote:
> During the verification of M1 wheels we discovered an

Re: Compute Functions: Modulo

2021-07-28 Thread Antoine Pitrou
Hello Rares, I agree with defining a new modulo compute function (or "remainder"?). However, there also needs to be a checked version that returns an error for invalid input (e.g. division by zero). Regards, Antoine.
On 28/07/2021 at 13:04, Rares Vernica wrote: Hello, I'm making use of
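The difference between a checked and an unchecked variant can be sketched in plain Python. Nothing below is Arrow API: `modulo_checked` and `modulo_unchecked` are hypothetical names, and having the unchecked variant produce nulls (`None`) on invalid input is only an assumption made for illustration. The sketch is about error handling only; Python's `%` follows the sign of the divisor, which may differ from what a real kernel would do.

```python
from typing import List, Optional


def modulo_checked(values: List[int], b: int) -> List[int]:
    """Hypothetical checked variant: raise on invalid input (division by zero)."""
    if b == 0:
        raise ZeroDivisionError("modulo by zero")
    return [a % b for a in values]


def modulo_unchecked(values: List[int], b: int) -> List[Optional[int]]:
    """Hypothetical unchecked variant: assumed here to yield nulls (None) instead."""
    if b == 0:
        return [None] * len(values)
    return [a % b for a in values]


print(modulo_checked([7, 8, 9], 4))    # [3, 0, 1]
print(modulo_unchecked([7, 8, 9], 0))  # [None, None, None]
```

Calling `modulo_checked([1], 0)` raises `ZeroDivisionError`, which is the behaviour the message asks a checked version to have.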

Re: Compute Functions: Modulo

2021-07-28 Thread Rares Vernica
PS a is an integer array and b is an integer scalar. On Wed, Jul 28, 2021 at 1:04 PM Rares Vernica wrote: > Hello, > > I'm making use of the Compute Functions to do some basic arithmetic. One > operation I need to perform is the modulo, i.e., a % b. I'm debating > between two options: > > 1. Com

Compute Functions: Modulo

2021-07-28 Thread Rares Vernica
Hello, I'm making use of the Compute Functions to do some basic arithmetic. One operation I need to perform is the modulo, i.e., a % b. I'm debating between two options: 1. Compute it from the available Compute Functions via a % b = a - a / b * b, where / is the integer division. I assume that
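The identity in option 1 can be checked in plain Python, without Arrow. This is only a sketch: `trunc_div` and `mod_via_identity` are hypothetical helper names, and division that truncates toward zero is an assumption about what "integer division" means here (Python's own `//` floors instead, so it is not used directly).

```python
from typing import List


def trunc_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (C-style); assumed semantics."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def mod_via_identity(values: List[int], b: int) -> List[int]:
    """Element-wise a % b computed as a - a / b * b (integer array, integer scalar)."""
    return [a - trunc_div(a, b) * b for a in values]


a = [7, -7, 10, 0]
print(mod_via_identity(a, 3))  # [1, -1, 1, 0]
```

With truncating division the result takes the sign of the dividend (`-7` gives `-1`), which is worth keeping in mind when comparing against a language whose `%` follows the divisor.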

Re: [DISCUSS][C++] Enabling finer-grained parallelism in compute operators, quantifying ExecBatch overhead

2021-07-28 Thread Antoine Pitrou
On 28/07/2021 at 03:33, Wes McKinney wrote:
> I don't have the solution worked out for this, but the basic gist is:
>
> * To be 10-100x more efficient ExecBatch slicing cannot call ArrayData::Slice for every field like it does now
> * Atomics associated with interacting with shared_ptr / shared_ptr
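The cost argument in the quoted message can be illustrated with a toy model. This is a Python sketch, not Arrow's C++ code: `Batch`, `slice_per_field`, and `BatchSpan` are hypothetical names, and the span is only one assumed way to avoid per-field work. It contrasts slicing that builds a new object for every field with a view that records an offset and a length once.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Batch:
    """Toy stand-in for a batch: one list of values per field."""
    columns: List[List[int]]
    length: int


def slice_per_field(batch: Batch, offset: int, length: int) -> Batch:
    """Per-field slicing: does work (and allocates) for every column."""
    return Batch([col[offset:offset + length] for col in batch.columns], length)


@dataclass(frozen=True)
class BatchSpan:
    """Hypothetical lightweight view: shares the columns, stores only bounds."""
    batch: Batch
    offset: int
    length: int

    def slice(self, offset: int, length: int) -> "BatchSpan":
        # Constant work, independent of the number of fields.
        if offset < 0 or length < 0 or offset + length > self.length:
            raise IndexError("slice out of bounds")
        return BatchSpan(self.batch, self.offset + offset, length)

    def value(self, field: int, i: int) -> int:
        # Resolve the span-relative index against the shared column.
        return self.batch.columns[field][self.offset + i]


batch = Batch([[1, 2, 3, 4], [10, 20, 30, 40]], 4)
span = BatchSpan(batch, 0, batch.length).slice(1, 2)
print(span.value(1, 0), span.value(1, 1))  # 20 30
```

In this model `slice_per_field` scales with the number of fields, while `BatchSpan.slice` creates a single small object however wide the batch is.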