Re: [RESULT][VOTE] Release Apache Arrow 17.0.0 - RC2

2024-07-17 Thread Raúl Cumplido
Hi, This is the status of the post-release tasks. Please see below the ones I need help with: [Done] Update the released milestone Date and set to “Closed” on GitHub [Done] Merge changes on release branch to maintenance branch for patch releases [Done] Add the new release to the Apache Reporter S

Re: [DISCUSS] Donation of a User-Defined Function Framework for Apache Arrow

2024-07-17 Thread Andrew Lamb
An update here is that one of the DataFusion contributors, @xinlifoobar, built a very neat prototype of using arrow-udf in DataFusion [1] and wrote up their findings [2]. The major findings are that it would be possible, though it would take some additional work (e.g. single values, making the function

Re: Understanding possible synergies between arrow & zarr communities?

2024-07-17 Thread Andrew Lamb
> Has there been any discussion about rewriting parts of zarr in Rust (for example, the IO management stack would be a prime candidate for this type of treatment)? One project that might be interesting from the DataFusion community is [1], which is a native Rust implementation of reading/writin

Re: [RESULT][VOTE] Release Apache Arrow 17.0.0 - RC2

2024-07-17 Thread Raúl Cumplido
The missing wheels and source distribution for pyarrow 17.0.0 have now been uploaded to PyPI. Kind regards, Raúl On Wed, 17 Jul 2024 at 11:32, Raúl Cumplido () wrote: > Hi, > This is the status of the post-release tasks. Please see below the ones I need help with: > [Done] Update th

Arrow community meeting July 17 at 16:00 UTC

2024-07-17 Thread Ian Cook
Our next biweekly Arrow community meeting is today at 16:00 UTC / 12:00 EDT. Zoom meeting URL: https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09 Meeting ID: 876 4903 3008 Passcode: 958092 Meeting notes will be captured in this Google Doc: https://docs.google.com/document/d/1xrji8

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-17 Thread Dewey Dunnington
Thank you for this! I have definitely run across the one-byte-per-item bool in NumPy, DuckDB, and cudf. I haven't heard any discussion about DuckDB here, but I am fairly sure that they represent their boolean type as an int8 as well [1]. > Before the vote, I would like to see verification that this

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-17 Thread Ian Cook
>> Before the vote, I would like to see verification that this truly enables zero-copy to/from NumPy bool arrays in Python. > I think this is an implementation issue more than a specification issue... I am not personally worried about any provisions on the specification that might make this impo

Re: [DISCUSS] Deprecate UCX transport for Arrow Flight in favor of Dissociated IPC Protocol

2024-07-17 Thread Raúl Cumplido
Hi, I've followed up with a PR to remove the UCX transport for Flight [1]. Thanks, Raúl [1] https://github.com/apache/arrow/pull/43297 On Wed, 19 Jun 2024 at 11:29, Raúl Cumplido () wrote: > Hi, > I would like to discuss deprecation of the UCX transport for Arrow Flight (ARROW_WITH_UCX)

Re: [DISCUSS] Deprecate UCX transport for Arrow Flight in favor of Dissociated IPC Protocol

2024-07-17 Thread Adam Lippai
Hi Raul, Finishing an experiment is good; it can help with further exploration in the future (if the community doesn't see it as baggage to carry forever). Do you have any conclusions, or a summary of what was learned? I might be wrong, but my understanding was that the initial goal was replacing the TCP+TLS+

[VOTE][RUST] Release Apache Arrow Rust Object Store 0.10.2 RC1

2024-07-17 Thread Andrew Lamb
Hi, I would like to propose a release of Apache Arrow Rust Object Store Implementation, version 0.10.2. This release candidate is based on commit: b44497e1cdd84933b49b56dd00506411c040b46c [1] The proposed release tarball and signatures are hosted at [2]. The changelog is located at [3]. Please

Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.10.2 RC1

2024-07-17 Thread Raphael Taylor-Davies
+1 (binding) Verified on x86_64 GNU/Linux Kind Regards, Raphael On 17/07/2024 18:36, Andrew Lamb wrote: Hi, I would like to propose a release of Apache Arrow Rust Object Store Implementation, version 0.10.2. This release candidate is based on commit: b44497e1cdd84933b49b56dd00506411c040b46c

Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.10.2 RC1

2024-07-17 Thread L. C. Hsieh
+1 (binding) Verified on M1 Mac. Thanks Andrew. On Wed, Jul 17, 2024 at 10:37 AM Andrew Lamb wrote: > Hi, > I would like to propose a release of Apache Arrow Rust Object Store Implementation, version 0.10.2. > This release candidate is based on commit: b44497e1cdd84933b49b56dd0050641

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-17 Thread Joel Lubinitsky
Thank you for your comments. I spent some time trying to confirm definitively that this proposal would enable zero-copy sharing both ways between pyarrow and NumPy. I put together the following gist [1] with my experiment. To summarize the results: - I was able to share the underlying value buffe
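As a minimal NumPy-only sketch of the property Joel's experiment checks (this is not the gist [1], and the pyarrow half is omitted): a NumPy bool array already stores one byte per element, so its buffer can be reinterpreted as 8-bit integers without copying. The variable names are illustrative only.

```python
import numpy as np

# NumPy stores each bool in one byte (itemsize == 1), matching a one-byte-per-value layout.
flags = np.array([True, False, True, True], dtype=np.bool_)

# view() reinterprets the same memory as int8; no data is copied.
as_int8 = flags.view(np.int8)

# Both arrays share one buffer, so a write through one is visible through the other.
as_int8[1] = 1
print(flags.itemsize, np.shares_memory(flags, as_int8), flags.tolist())
```

Only 0 and 1 are written here, because NumPy expects its bool bytes to hold exactly those values.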

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-17 Thread Matt Topol
Just chiming in that, according to the libcudf documentation [1], this proposal should work just fine: its Bool8 type is described as "0 == false, else true". --Matt [1]: https://docs.rapids.ai/api/libcudf/stable/group__utility__types#gadf077607da617d1dadcc5417e2783539 On Wed, Jul 17, 2024, 3:18 PM Joel
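The libcudf rule Matt quotes can be illustrated with a small pure-Python sketch. It assumes a bool8 value buffer is simply a sequence of bytes and applies "0 is false, anything else is true"; the function name `decode_bool8` is hypothetical and is not an Arrow or libcudf API.

```python
def decode_bool8(buf: bytes) -> list[bool]:
    """Interpret each byte as one boolean: 0 -> False, any other value -> True."""
    return [byte != 0 for byte in buf]


# 1 is the usual "true" byte, but 2 and 255 are also read as true under this rule.
values = decode_bool8(bytes([0, 1, 2, 255, 0]))
print(values)  # [False, True, True, True, False]
```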

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-17 Thread Ian Cook
Thanks Joel and Matt. This looks good to me. I think it's worth saying here that Arrow-producing components should still emit Booleans in the standard bit-packed Arrow layout by default. This proposed bool8 canonical extension type is intended to be used in applications where the producer knows th
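To make the contrast between the two layouts concrete, here is a pure-Python sketch of converting between a bool8-style byte buffer and the standard bit-packed Arrow Boolean layout, where value i lives in bit (i % 8) of byte (i // 8), least-significant bit first. The helper names are hypothetical, validity bitmaps and the buffer padding/alignment recommended by the Arrow format are ignored, and real Arrow implementations do this in native code.

```python
def bool8_to_bitmap(buf: bytes) -> bytes:
    """Pack one-byte-per-value booleans into Arrow's bit-packed layout.

    Value i goes to bit (i % 8) of byte (i // 8), least-significant bit first.
    Nonzero bytes are treated as true. Padding/alignment is not handled.
    """
    out = bytearray((len(buf) + 7) // 8)
    for i, byte in enumerate(buf):
        if byte != 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def bitmap_to_bool8(bitmap: bytes, length: int) -> bytes:
    """Expand a bit-packed boolean buffer back to one byte (0 or 1) per value."""
    return bytes((bitmap[i // 8] >> (i % 8)) & 1 for i in range(length))


packed = bool8_to_bitmap(bytes([1, 0, 1, 1, 0, 0, 0, 0, 1]))
print(packed.hex())  # 0d01
print(list(bitmap_to_bool8(packed, 9)))  # [1, 0, 1, 1, 0, 0, 0, 0, 1]
```

The round trip shows why bit-packing is the compact default (nine values fit in two bytes instead of nine), while a one-byte-per-value layout trades that space for values that byte-oriented systems can share without a packing step.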

Re: [DISCUSS] Deprecate UCX transport for Arrow Flight in favor of Dissociated IPC Protocol

2024-07-17 Thread David Li
Replacing gRPC was not the intent. The dissociated protocol is worded very generically but works over UCX and libfabric, so it is essentially equivalent; however, it does not force you to use the predefined Flight RPC method names, so it is more flexible in that regard. On Thu, Jul 18, 2024, at 02:20,

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-17 Thread Alenka Frim
Thank you, Joel, for working on this! I have also come across the need for byte-packed boolean support when implementing the Python dataframe interchange protocol and also DLPack, which is implemented in Arrow C++. The extension type is a great solution. I will comment on the PR if I have any questi