Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-06 Thread Vibhatha Abeykoon
@Li Thank you for the proposal; this discussion is very interesting. @Yaron, thanks also for the note on distributed execution and on how hard it is to debug when threads and processes from two systems try to perform a task seamlessly. There are very important points that are raised in this

Re: [RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-06 Thread Neal Richardson
I will handle the R submission to CRAN. Neal On Fri, May 6, 2022 at 6:14 PM Sutou Kouhei wrote: > > 9. [todo:kou?] upload RubyGems > > I'll do it once Homebrew and MSYS2 packages are updated. > > In > "Re: [RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3" on Fri, 6 May 2022 > 23:37:58 +0200,

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-06 Thread Weston Pace
@Yaron > 1. How to allocate compute threads between Arrow and locally executing UDFs, > avoiding unnecessary competition over local compute resources? In general, there is no reason for UDFs to allocate threads. The project & filter nodes are trivially parallelizable. As long as there is su

Re: [RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-06 Thread Sutou Kouhei
> 9. [todo:kou?] upload RubyGems I'll do it once Homebrew and MSYS2 packages are updated. In "Re: [RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3" on Fri, 6 May 2022 23:37:58 +0200, Krisztián Szűcs wrote: > Current status of the post-release tasks: > > 1. [done] make the released version

Re: [RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-06 Thread Ian Cook
I will update the vcpkg port Ian On Fri, May 6, 2022 at 17:38 Krisztián Szűcs wrote: > Current status of the post-release tasks: > > 1. [done] make the released version as "RELEASED" on JIRA > 2. [done] start the new version on JIRA > 3. [done] merge changes on release branch to maintenance bra

Re: [RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-06 Thread Krisztián Szűcs
Current status of the post-release tasks: 1. [done] make the released version as "RELEASED" on JIRA 2. [done] start the new version on JIRA 3. [done] merge changes on release branch to maintenance branch for patch releases 4. [done] upload source 5. [done] upload binaries 6. [done] update website

[RESULT][VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-06 Thread Krisztián Szűcs
Hi, The vote carries with 4 +1 binding votes, 4 +1 non-binding votes and no -1 votes. I'm starting to work on the post-release tasks and keep this thread updated about the current status. Thanks everyone! - Krisztian On Fri, May 6, 2022 at 8:33 PM Krisztián Szűcs wrote: > > +1 (binding) > > Ve

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-06 Thread Li Jin
To add: I am open to other potential solutions to the three steps above, but I just want to get on the same page about the minimal issues we need to solve in order to "define UDF with ibis and execute with Arrow", and to avoid getting into overly detailed discussions. On Fri, May 6, 2022 at 2:5

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-06 Thread Li Jin
Weston - Thanks for the thoughtful reply. It was quite useful. RE: "At the moment, I'm not aware of much that is needed in the execution engine beyond what we have. If you (or anyone) has things that we can do in the execution engine then it would be good to identify them early. This is the most impo

Re: [VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-06 Thread Krisztián Szűcs
+1 (binding) Verified on macOS 12 arm64. The crossbow verification tasks were also successful [1]. [1]: https://github.com/apache/arrow/pull/13057 On Thu, May 5, 2022 at 4:02 PM Dewey Dunnington wrote: > > +1 (non-binding) > > I ran: > TEST_DEFAULT=0 TEST_CPP=1 dev/release/verify-release-candid

Re: [Array][C++]Whether batch with constant-type array will be supported in Arrow?

2022-05-06 Thread Weston Pace
Hi Song, Wes proposed a couple of different array types a few months ago in [1]. These were documented in [2]. In this proposal a constant array type was suggested in addition to a run-length encoded array type. During the discussion it was suggested that a constant array might just be a special

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-06 Thread Yaron Gvili
The general design seems reasonable to me. However, I think the multithreading issue warrants a (perhaps separate) discussion, in view of the risk that Arrow's multithreading model would end up being hard to interoperate with that of other libraries used to implement UDFs. Such interoperability

Re: [DISCUSS][C++][Python]Switch default mmap behaviour to off

2022-05-06 Thread Sasha Krassovsky
Hi, Which use of mmap are you referring to in the code base? Mmap in general could have a lot of different uses. The point of the paper you linked is that database management systems should explicitly manage their paging to and from disk to maintain transactional consistency or to avoid performa