[DISCUSS][C++][Python]Switch default mmap behaviour to off

2022-05-05 Thread Alvin Chunga Mamani
Hi all, I am starting this discussion about the change to disable the use of mmap by default. On non-local or pseudo file systems, mmap is a risk that can affect performance. Part of the solution would be a compile-time flag that lets you activate or deactivate the u
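As a rough illustration of the "mmap off by default" idea, the sketch below uses only the Python standard library. It is not Arrow's API: the function name `read_bytes` and the `use_mmap` parameter are hypothetical, and the thread proposes a compile-time flag, whereas this sketch uses a runtime argument for simplicity.

```python
import mmap
import os
import tempfile


def read_bytes(path, use_mmap=False):
    """Return the contents of a file as bytes.

    Memory mapping is opt-in (use_mmap defaults to False). This mirrors
    the proposal in the thread; it is not how Arrow implements it.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # An empty file cannot be memory-mapped, so fall back to read().
        if use_mmap and size > 0:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return mapped[:]  # copy the mapped region into a bytes object
        return f.read()


# Small demonstration on a temporary local file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.bin")
    with open(sample_path, "wb") as out:
        out.write(b"arrow" * 1000)
    data_plain = read_bytes(sample_path)                  # default: no mmap
    data_mapped = read_bytes(sample_path, use_mmap=True)  # explicit opt-in
```

Both paths return the same bytes. The only difference is whether the caller has to ask for mmap explicitly.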

[Array][C++]Whether batch with constant-type array will be supported in Arrow?

2022-05-05 Thread Dongxiao Song
Hello, I’m using Arrow C++ as the storage and compute structure of my own project, a database based on PostgreSQL. But when computing with a batch that contains a constant-value column, the constant value has to be expanded into an array to be stored in the batch, which is a waste of time and memory.
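To make the cost concrete, here is a minimal pure-Python sketch of a column that stores one value plus a length instead of a fully expanded array. The names `ConstantColumn`, `expand`, and `add_constant` are hypothetical and are not part of Arrow. The sketch only illustrates the kind of representation the question asks about.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class ConstantColumn:
    """A column of `length` rows that all hold the same `value`."""

    value: Any
    length: int

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> Any:
        # Every valid row index yields the same stored value.
        if not -self.length <= index < self.length:
            raise IndexError("row index out of range")
        return self.value

    def expand(self) -> List[Any]:
        # Materialize the full array: O(length) time and memory,
        # which is the cost described in the message above.
        return [self.value] * self.length


def add_constant(values: List[float], constant: ConstantColumn) -> List[float]:
    """Add a constant column to a regular column without expanding it."""
    if len(values) != len(constant):
        raise ValueError("columns must have the same length")
    return [v + constant.value for v in values]
```

For example, `add_constant([1, 2, 3, 4], ConstantColumn(7, 4))` reads the constant once per row and never builds a four-element array of sevens.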

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-05 Thread Weston Pace
Yes, I think you have the right understanding. That is what I was originally trying to say by pointing out that our current solution does not solve the serialization problem. I think there is probably room for multiple approaches to this problem. Each will have their own tradeoffs. For example,

Re: [DISC] (Python) Dropping support for manylinux2010

2022-05-05 Thread Sutou Kouhei
+1 Our next major release will be in July or August. I think that pypa will officially drop support for manylinux2010 by the time we release the next major version. Thanks, -- kou In "[DISC] (Python) Dropping support for manylinux2010" on Thu, 5 May 2022 13:01:47 +0200, Jacob Wujciak wrote: > Hi al

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-05 Thread Li Jin
After reading the above PR, I think I understand the approach a bit more now. If I understand this correctly, the above UDF functionality is similar to what I have in mind. The main difference seems to be "where and how are the UDF executed" (1) In the PR above, the UDF is passed to the Compute e

[BLOG] Arrow 8.0.0 Blog Post details

2022-05-05 Thread Raul Cumplido
Hi, I have created an initial skeleton for the blog post for the Arrow 8.0.0 release [1]. We can start adding the details around what information we want that blog post to contain so we can publish it closer to the time when we publish the release. Please feel free to fill in the details for it.

Re: [VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-05 Thread Dewey Dunnington
+1 (non-binding) I ran: TEST_DEFAULT=0 TEST_CPP=1 dev/release/verify-release-candidate.sh 8.0.0 3 I also ran R CMD check locally on that commit, and only got the usual NOTE about a large libs directory. I ran into an OSError (too many open files) when trying with TEST_PYTHON=1, but I assume this

Re: RFC: Out of Process Python UDFs in Arrow Compute

2022-05-05 Thread Li Jin
"def pandas_rank(ctx, arr):
    series = arr.to_pandas()
    rank = series.rank()
    return pa.array(rank)"
Oh nice! I didn't get that from the original PR, and it does look like this is closer to the problem I am trying to solve. At this point I will try to understand more about that PR and see if what I propo

Re: [DISC] (Python) Dropping support for manylinux2010

2022-05-05 Thread Antoine Pitrou
That sounds ok to me. On 05/05/2022 at 13:01, Jacob Wujciak wrote: Hi all, I would like to propose that we drop support for manylinux2010. CentOS 6, on which the manylinux2010 image is based, has been EOL for over two years [1]. There is now also an official announcement by pypa that man

Re: [DISC] (Python) Dropping support for manylinux2010

2022-05-05 Thread Alessandro Molina
+1 (non-binding) On Thu, May 5, 2022 at 1:02 PM Jacob Wujciak wrote: > Hi all, > > I would like to propose that we drop support for manylinux2010. > > CentOS 6, on which the manylinux2010 image is based, has been EOL for over > two years [1]. > There is now also an official announcement by pypa t

[DISC] (Python) Dropping support for manylinux2010

2022-05-05 Thread Jacob Wujciak
Hi all, I would like to propose that we drop support for manylinux2010. CentOS 6, on which the manylinux2010 image is based, has been EOL for over two years [1]. There is now also an official announcement by pypa that manylinux2010 support will be dropped sometime in 2022 [2] that has not receiv

Re: [VOTE] Release Apache Arrow 8.0.0 - RC3

2022-05-05 Thread Raul Cumplido
+1 (non-binding) I ran: TEST_DEFAULT=0 TEST_CPP=1 TEST_GLIB=1 TEST_PYTHON=1 TEST_GO=1 TEST_JAVA=1 TEST_JS=1 TEST_RUBY=1 TEST_CSHARP=1 dev/release/verify-release-candidate.sh 8.0.0 3 on arch linux (5.17.5-arch1-1), x86_64 with: gcc version 11.2.0 (GCC) openjdk version "11.0.15" 2022-04-19 python 3