Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-27 Thread Josh Taylor
I played around with it. For my use case I really like the new way of writing CSVs; it's much more obvious. I love the `read_stream_metadata` function as well. I'm seeing a very slight (~8 ms) speed improvement on my end, but I read a bunch of files in a directory and spit out a CSV, the bottleneck

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-27 Thread Josh Taylor
Hi! I've been using arrow/arrow-rs for a while now; my use case is to parse Arrow streaming files and convert them into CSV. Rust has been an absolutely fantastic tool for this: the performance is outstanding and I have had no issues using it for my use case. I would be happy to test out the branc

[Rust] Proposed content for 4.2.0 arrow-rs release

2021-05-27 Thread Andrew Lamb
Hello, I am gearing up for our second bi-weekly release of arrow-rs. Here is a PR with a list of the changes that are on active_release and I propose to include in 4.2.0: https://github.com/apache/arrow-rs/pull/375 Comments / feedback welcomed. Andrew

Re: C++ Migrate from Arrow 0.16.0

2021-05-27 Thread Benjamin Kietzman
Yes, this is an adaptation of ARROW_ASSIGN_OR_RAISE for their bridge, which seems to throw exceptions instead of returning Status/Result. On Thu, May 27, 2021 at 4:42 PM Micah Kornfield wrote: > For the macro, I believe ARROW_ASSIGN_OR_RAISE already does this? > > On Thu, May 27, 2021 at 1:19 PM B
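To make the distinction in this reply concrete, here is a minimal sketch of an "assign or throw" macro. It is an illustration only: `ToyResult`, `ParsePositive`, `DoublePositive` and `TOY_ASSIGN_OR_THROW` are hypothetical names invented for this example, not Arrow's `Result` or `ARROW_ASSIGN_OR_RAISE`, and not the macro used in the bridge under discussion. The assumption is simply that the bridge wants a failed result to surface as an exception rather than as an early return of a status value.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal result type, standing in for a Result<T>-like class.
template <typename T>
struct ToyResult {
  bool ok;
  T value;
  std::string error;
};

// Hypothetical fallible function used only to exercise the macro.
ToyResult<int> ParsePositive(int raw) {
  if (raw <= 0) return {false, 0, "value must be positive"};
  return {true, raw, ""};
}

// Build a unique temporary name per line so the macro can be used repeatedly.
#define TOY_CONCAT_IMPL(a, b) a##b
#define TOY_CONCAT(a, b) TOY_CONCAT_IMPL(a, b)

// Evaluate expr once; throw on failure, otherwise move the value into lhs.
#define TOY_ASSIGN_OR_THROW_IMPL(tmp, lhs, expr)    \
  auto tmp = (expr);                                \
  if (!tmp.ok) throw std::runtime_error(tmp.error); \
  lhs = std::move(tmp.value)

#define TOY_ASSIGN_OR_THROW(lhs, expr) \
  TOY_ASSIGN_OR_THROW_IMPL(TOY_CONCAT(toy_result_, __LINE__), lhs, expr)

// Caller code reads like a plain declaration; errors propagate as exceptions.
int DoublePositive(int raw) {
  TOY_ASSIGN_OR_THROW(int value, ParsePositive(raw));
  return 2 * value;
}
```

With these definitions, `DoublePositive(21)` yields 42, while `DoublePositive(-1)` throws `std::runtime_error`. Because the macro expands to several statements, it should not be used as the unbraced body of an `if` or a loop.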

Re: C++ Migrate from Arrow 0.16.0

2021-05-27 Thread Micah Kornfield
For the macro, I believe ARROW_ASSIGN_OR_RAISE already does this? On Thu, May 27, 2021 at 1:19 PM Benjamin Kietzman wrote: > unique_ptr is used to designate unique ownership of the buffer > just created. It's fairly compatible with shared_ptr since > unique_ptr can convert implicitly to shared_p

Re: C++ Migrate from Arrow 0.16.0

2021-05-27 Thread Benjamin Kietzman
unique_ptr is used to designate unique ownership of the buffer just created. It's fairly compatible with shared_ptr since unique_ptr can convert implicitly to shared_ptr. One other refactoring in play here: we've been moving from Status-returning-out-argument functions to the more ergonomic Result
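The conversion mentioned in this message can be sketched with the standard library alone. `ToyBuffer`, `AllocateToyBuffer`, `AllocateShared` and `Share` are hypothetical names for this illustration and are not Arrow APIs; the only fact relied on is that `std::shared_ptr` can be constructed implicitly from an rvalue `std::unique_ptr`. The code sticks to C++11 so it does not depend on the outcome of the C++14 discussion.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffer type; this is not arrow::Buffer.
struct ToyBuffer {
  explicit ToyBuffer(std::size_t n) : bytes(n, 0) {}
  std::vector<std::uint8_t> bytes;
};

// Factory that hands out unique ownership of a freshly created buffer.
std::unique_ptr<ToyBuffer> AllocateToyBuffer(std::size_t n) {
  return std::unique_ptr<ToyBuffer>(new ToyBuffer(n));
}

// Code that stores buffers as shared_ptr keeps working: the temporary
// unique_ptr returned by the factory converts implicitly to shared_ptr.
std::shared_ptr<ToyBuffer> AllocateShared(std::size_t n) {
  std::shared_ptr<ToyBuffer> shared = AllocateToyBuffer(n);
  return shared;
}

// A named unique_ptr must be moved explicitly; the source is left empty.
std::shared_ptr<ToyBuffer> Share(std::unique_ptr<ToyBuffer> owned) {
  return std::shared_ptr<ToyBuffer>(std::move(owned));
}
```

The reverse direction does not exist: a `shared_ptr` cannot be turned back into a `unique_ptr`, which is one reason a factory returning `unique_ptr` is the more flexible choice for callers.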

C++ Migrate from Arrow 0.16.0

2021-05-27 Thread Rares Vernica
Hello, We are trying to migrate from Arrow 0.16.0 to a newer version, hopefully up to 4.0.0. The Arrow 0.17.0 change in AllocateBuffer from taking a shared_ptr to returning a unique_ptr is making things very difficult. We wonder if there is a strong reason behind the change from shared_ptr to uniq

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-27 Thread Jed Brown
Andy Grove writes: > > Looking at this purely from the DataFusion/Ballista point of view, what I > would be interested in would be having a branch of DF that uses arrow2 and > once that branch has all tests passing and can run queries with performance > that is at least as good as the original arr

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-27 Thread Andy Grove
I don't have a very strong opinion on new repo vs branch but having a new repo seems simpler and less overhead to me. I think it makes this effort far more visible to the community and is more likely to get more people involved. Looking at this purely from the DataFusion/Ballista point of view, wh

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-27 Thread Wes McKinney
I think given the size and scope of the work, there's a stronger argument for having an IP clearance for this code (as compared with python-datafusion). On Thu, May 27, 2021 at 5:45 AM Andrew Lamb wrote: > > I am not opposed to a new repo. > > However I believe that the largest barrier to the com

Re: [Rust] [Discuss] proposal to redesign Arrow crate to resolve safety violations

2021-05-27 Thread Andrew Lamb
I am not opposed to a new repo. However, I believe that the largest barrier to the community really getting their heads around / evaluating arrow2 is its sheer size. A -92k +57k change isn't something I am likely to get my head around in any level of detail until I actively work with it for a while. The best way

Re: [C++][Discuss] Switch to C++14

2021-05-27 Thread Antoine Pitrou
C++17 support on gcc requires gcc 7 (roughly): https://gcc.gnu.org/projects/cxx-status.html Ubuntu 18.04 has gcc 7.4, but Ubuntu 16.04 only has gcc 5.4 AFAIK. Apparently we stopped releasing binaries for Ubuntu 16.04: https://apache.jfrog.io/artifactory/arrow/ubuntu/pool/xenial/main/a/apache-

Re: Release archive URL

2021-05-27 Thread Andrew Lamb
Using a different prefix sounds like a good idea to me. I don't have any preference between apache-arrow{-rs} and arrow-{rs}. I am happy to make that change for future Rust releases if others agree. Do you think we should rename the existing arrow-4.1.0 directory or just use the new name going f

Re: [C++][Discuss] Switch to C++14

2021-05-27 Thread Benjamin Kietzman
I'm definitely in favor of going to c++14. While we're at it: which platforms prevent us from using c++17? On Thu, May 27, 2021, 04:03 Antoine Pitrou wrote: > > Hello, > > It seems the only two platforms that constrained us to C++11 will not be > supported anymore (those platforms are RTools 3.

[C++][Discuss] Switch to C++14

2021-05-27 Thread Antoine Pitrou
Hello, It seems the only two platforms that constrained us to C++11 will not be supported anymore (those platforms are RTools 3.5 for R packages, and manylinux1 for Python packages). It would be beneficial to bump our C++ requirement to C++14. There is an issue open listing benefits: htt