Re: [ANNOUNCE] New Arrow PMC member: Alenka Frim

2025-07-01 Thread Wes McKinney
Congratulations! On Tue, Jul 1, 2025 at 1:22 PM Micah Kornfield wrote: > Congrats! > > On Tue, Jul 1, 2025 at 11:07 AM William Ayd > wrote: > > > Congrats! > > > > Sent from my iPhone > > > > > On Jul 1, 2025, at 1:06 PM, Weston Pace wrote: > > > > > > Congratulations Alenka! > > >

Re: [Discuss][C++] Deprecate Feather V1 reader and writer

2025-06-04 Thread Wes McKinney
ster/python/setup.py#L71 > > > > On Tue, Jun 3, 2025 at 11:09 AM Jacob Wujciak > wrote: > > > >> +1 I like the idea of keeping the reader around for a bit longer! > >> > >> On Tue, Jun 3, 2025 at 17:02, Wes McKinney wrote: > >> > >>>

Re: [Discuss][C++] Deprecate Feather V1 reader and writer

2025-06-03 Thread Wes McKinney
That sounds fine to me. On Tue, Jun 3, 2025 at 8:09 AM Antoine Pitrou wrote: > > Hello > > Arrow C++ still supports the very old file format "Feather V1" which was > designed in 2016 and is superseded by the Arrow IPC file format. (*) > > (note: "Feather V2" is a synonym for Arrow IPC to encoura

Re: [VOTE][Swift] Split Swift to separated repository

2025-05-20 Thread Wes McKinney
+1 On Tue, May 20, 2025 at 1:49 AM Raúl Cumplido wrote: > +1 > > On Tue, May 20, 2025 at 4:14, Neal Richardson (< > neal.p.richard...@gmail.com>) wrote: > > > +1 > > > > On Mon, May 19, 2025 at 10:06 PM Ruoxi Sun > wrote: > > > > > +1 > > > > > > *Regards,* > > > *Rossi SUN* > > > > > > >

Re: [Call For Volunteer] Apache Arrow Summit and Selection Committee

2025-05-19 Thread Wes McKinney
I just signed up to be at the conference to attend the summit, so I can also help with selection if not too late! On Mon, May 19, 2025 at 8:15 AM Li Jin wrote: > Sorry for missing this email. I volunteer as well. > > (I have been working with / building Arrow-based data processing > systems sinc

Re: [DISCUSS][C++] Switch to C++20

2025-05-19 Thread Wes McKinney
+1 On Mon, May 19, 2025 at 4:03 PM Sutou Kouhei wrote: > +1 > > In > "[DISCUSS][C++] Switch to C++20" on Mon, 19 May 2025 18:14:27 +0200, > Antoine Pitrou wrote: > > > > > Hello, > > > > I am proposing that we switch Arrow C++ to require C++20. > > > > C++20 will offer support for more C++

Re: [C++] Deprecate Skyhook?

2025-05-05 Thread Wes McKinney
I agree with this also -- it was perhaps aspirational to see if more contributors around the idea of Arrow-accelerated file systems would come out of the woodwork but at this point it would probably make sense for this to spin off into a separate project (either within the Apache umbrella if there

Re: [VOTE] Enable GitHub Discussions for apache/arrow-*

2025-03-21 Thread Wes McKinney
+1 On Fri, Mar 21, 2025 at 11:03 AM Bryce Mecum wrote: > +1 > > On Thu, Mar 20, 2025 at 9:50 PM Sutou Kouhei wrote: > > > > Hi, > > > > I would like to propose enabling GitHub Discussions on: > > > > * apache/arrow > > * apache/arrow-adbc > > * apache/arrow-cookbook > > * apache/arrow-experimen

Re: [ANNOUNCE] New Arrow PMC member: Ian Cook

2025-03-20 Thread Wes McKinney
Congrats! On Thu, Mar 20, 2025 at 7:37 AM Neal Richardson wrote: > Congratulations! > > On Thu, Mar 20, 2025 at 8:16 AM Rok Mihevc wrote: > > > Congrats Ian! Well deserved! > > > > On Thu, Mar 20, 2025 at 12:04 PM Alenka Frim > > wrote: > > > > > Congratulations Ian!! > > > > > > V čet., 20. m

Re: [ANNOUNCE] New Arrow PMC member: Rok Mihevc

2025-03-19 Thread Wes McKinney
Congratulations! On Wed, Mar 19, 2025 at 2:48 PM Raúl Cumplido wrote: > Congratulations Rok! > > On Wed, Mar 19, 2025 at 20:26, Kevin Gurney > wrote: > > > Congratulations, Rok! > > > > From: Ed Seidl > > Sent: Wednesday, March 19, 2025 3:15 PM > > To: dev@arro

Re: [ANNOUNCE] New Arrow PMC member: Gang Wu

2024-12-03 Thread Wes McKinney
Congrats! On Tue, Dec 3, 2024 at 3:58 PM Micah Kornfield wrote: > Congrats Gang! > > On Tue, Dec 3, 2024 at 1:31 PM Fokko Driesprong wrote: > > > Awesome, congrats Gang! > > > > On Tue, Dec 3, 2024 at 22:25, Matt Topol wrote: > > > > > Congrats Gang!! > > > > > > On Tue, Dec 3, 2024, 4:20 PM Sut

Re: [VOTE] Split Java release process

2024-11-21 Thread Wes McKinney
+1 (binding). Thank you for driving forward this work. On Fri, Nov 22, 2024 at 11:31 AM Gang Wu wrote: > +1 (non-binding) > > On Fri, Nov 22, 2024 at 10:10 AM Jacob Wujciak > wrote: > > > +1 (non-binding) > > > > On Fri, Nov 22, 2024 at 03:06, David Li wrote: > > > > > > +1 (binding) >

Re: [ANNOUNCE] New Arrow PMC chair: Neal Richardson

2024-10-30 Thread Wes McKinney
Congrats Neal! On Wed, Oct 30, 2024 at 7:23 AM Edmondo Porcu wrote: > Congrats Neil and thank you Andy! > > On Wed, Oct 30, 2024 at 8:21 AM Rok Mihevc wrote: > > > Thanks Andy and Neil! > > > > Rok > > > > On Wed, Oct 30, 2024 at 1:18 PM Vibhatha Abeykoon > > wrote: > > > > > Congratulations N

Re: [DISCUSS] 8-bit Boolean Canonical Extension Type

2024-07-22 Thread Wes McKinney
From a historical perspective, if we had had extension types / canonical extension types, it would have made more sense to have the millisecond dates as an extension type. The goal of having the extra type was to avoid an unnecessary serialization in systems where there is a benefit to moving dat

Re: Understanding possible synergies between arrow & zarr communities?

2024-07-10 Thread Wes McKinney
hi Carl, I agree that cross-collaboration and knowledge/tools sharing could be very helpful. Even though we've done a lot of engineering on low-level IO and memory management, there are probably still many aspects of the Parquet C++ reader (what powers pyarrow.parquet) that could be improved to do

Re: [VOTE] Migration of parquet-cpp issues to Arrow's issue tracker

2024-05-29 Thread Wes McKinney
+1 (binding for Arrow and Parquet) On Wed, May 29, 2024 at 12:13 PM Raúl Cumplido wrote: > +1 (binding for Arrow) > > On Wed, May 29, 2024 at 18:15, Andy Grove wrote: > > > +1 (binding for Arrow). > > > > Thanks, > > > > Andy. > > > > On Wed, May 29, 2024 at 9:48 AM Alenka Frim > .invalid> > >

Re: [VOTE][Format] UUID canonical extension type

2024-05-06 Thread Wes McKinney
+1 On Tue, Apr 30, 2024 at 4:03 PM Antoine Pitrou wrote: > +1 (binding) > > > On Apr 19, 2024 at 22:22, Rok Mihevc wrote: > > Hi all, > > > > Following initial requests [1][2] and recent tangential ML discussion > [3] I > > would like to propose a vote to add language for UUID canonical extensio

Re: [VOTE][Format] JSON canonical extension type

2024-05-06 Thread Wes McKinney
+1 On Tue, Apr 30, 2024 at 4:03 PM Antoine Pitrou wrote: > +1 (binding) for the current proposal, i.e. with the RFC 8259 > requirement and the 3 current String types allowed. > > Regards > > Antoine. > > > On Apr 30, 2024 at 19:26, Rok Mihevc wrote: > > Hi all, thanks for the votes and comments

Re: Fwd: PyArrow Using Parquet V2

2024-04-24 Thread Wes McKinney
I think there is confusion about the Parquet "V2" (including the V2 data pages, and other details) and the 2.x.y releases of the format library artifact. They aren't the same unfortunately. I don't think the V2 metadata structures (the data pages in particular, and new column encoding) are widely ad

Re: Unsupported/Other Type

2024-04-10 Thread Wes McKinney
In the past we have discussed adding a canonical type for UUID and JSON. I still think this is a good idea and could improve ergonomics in downstream language bindings (e.g. by exposing JSON querying function or automatically boxing UUIDs in built-in UUID types, like the Python uuid library). Has a

Re: [ANNOUNCE] New Committer Joel Lubinitsky

2024-04-01 Thread Wes McKinney
Congrats! On Mon, Apr 1, 2024 at 11:01 AM Andrew Lamb wrote: > Congratulations Joel. > > On Mon, Apr 1, 2024 at 11:53 AM Raúl Cumplido > wrote: > > > Congratulations and welcome Joel! > > > > > > On Mon, Apr 1, 2024 at 17:18, Kevin Gurney > > wrote: > > > > > Congratulations, Joel! > > > > > >

Re: [ANNOUNCE] New Arrow committer: Bryce Mecum

2024-03-18 Thread Wes McKinney
Congrats! On Mon, Mar 18, 2024 at 12:15 PM James Duong wrote: > Congratulations Bryce! > > From: Dane Pitkin > Date: Monday, March 18, 2024 at 7:28 AM > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow committer: Bryce Mecum > Congratulations, Bryce!! > > On Mon, Mar 18, 2024 at 9:

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Wes McKinney
D, that the office of "Vice President, Apache DataFusion" be > > >> > and hereby is created, the person holding such office to > > >> > serve at the direction of the Board of Directors as the chair > > >> > of the Apache DataFusion Project, and to have primary re

Re: [DISCUSS][RFC] Draft Proposal for new Top Level Project for DataFusion

2024-02-28 Thread Wes McKinney
I'd be happy to help. I think we will have to participate in PMC matters infrequently (should there be a difficult issue in the future, we could offer some perspective from cases in the past). On Wed, Feb 28, 2024 at 2:13 PM Andrew Lamb wrote: > Wes brought up a great point on the document[1] th

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-02-27 Thread Wes McKinney
Have there been efforts to proactively reach out to other third parties that might have an interest in this or be a potential user at some point? There are a lot of interested parties in Arrow that may not actively follow the mailing list. Seems like folks from the Dask, Ray, RAPIDS (especially fo

Re: [DISCUSS] Donation of a Spark native engine based on DataFusion & Arrow

2024-02-11 Thread Wes McKinney
Congrats all! It's great to see the Arrow+DataFusion ecosystem expand in this way and to bring the work under the ASF umbrella. On Sun, Feb 11, 2024 at 5:02 AM Andrew Lamb wrote: > As a follow up here the acceptance vote [1] has passed, the IP Clearance > Process is complete [2] and the code PR

Re: [DISCUSS] Status and future of @ApacheArrow Twitter account

2024-01-29 Thread Wes McKinney
Is there a different tool other than TweetDeck available that can synchronize posts that go out on different social channels (LinkedIn, Twitter, Mastodon, etc.)? I've heard of things like Hootsuite but that's pretty expensive and definitely overkill for an open source project, but perhaps there is

Re: [VOTE] Accept donation of Comet Spark native engine

2024-01-27 Thread Wes McKinney
+1 (binding) On Sat, Jan 27, 2024 at 12:26 PM Micah Kornfield wrote: > +1 Binding > > On Sat, Jan 27, 2024 at 10:21 AM David Li wrote: > > > +1 (binding) > > > > On Sat, Jan 27, 2024, at 13:03, L. C. Hsieh wrote: > > > +1 (binding) > > > > > > On Sat, Jan 27, 2024 at 8:10 AM Andrew Lamb > > wr

Re: [DISCUSS] Conventions for transporting Arrow data over HTTP

2024-01-08 Thread Wes McKinney
hi all — I was just catching up on e-mail threads and wanted to give a few historical comments on this. When we were assembling the Arrow PMC and committing to do the project in 2015, standardizing Arrow-over-REST was always something that was on the TODO list — at that time we didn't have the IPC

Re: CIDR 2024

2023-12-05 Thread Wes McKinney
I will also be there. On Mon, Dec 4, 2023 at 12:58 PM Tony Wang wrote: > I am > > Get Outlook for Android > > From: Curt Hagenlocher > Sent: Monday, December 4, 2023 12:53:00 PM > To: dev@arrow.apache.org > Subject: CIDR 2024 > > Who's g

Re: [DISCUSS][C++] Raw pointer string views

2023-09-28 Thread Wes McKinney
hi all, I'm just catching up on this thread after having taken a look at the format PRs, the C++ implementation PR, and this e-mail thread. So only my $0.02 from having spent a great deal less time on this project than others. The original motivation I had for bringing up the idea of adding the S

Re: Apache Arrow filesystem question

2022-10-27 Thread Wes McKinney
I definitely think it would be a good thing to have a C++ ADLS filesystem interface that is on par in quality with our S3 and GCS C++ interfaces — these should also provide material performance benefits to Python users over a pure-Python interface (I'm not sure if pyarrow's S3 interface via C++ has

Re: [ANNOUNCE] New Arrow PMC member: Nicola Crane

2022-10-27 Thread Wes McKinney
Congratulations! On Wed, Oct 26, 2022 at 4:56 PM Ian Joiner wrote: > > Congrats Nic! > > Ian > > On Tuesday, October 25, 2022, Sutou Kouhei wrote: > > > The Project Management Committee (PMC) for Apache Arrow has invited > > Nicola Crane to become a PMC member and we are pleased to announce > >

Re: [DISCUSS] Python Wheel Size

2022-10-10 Thread Wes McKinney
We've discussed this in the past, I think. In addition to having many optional components enabled, the pyarrow wheel also includes the unit tests directory which is of growing size. I think if we made a pyarrow-slim wheel with support only for core Arrow (IPC, etc.) and Parquet file reading, it mig

Re: [Discuss] Deprecating Plasma

2022-09-26 Thread Wes McKinney
+1 On Thu, Sep 22, 2022 at 11:59 PM Sutou Kouhei wrote: > > +1 > > In > "[Discuss] Deprecating Plasma" on Thu, 22 Sep 2022 17:38:27 +0200, > Antoine Pitrou wrote: > > > > > Hello, > > > > The Plasma object store (*) hasn't received significant maintenance > > since at least 2020. The origin

Re: [ANNOUNCE] New Arrow PMC member: Raphael Taylor-Davies

2022-09-20 Thread Wes McKinney
Congratulations! On Tue, Sep 20, 2022 at 12:37 PM Ashish wrote: > > Congratulations !! > > On Tue, Sep 20, 2022 at 10:17 AM Ian Joiner wrote: > > > Congrats Raphael! > > > > On Mon, Sep 19, 2022 at 9:56 PM Sutou Kouhei wrote: > > > > > The Project Management Committee (PMC) for Apache Arrow has

Re: [ANNOUNCE] New Arrow committer: Remzi Yang

2022-09-10 Thread Wes McKinney
Congratulations! On Sat, Sep 10, 2022 at 7:12 AM Andrew Lamb wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Remzi Yang > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > Andrew

Re: [VOTE] Substrait for Flight SQL

2022-09-09 Thread Wes McKinney
+1 (binding) On Thu, Sep 8, 2022 at 9:12 PM Jacques Nadeau wrote: > > My vote continues to be +1 > > On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson > wrote: > > > +1 > > > > Neal > > > > On Thu, Sep 8, 2022 at 2:15 PM Ashish wrote: > > > > > +1 (non-binding) > > > > > > On Thu, Sep 8, 2022 at

Re: DISCUSS: [Format] Rules and procedures for Canonical extension types

2022-09-08 Thread Wes McKinney
+1 to this proposal. It would be great to use the JSON type as a crash dummy to work out the kinks in the process, but I think there are meaningful benefits (Parquet round-tripping) to getting this work under way. On Wed, Aug 24, 2022 at 11:22 AM Antoine Pitrou wrote: > > > On Aug 17, 2022 at 18:45,

Re: Arrow Flight usage with graph databases

2022-09-08 Thread Wes McKinney
hi Bill — you can unsubscribe by e-mailing dev-unsubscr...@arrow.apache.org On Tue, Sep 6, 2022 at 2:40 PM Bill Zhao wrote: > > unsubscribe > > On Mon, Jul 18, 2022 at 16:56, Valentyn Kahamlyk wrote: > > > > Hi All, > > > > I'm investigating the possibility of using Arrow Flight with graph > > databases, and

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-08 Thread Wes McKinney
Congrats Weston!! On Tue, Sep 6, 2022 at 8:21 PM Krisztián Szűcs wrote: > > Congrats Weston! > > On Wed, Sep 7, 2022 at 1:41 AM Percy Camilo Triveño Aucahuasi > wrote: > > > > Great news! Congratulations Weston! > > > > On Tue, Sep 6, 2022 at 1:42 PM Andy Grove wrote: > > > > > Congrats Weston!

Re: [ANNOUNCE] New Arrow PMC member: L. C. Hsieh

2022-09-07 Thread Wes McKinney
Congrats! On Mon, Sep 5, 2022 at 2:05 PM Raul Cumplido Dominguez wrote: > > Congratulations! > > On Mon, Sep 5, 2022 at 20:05, Ian Joiner wrote: > > > Congrats L.C.! > > > > On Sat, Sep 3, 2022 at 5:39 PM Sutou Kouhei wrote: > > > > > The Project Management Committee (PMC) for Apache Arrow has

Re: [C++] Read Flight data source into Acero

2022-09-07 Thread Wes McKinney
This seems like something where there should be ready-to-go code in the Arrow codebase to feed any RecordBatchReader into Acero On Thu, Aug 18, 2022 at 12:15 PM Li Jin wrote: > > Thanks all. I will try this out. > > On Thu, Aug 18, 2022 at 9:06 AM Rok Mihevc wrote: > > > +1 for adding this eithe

Re: Apache Software Foundation community survey 2022

2022-09-06 Thread Wes McKinney
hi Antoine — thank you for circulating this survey. Even though it takes a few minutes to complete I encourage community members to take the time to participate since data about community participation helps the ASF do better in the future. Thanks, Wes On Thu, Aug 25, 2022 at 2:10 AM Antoine Pitr

Re: [C++] Purpose of C++ bundled dependencies

2022-08-05 Thread Wes McKinney
The current libarrow_bundled_dependencies.a was created to address the problem of libarrow.a being "useless" (unable to be used to link with application code) if any dependencies were built by the Arrow build system (notably: this is the case when using the default allocator jemalloc). I'm not sure wh

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-05 Thread Wes McKinney
e at the very least some intermediate copies can be > skipped. > > Thanks, > Gosh > > On Tue, Aug 2, 2022, 2:49 PM Wes McKinney wrote: > > > On Tue, Aug 2, 2022 at 1:02 AM Antoine Pitrou wrote: > > > > > > > > > Le 01/08/2022 à 19:13, Wes McKinney a é

Re: [FlightSQL][JDBC] Additional changes to the JDBC driver

2022-08-05 Thread Wes McKinney
If you want to merge the cleared IP into a new branch rather than master, that is fine, too. It's not necessary to land it in the main branch On Tue, Aug 2, 2022 at 4:18 PM David Li wrote: > > Would it be OK to get what's there into the main branch first? i.e., open a > PR from the apache/flight

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-02 Thread Wes McKinney
On Tue, Aug 2, 2022 at 1:02 AM Antoine Pitrou wrote: > > > On Aug 1, 2022 at 19:13, Wes McKinney wrote: > > > > If we start placing restrictions on how the out-of-line string buffers > > are managed and externalized, it risks undermining the zero-copy > > int

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-02 Thread Wes McKinney
I should add that since Parquet has JSON, BSON, and UUID types, that while UUID is just a simple fixed sized binary, that having the extension types so that the metadata flows through accurately to Parquet would be net beneficial: https://github.com/apache/parquet-format/blob/master/src/main/thrif

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-01 Thread Wes McKinney
On Sun, Jul 31, 2022 at 8:05 AM Antoine Pitrou wrote: > > > Hi Wes, > > On Jul 31, 2022 at 00:02, Wes McKinney wrote: > > > > I understand there are still some aspects of this project that cause > > some squeamishness (like having arbitrary memory addresses embed

[DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-07-30 Thread Wes McKinney
hi folks, I'm interested to start doing some work to implement the "StringView" memory layout that we previously discussed late last year [1] with supporting document [2]. Since there's quite a few details to work out, my objective would be to do the work in a feature branch focused on a few thin

Re: [DISCUSS][Format] Dynamic data encodings in the IPC format and C ABI

2022-07-30 Thread Wes McKinney
is V5, so if we added a new batch type allowing for encodings, sparseness, etc., then we would need to bump the MetadataVersion to V6, but libraries implementing V6 metadata should be able to operate in V5 compatibility mode (sending non-encoded data in the current IPC format). > > [1] >

[DISCUSS][Format] Dynamic data encodings in the IPC format and C ABI

2022-07-29 Thread Wes McKinney
hi all, Since we've been recently discussing adding new data types, memory formats, or data encodings to Arrow, I wanted to bring up a more "big picture" question around how we could support data whose encodings may change throughout the lifetime of a data stream sent via the IPC format (e.g. over

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-29 Thread Wes McKinney
his (Disclaimer I'm a > > colleague of Padeep's) > > > > [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types > > > > > > On Fri, Jul 29, 2022 at 3:19 PM Wes McKinney wrote: > > > > > This seems like a common-enoug

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-29 Thread Wes McKinney
This seems like a common-enough data type that having a first-class logical type would be a good idea (perhaps even more so than UUID!). Compute engines would be able to implement kernels that provide manipulations of JSON data similar to what you can do with jq or GraphQL. On Fri, Jul 29, 2022 at

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-27 Thread Wes McKinney
We had an e-mail thread about this in 2018 https://lists.apache.org/thread/35pn7s8yzxozqmgx53ympxg63vjvggvm I still think having a canonical in-memory row format (and libraries to transform to and from Arrow columnar format) is a good idea — but there is the risk of ending up in the tar pit of re

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-25 Thread Wes McKinney
Suppressing the warnings on 32-bit MSVC sounds like a reasonable compromise. Is there an open PR for this (and what is the corresponding Jira issue so we don't lose track of it)? On Fri, Jul 22, 2022 at 1:23 PM Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) wrote: > > Or live with the warnings. Or cast

Re: Proposal: renaming the 'master' branch to 'main'

2022-07-25 Thread Wes McKinney
hi all, Do you think we could make a push to make this happen after the 9.0.0 release goes out? Thanks Wes On Tue, Feb 15, 2022 at 2:32 PM Fiona La wrote: > > Thank you Antoine for bringing up the engineering work that is required to > enable this. And thank you Neal for sharing the link to th

Re: [C++] Moving from -O3 to -O2 optimization level in release builds

2022-07-21 Thread Wes McKinney
es selectively that > can be demonstrated to benefit from it (if anyone actually spends the time to > look into it). > > Sasha > > > On Jul 20, 2022, at 2:10 PM, Wes McKinney wrote: > > > > hi all, > > > > Antoine and I were digging into a weird issue w

[C++] Moving from -O3 to -O2 optimization level in release builds

2022-07-20 Thread Wes McKinney
hi all, Antoine and I were digging into a weird issue where gcc in -O3 generated ~40KB of optimized code for a function which was less than 2KB in -O2, and where a "leaner" implementation (in PR 13654) was yet faster and smaller. You can see some of the discussion at https://github.com/apache/arr

Re: [C++] Help with Parquet backward compatibility regression between 2.0.0 and 3.0.0

2022-07-18 Thread Wes McKinney
On Mon, Jul 18, 2022 at 2:35 AM Antoine Pitrou wrote: > > > On Jul 18, 2022 at 03:54, Wes McKinney wrote: > > This patch caused Parquet files written with 2.0.0 to be unreadable in > > 3.0.0 onward > > > > https://github.com/apache/arrow/commit/ef0feb2

Re: Problem reading parquet written with pyarrow=2.0.0 using pyarrow=8.0.0 (when using use_dictionary with ParquetWriter)

2022-07-17 Thread Wes McKinney
hi -- I git-bisected and found the backwards-compat regression, and reported here https://issues.apache.org/jira/browse/ARROW-17100 On Wed, Jul 6, 2022 at 4:16 PM Wes McKinney wrote: > > hi — did you ever resolve this issue? We should try to identify what > is causing this failure and

Re: [C++] Help with Parquet backward compatibility regression between 2.0.0 and 3.0.0

2022-07-17 Thread Wes McKinney
Jira issue for this: https://issues.apache.org/jira/browse/ARROW-17100 On Sun, Jul 17, 2022 at 8:54 PM Wes McKinney wrote: > > This patch caused Parquet files written with 2.0.0 to be unreadable in > 3.0.0 onward > > https://github.com/apache/arrow/commit/ef0feb2c9c959681d8a105cba

[C++] Help with Parquet backward compatibility regression between 2.0.0 and 3.0.0

2022-07-17 Thread Wes McKinney
This patch caused Parquet files written with 2.0.0 to be unreadable in 3.0.0 onward https://github.com/apache/arrow/commit/ef0feb2c9c959681d8a105cbadc1ae6580789e69 This was reported on June 14 on dev@ and I git-bisected to the root cause: https://lists.apache.org/thread/wtbqozdhj2hwn6f0sps2j70lr

Re: [C++] Adding Run-Length Encoding to Arrow

2022-07-08 Thread Wes McKinney
pache/arrow/pull/13330 > >> Encode/Decode functions for (currently fixed width types only) > >> > >> - https://github.com/apache/arrow/pull/1 > >> For updating docs > >> > >> Best, > >> Tobias > >> > >> Am Dienstag, d

Re: Problem reading parquet written with pyarrow=2.0.0 using pyarrow=8.0.0 (when using use_dictionary with ParquetWriter)

2022-07-06 Thread Wes McKinney
hi — did you ever resolve this issue? We should try to identify what is causing this failure and see if it can be fixed for the 9.0.0 release. On Tue, Jun 14, 2022 at 8:18 AM Niklas Bivald wrote: > > Hi, > > I’m experiencing problem reading parquet files written with the > `use_dictionary=[]` op

Re: Existence/name/scope for minimal C/C++ Arrow C Data interface helpers

2022-07-06 Thread Wes McKinney
). A lightweight, dependency-free library to help > >>>> constructing those would certainly be appreciated. What would also help > >>>> a > >>>> lot is validation code, Arrow structures are very delicate and one wrong > >>>> pointer

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Wes McKinney
te: > > > > > > Does boxing a scalar into an array actually build a buffer with the > > repeated value, or is it more efficient than that? > > > > > > On Jun 29, 2022 at 17:57, Wes McKinney wrote: > > > I'm working on my next PR which addresses the

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Wes McKinney
can address follow-on improvements like rewriting expression evaluation to utilize the span data structures to yield performance gains. On Mon, Jun 13, 2022 at 12:37 PM Wes McKinney wrote: > > I merged the PR a little while ago — thanks to David and Sasha for > helping review. If you have more com

Re: [C++] Kernel function registry evolution

2022-06-13 Thread Wes McKinney
ll help us delete a lot of code I'll attach related Jiras to this umbrella issue: https://issues.apache.org/jira/browse/ARROW-16755 On Fri, Jun 10, 2022 at 12:56 PM Wes McKinney wrote: > > PR is up: https://github.com/apache/arrow/pull/13364 > > Look forward to getting this in since t

Re: [C++] Kernel function registry evolution

2022-06-10 Thread Wes McKinney
PR is up: https://github.com/apache/arrow/pull/13364 Look forward to getting this in since there's a bunch of follow on work that I'd like to get started on ASAP! On Thu, Jun 9, 2022 at 7:34 AM Wes McKinney wrote: > > I'm making good progress getting my branch PR-ready --

Re: [C++] Kernel function registry evolution

2022-06-09 Thread Wes McKinney
I'm making good progress getting my branch PR-ready -- working through the compute-scalar-test suite and fixing the little things I broke. I hope I'll have it done by the end of the week. On Mon, Jun 6, 2022 at 3:21 PM Wes McKinney wrote: > > I created https://issues.apache.org/j

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Wes McKinney
ly as I can to have my initial patch ARROW-16756 ready which will unblock the next few projects here On Mon, Jun 6, 2022 at 10:35 AM Wes McKinney wrote: > > This is definitely only the first stage of cleanup and streamlining — > I anticipate multiple rounds of refactoring (maybe not a

Re: [C++] Kernel function registry evolution

2022-06-06 Thread Wes McKinney
This is definitely only the first stage of cleanup and streamlining — I anticipate multiple rounds of refactoring (maybe not as invasive and painful as this one), and this patch I'm not sure will do a lot to alleviate bottom line expression evaluation overhead but it creates the environment (i.e.

Re: [C++] Kernel function registry evolution

2022-06-05 Thread Wes McKinney
ould say it's a couple days away from being review-ready: https://github.com/apache/arrow/compare/master...wesm:lightweight-exec-batch I'll post a PR when I have something closer to a green build. We probably won't want to let this PR linger since it will cause conflicts with any

Re: RecordBatchFileWriter with DictionaryType: Making sure the dictionary stays the same

2022-06-03 Thread Wes McKinney
There's a relevant Jira issue here (maybe some others), if someone wants to pick it up and write a kernel for it https://issues.apache.org/jira/browse/ARROW-4097 I think having an improved experience around this dictionary conformance/normalization problem would be valuable. On Tue, May 31, 2022
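To make the dictionary conformance problem mentioned above concrete, here is a minimal sketch in plain Python. It is not Arrow code and not the kernel proposed in the Jira issue: the function name `unify_dictionaries` and the list-based representation of dictionary-encoded batches are assumptions made for illustration only. The idea is that batches encoded against different dictionaries are remapped onto a single shared dictionary, so that a writer can emit that dictionary once.

```python
def unify_dictionaries(batches):
    """Remap dictionary-encoded batches onto one shared dictionary.

    `batches` is a list of (dictionary, indices) pairs, where `dictionary`
    is a list of distinct values and `indices` is a list of integer codes
    (None marks a null slot). Hypothetical helper, not an Arrow API.
    """
    unified = []     # shared dictionary, in first-seen order
    position = {}    # value -> code in the shared dictionary
    remapped = []    # re-encoded indices, one list per input batch

    for dictionary, indices in batches:
        # Build the old-code -> new-code table for this batch.
        transpose = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            transpose.append(position[value])
        # Rewrite the indices, leaving nulls untouched.
        remapped.append([None if i is None else transpose[i] for i in indices])

    return unified, remapped
```

Only the small per-batch transpose table depends on the dictionary contents; the indices themselves are rewritten in a single pass.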

Re: [C++] Kernel function registry evolution

2022-06-03 Thread Wes McKinney
o, if we know we are also going to want to tweak the output > > interface (I don't know for sure if we will) then maybe it makes sense > > to pick a small set of kernels and incrementally improve that small > > set until we think we've made all the changes we are going to

Re: [C++] Kernel function registry evolution

2022-06-02 Thread Wes McKinney
On this topic, I actually have started prototyping a new ScalarKernel exec interface that uses a non-owning, shared_ptr-free "ArraySpan" data structure based on some prior conversations https://github.com/wesm/arrow/blob/711fd5e5665c280540bbaf48a48ca1eca1b91bff/cpp/src/arrow/compute/exec.h#L163 ht

Re: [Dev] Switch to token authentication for archery & merge script

2022-06-01 Thread Wes McKinney
hi Jacob — this sounds very reasonable and fixes a rough edge for maintainers running into captcha issues. Thanks Wes On Wed, Jun 1, 2022 at 6:44 AM Jacob Wujciak wrote: > > Hello Everyone, > > I would like to propose that we switch from basic authentication with JIRA > in the merge script and a

Re: [DISC] Improving Arrow's database support

2022-06-01 Thread Wes McKinney
I went ahead and created https://github.com/apache/arrow-adbc I directed issue comments / PRs to issues@ On Tue, May 31, 2022 at 8:49 PM Wes McKinney wrote: > > I think spinning up a new repository while this exploratory work > progresses is a fine idea — perhaps apache/arrow-dbc / a

Re: [DISC] Improving Arrow's database support

2022-05-31 Thread Wes McKinney
individually leverage the Arrow libraries). Of course, maintaining a parallel > build system, setting up releases, etc. is also a lot of work. > > -David > > On Tue, Apr 26, 2022, at 15:01, Wes McKinney wrote: > > I don't have major new things to add on this topic except that I

Re: [C++] Adding Run-Length Encoding to Arrow

2022-05-31 Thread Wes McKinney
I haven't had a chance to look at the branch in detail, but if you can provide a pointer to a specification or other details about the proposed memory format for RLE (basically: what would be added to the columnar documentation as well as the Flatbuffers schema files), it would be helpful so it can
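As a point of reference for what a run-length-encoded column has to represent, here is a minimal sketch in plain Python. It assumes a simple "run values plus run lengths" layout, which is only one possible design and is not taken from the proposal or the branch under discussion; the function names are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (run_values, run_lengths)."""
    run_values, run_lengths = [], []
    for value in values:
        if run_values and run_values[-1] == value:
            run_lengths[-1] += 1      # extend the current run
        else:
            run_values.append(value)  # start a new run
            run_lengths.append(1)
    return run_values, run_lengths


def rle_decode(run_values, run_lengths):
    """Expand (run_values, run_lengths) back into the flat list of values."""
    out = []
    for value, length in zip(run_values, run_lengths):
        out.extend([value] * length)
    return out
```

A written specification would additionally have to settle questions this sketch ignores, such as how nulls are represented and whether runs are stored as lengths or as cumulative end offsets.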

Re: Existence/name/scope for minimal C/C++ Arrow C Data interface helpers

2022-05-31 Thread Wes McKinney
I'm also supportive of having a small vendorable C/C++ "Arrow middleware" that provides: * Schemas and types * Columnar data structures and minimal APIs to build them and iterate over them * C data interface * Minimal validation (at the level of Validate but not ValidateFull) I don't think it's g

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-05-19 Thread Wes McKinney
; > > "Acero" has a nice ring to it. Almost as if you said "ACE Arrow" really > > > fast. And maybe the steel / iron meaning gives a sort of close-to-metal > > > vibes (similar to what Rust's name invokes), though I'm not a Spanish > >

Re: Merge a pull request with GitHub API

2022-05-18 Thread Wes McKinney
One of the benefits of the current merge script is that the PR description is preserved (maybe this could be possible with this method) — authors and co-authors are preserved by the explicit by-lines, e.g. Lead-authored-by: Nic Crane Co-authored-by: Ian Cook Signed-off-by: Ian Cook I assume th

Re: [VOTE] [Rust] Move Ballista to new arrow-ballista repository

2022-05-17 Thread Wes McKinney
+1 (binding) On Tue, May 17, 2022 at 4:10 AM vin jake wrote: > > +1, It's reasonable > > On Mon, May 16, 2022 at 9:56 PM Andy Grove wrote: > > > I would like to propose that we move the Ballista project to a new > > top-level *arrow-ballista* repository. > > > > The rationale for this (copied fr

June 23 virtual conference to highlight work in the Arrow ecosystem

2022-05-13 Thread Wes McKinney
hi all, My employer (Voltron Data) is organizing a free virtual conference on June 23 to highlight development work and usage of Apache Arrow — you can register for this or apply to give a talk here: https://thedatathread.com/ We are especially interested in hearing from users (as opposed to onl

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-12 Thread Wes McKinney
> Discussion about whether the community around Arrow would like to have > DataFrame-like APIs for Arrow in more languages, for example C++ We've discussed this a bit on the mailing list in the past, see https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading

Re: [C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-05-11 Thread Wes McKinney
ay. > > [1] https://github.com/apache/arrow/pull/12894 > [2] https://lists.apache.org/thread/mp68ofm2hnvs2v2oz276rvw7y5kwqoyd > [3] https://github.com/apache/arrow/pull/12755 > On Mon, May 2, 2022 at 1:20 PM Wes McKinney wrote: > > > > hi all, > > > > I

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-05-10 Thread Wes McKinney
>>> and AQE being a loaded term in query engines already. > > >>>>> > > >>>>> > > >>>>> On Tue, Mar 29, 2022 at 10:07 AM Andy Grove > > >>> wrote: > > >>>>> > > >>>>>>

[C++] Control flow and scheduling in C++ Engine operators / exec nodes

2022-05-02 Thread Wes McKinney
hi all, I've been catching up on the C++ execution engine codebase after a fairly long development hiatus. I have several questions / comments about the current design of the ExecNode and their implementations (currently: source / scan, filter, project, union, aggregate, sink, hash join). My cur

Re: [DISC] Improving Arrow's database support

2022-04-26 Thread Wes McKinney
I don't have major new things to add on this topic except that I've long had the aspiration of creating something like Python's DBAPI 2.0 [1] at the C or C++ level to enable a measure of API standardization for Arrow-native read/write interfaces with database drivers. It seems like a natural comple

Designing standards for "sandboxed" Arrow user-defined functions [was Re: User defined "Arrow Compute Function"]

2022-04-25 Thread Wes McKinney
I was going to reply to this e-mail thread on user@ but thought I would start a new thread on dev@. Executing user-defined functions in memory, especially untrusted functions, in general is unsafe. For "trusted" functions, having an in-memory API for writing them in user languages is very useful.

Re: [VOTE] Extend Arrow Flight SQL with more SQL type info in schemas

2022-04-25 Thread Wes McKinney
+1 (binding) I agree with the comments on the PR that it would be good to better explain what the "type name" is or give an example or reference in the code comments On Thu, Apr 21, 2022 at 11:49 AM José Almeida wrote: > > +1 (non binding) > > On Thu, Apr 21, 2022 at 1:49 PM Rafael Telles wrote

Re: [Discuss][Format] Add 32-bit and 64-bit Decimals

2022-04-25 Thread Wes McKinney
I think there are a couple of embedded / entangled questions here: * Should Arrow be able to be used to *transport* narrow decimals, for the (now very abundant) use cases where Arrow is being used as an internal wire protocol or client/server interface * Should *compute engines* th

Re: PyArrow / Arrow questions about the time and date types

2022-04-04 Thread Wes McKinney
On Fri, Apr 1, 2022 at 2:00 PM Weston Pace wrote: > > > *Question 1*: For my own understanding: what purpose does the > > millisecond date64 type serve? > > I don't actually know the answer to this one. The rationale IIRC was that some systems represent dates this way, and so the purpose was to p
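For readers skimming this thread, here is a minimal sketch of the two date representations being discussed, using only the Python standard library rather than PyArrow. It assumes the Arrow format's definitions: date32 counts days since the UNIX epoch and date64 counts milliseconds since the UNIX epoch, with date64 values expected to be whole days. The helper names (`to_date32`, `to_date64`, `from_date64`) are hypothetical and are not part of any Arrow API.

```python
from datetime import date, timedelta

# Both encodings are assumed to be relative to the UNIX epoch.
EPOCH = date(1970, 1, 1)
MS_PER_DAY = 86_400_000  # milliseconds in one day


def to_date32(d: date) -> int:
    """Encode a calendar date as days since the epoch (date32-style)."""
    return (d - EPOCH).days


def to_date64(d: date) -> int:
    """Encode a calendar date as milliseconds since the epoch (date64-style).

    The result is always a whole multiple of MS_PER_DAY.
    """
    return to_date32(d) * MS_PER_DAY


def from_date64(ms: int) -> date:
    """Decode a date64-style value back to a calendar date."""
    return EPOCH + timedelta(days=ms // MS_PER_DAY)


if __name__ == "__main__":
    d = date(2022, 4, 4)
    print("date32-style:", to_date32(d))
    print("date64-style:", to_date64(d))
    print("round trip:  ", from_date64(to_date64(d)))
```

Under these assumptions the two encodings carry the same information. The millisecond form simply matches systems that already store dates as millisecond timestamps, which is the rationale given above.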

Re: [Flight][Java][JDBC] IP clearance of Flight JDBC Driver

2022-04-04 Thread Wes McKinney
A corporate CLA is not required. Individual CLAs are fine. Since Dremio is a US corporation and the IP for the JDBC driver is owned by Dremio (I assume that the contributors all have IP assignment agreements where their contributions are assigned to the corporation), it would be best to have a Sof

[DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-03-28 Thread Wes McKinney
hi all, There has been a steady stream of work over the last year and a half or so to create a set of query engine building blocks in C++ to evaluate queries against Arrow Datasets and input streams, which can be of use to applications that are already building on top of the Arrow C++ project. Thi

Re: [VOTE] Extend Arrow Flight SQL with GetXdbcTypeInfo, SQL type info in schemas

2022-03-27 Thread Wes McKinney
Adding my +1 (binding) vote (technically votes need 3 binding +1's so this will pass) On Fri, Mar 25, 2022 at 4:12 PM David Li wrote: > > The vote has been open for a while now without objection, so the vote passes > with 2 +1 votes (binding), 4 +1 votes (non-binding). > > Thanks to all the cont

Adding Apache Arrow to the registry of Digital Public Goods

2022-03-25 Thread Wes McKinney
As some research groups, e.g. at public universities, are doing work that involves Apache Arrow, I have learned that it would be beneficial in terms of access to funding if Arrow were registered as a Digital Public Good. Here is an example of another Apache project, Fineract, which is listed as suc
