Re: [ANNOUNCE] New Arrow PMC member: Alenka Frim

2025-07-01 Thread Weston Pace
Congratulations Alenka! On Tue, Jul 1, 2025 at 8:52 AM Bryce Mecum wrote: > Congrats Alenka! Thanks for all you do. > > On Tue, Jul 1, 2025 at 12:38 AM Raúl Cumplido wrote: > > > > The Project Management Committee (PMC) for Apache Arrow has invited > Alenka > > Frim to become a PMC member and w

Re: [VOTE][Format] Extend Flight Location URI Semantics

2025-05-02 Thread Weston Pace
+1 (binding) On Thu, May 1, 2025 at 10:14 PM Dewey Dunnington wrote: > +1 (binding)! > > On Thu, May 1, 2025 at 10:42 PM Ian Cook wrote: > > > +1 (binding) > > > > Ian > > > > On Thu, May 1, 2025 at 9:51 PM David Li wrote: > > > > > +1 (binding) > > > > > > On Fri, May 2, 2025, at 08:00, Joel

Re: [DISCUSS] Split JS release process

2025-04-15 Thread Weston Pace
+1 from me, assuming this is acceptable to domoritz / trxcllnt. I feel we have struggled to find maintainers for JS (outside of a few dedicated and extremely helpful ones). Ideally (perhaps idealistically), separating the code into its own repository will help reduce the barrier for those who wan

[DISCUSS] Turtle canonical extension type

2025-04-01 Thread Weston Pace
I've written a draft at [1] but for simplicity's sake I will include the text of the proposal inline below. [1] https://github.com/westonpace/arrow/tree/feat/turtle-extension-type TURTLE == * Extension name: ``arrow.turtle``. * The storage type of the extension is ``Struct`` where the struc

Re: Request for comments on adding new IPC option 'ensure_memory_alignment'

2025-03-27 Thread Weston Pace
First, the subject of this email is IPC, which is confusing. From the discussion it sounds like we are primarily talking about FFI. It sounds like the options here are: 1. Silently realign unless user opts out of realignment The user will not get any runtime errors (easier to use) but a method

Re: [ANNOUNCE] New Arrow PMC member: Rok Mihevc

2025-03-24 Thread Weston Pace
Congrats Rok! Glad to have you here! On Thu, Mar 20, 2025 at 5:18 AM Rok Mihevc wrote: > Thank you! It's a pleasure working with you all! > > Best, > Rok > > On Thu, Mar 20, 2025 at 12:06 PM Alenka Frim > wrote: > > > Yay! Congratulations Rok, well deserved!! > > > > On Thu, Mar 20, 2025, 04:

Re: [ANNOUNCE] New Arrow PMC member: Ian Cook

2025-03-21 Thread Weston Pace
Congrats Ian! On Fri, Mar 21, 2025 at 12:03 PM Saurabh Singh wrote: > Congratulations Ian! > > On Fri, 21 Mar 2025 at 19:28, Felipe Oliveira Carvalho < > felipe...@gmail.com> > wrote: > > > Congrats! 🚀 > > > > On Fri, Mar 21, 2025 at 7:23 AM Nic Crane wrote: > > > > > Congrats! > > > > > > On T

Re: [ANNOUNCE] New Arrow committer: Matthijs Brobbel (mbrobbel)

2025-03-21 Thread Weston Pace
Congrats Matthijs! On Fri, Mar 21, 2025 at 1:51 PM Andrew Lamb wrote: > Hi, > > On behalf of the Arrow PMC, I'm happy to announce that Matthijs Brobbel > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > Andrew >

Re: [ANNOUNCE] New Arrow PMC member: Jacob Wujciak

2025-03-17 Thread Weston Pace
Congrats Jacob! On Mon, Mar 17, 2025 at 5:04 AM Neal Richardson wrote: > Congratulations! > > On Mon, Mar 17, 2025 at 7:00 AM Ian Cook wrote: > > > Congratulations Jacob! > > > > On Mon, Mar 17, 2025 at 05:16 Raúl Cumplido wrote: > > > > > Congratulations Jacob! > > > > > > On Mon, Mar 17, 2025

Re: [DISCUSS] Do we want to enable GitHub Discussions for apache/arrow?

2025-03-16 Thread Weston Pace
+1 > A possible reason for hesitation is that it provides us yet another stream that requires maintainer attention I had been lukewarm on discussions in the past partly because of this and partly because they cannot be "closed" and grow forever. However, I have since learned that the latter conc

Re: [ANNOUNCE] New Arrow committer: Jean-Baptiste Onofré

2025-03-10 Thread Weston Pace
Congrats and welcome JB! On Mon, Mar 10, 2025 at 12:42 PM Bryce Mecum wrote: > Congrats JB. And welcome. > > On Sun, Mar 9, 2025 at 11:05 PM Sutou Kouhei wrote: > > > > Hi, > > > > On behalf of the Arrow PMC, I'm happy to announce that > > Jean-Baptiste Onofré has accepted an invitation to beco

Re: [VOTE][RUST] Release Apache Arrow Rust 54.2.1 RC1

2025-02-27 Thread Weston Pace
+1 (binding) Verified on aarch64 Ubuntu 24.04 On Thu, Feb 27, 2025 at 5:00 AM Xuanwo wrote: > +1 non-binding > > Tested locally on archlinux x86_64. > > On Thu, Feb 27, 2025, at 20:36, Raphael Taylor-Davies wrote: > > +1 (binding) > > > > Verified on x86_64 GNU/Linux > > > > On 27/02/2025 12:23,

Re: [ANNOUNCE] New Arrow PMC member: Bryce Mecum

2025-02-05 Thread Weston Pace
Congrats Bryce! On Wed, Feb 5, 2025 at 8:35 PM Saurabh Singh wrote: > Congratulations Bryce. > > On Thu, 6 Feb 2025 at 07:41, Gang Wu wrote: > > > Congrats Bryce! > > > > On Thu, Feb 6, 2025 at 9:57 AM Ruoxi Sun wrote: > > > > > Congrats Bryce, well deserved! > > > > > > > > > *Regards,* > > >

Re: [ANNOUNCE] New Arrow committer: Ed Seidl (etseidl)

2025-01-29 Thread Weston Pace
Congratulations Ed! On Wed, Jan 29, 2025 at 2:20 AM Andrew Lamb wrote: > On behalf of the Arrow PMC, I'm happy to announce that Ed Seidl > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > Andrew >

Re: [ANNOUNCE] New Arrow PMC member: Gang Wu

2024-12-03 Thread Weston Pace
Congratulations! On Tue, Dec 3, 2024, 3:21 PM Ian Cook wrote: > Congratulations and thanks for all your great work—not just on Arrow but on > so many parts of the surrounding ecosystem! > > On Tue, Dec 3, 2024 at 6:15 PM David Li wrote: > > > Congrats! > > > > On Wed, Dec 4, 2024, at 07:38, Rok

Re: [C++] Arrow S3 filesystem init/finalize

2024-12-01 Thread Weston Pace
> Admittedly, I would like to know why this is being done in this fashion, but that is tangential to my issue. IIRC, this is a limitation given to us by the AWS C++ SDK. See [1]. The AWS C++ SDK has static state and they do not manage it with static local variables. As a result, the initializa

Re: [ANNOUNCE] New Arrow committer: Laurent Goujon

2024-11-25 Thread Weston Pace
Congratulations Laurent! On Mon, Nov 25, 2024 at 4:28 AM wish maple wrote: > Congrats! > > Best, > Xuwei Fu > > On Mon, Nov 25, 2024 at 17:35 David Li wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that Laurent Goujon has > > accepted an invitation to become a committer on Apache Arrow. Welc

Re: [ANNOUNCE] New Arrow committer: Adam Reeve

2024-11-18 Thread Weston Pace
Congratulations! On Mon, Nov 18, 2024, 7:13 PM Jacob Wujciak wrote: > Congratulations and welcome Adam! > > On Tue, Nov 19, 2024 at 03:31 wish maple < > maplewish...@gmail.com> wrote: > > > > Congrats Adam! > > > > Best, > > Xuwei Fu > > > > On Tue, Nov 19, 2024 at 08:31 Sutou Kouhei wrote: > > > > >

Re: [ANNOUNCE] New Arrow PMC chair: Neal Richardson

2024-10-30 Thread Weston Pace
Congrats Neal! On Wed, Oct 30, 2024, 4:46 AM Raúl Cumplido wrote: > Thanks Andy for your work during last year and thanks Neal for stepping up! > > Raúl > > On Wed, Oct 30, 2024 at 12:28 Andrew Lamb () > wrote: > > > > I am pleased to announce that the Arrow Project has a new PMC chair and

[ANNOUNCE] New Arrow committer: Rossi Sun

2024-10-22 Thread Weston Pace
On behalf of the Arrow PMC, I'm happy to announce that Rossi Sun has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions!

Re: Query of Arrow Flight SQL with S3 as a storage for parquet files

2024-10-16 Thread Weston Pace
> Do you folks believe Duckdb and Datafusion (latter being similar to spark sql) will be an overkill? No, I don't believe it would be overkill. I also wouldn't compare either one to Spark SQL. Spark SQL is meant to be a distributed query engine that typically requires a cluster of some sort to o

Re: [C++][ACERO][DATASET] Ordering in ExecPlan

2024-10-04 Thread Weston Pace
> Currently, it is not > possible to assign the Implicit ordering in scan node. Such an option has > been added to other nodes[0]. This problem is mentioned here [1]. I > have started to work on it [2] but I am unsure how to move forward > because I did not find any clear roadmap about ordering in g

Re: [ANNOUNCE] New Arrow committer: Will Ayd

2024-10-01 Thread Weston Pace
Congratulations Will! On Tue, Oct 1, 2024, 2:25 PM Bryce Mecum wrote: > Congrats Will! > > On Tue, Oct 1, 2024 at 9:55 AM Dewey Dunnington > wrote: > > > > On behalf of the Arrow PMC, I'm happy to announce that Will Ayd has > > accepted an invitation to become a committer on Apache Arrow. Welcom

Re: [DISCUSS] Variant Spec Location

2024-08-22 Thread Weston Pace
It also seems that two variations of the variant encoding are being discussed. The original spec, currently housed in Spark, creates a variant array in row-major order, that is, each element in the array is contained contiguously. So, if you have objects like `{"a": 7, "b": 3}` then the values f
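
A minimal Python sketch of the row-major versus column-major distinction. It assumes the alternative being discussed is a per-field (column-major) layout; the snippet above is truncated, so that reading is an assumption. The helpers `row_major` and `column_major` are hypothetical and do not model the actual binary variant encoding, only where each value ends up.

```python
def row_major(objects):
    """Row-major: each element's (key, value) pairs are stored together."""
    layout = []
    for obj in objects:
        layout.extend(sorted(obj.items()))
    return layout


def column_major(objects):
    """Column-major: one list per field; a missing field becomes None."""
    keys = sorted({key for obj in objects for key in obj})
    return {key: [obj.get(key) for obj in objects] for key in keys}


objs = [{"a": 7, "b": 3}, {"a": 1, "b": 5}]
print(row_major(objs))     # [('a', 7), ('b', 3), ('a', 1), ('b', 5)]
print(column_major(objs))  # {'a': [7, 1], 'b': [3, 5]}
```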

Re: Seattle Arrow Meetup: August 13th, 2024

2024-08-16 Thread Weston Pace
Notes from the meetup: https://docs.google.com/document/d/1g0_oiEE0GPQP24LAP3Z8pGSoWnw6C3bsr5If8K7c_Xw/edit?usp=sharing Thanks to Bryce for taking notes! On Mon, Aug 12, 2024 at 6:15 AM Weston Pace wrote: > The exact location has been updated in the doc. Looking forward to seeing > s

Re: Seattle Arrow Meetup: August 13th, 2024

2024-08-12 Thread Weston Pace
The exact location has been updated in the doc. Looking forward to seeing some of you tomorrow. On Thu, Jul 18, 2024 at 11:41 AM Weston Pace wrote: > I'd like to announce an Arrow Meetup on August 13th, 2024 from 5:30PM to > 7:30PM. Details can be found at [1]. All are welcome.

Re: [Discuss] Async interface for C Data Stream interface

2024-08-10 Thread Weston Pace
+1 for getting some kind of async implementation into the spec. I have proposed a few alternative approaches in the PR. On Fri, Aug 9, 2024 at 1:18 PM Matt Topol wrote: > Hello All, I'd like to discuss the potential addition of an > asynchronous-oriented version of the C Data Stream interface.

Re: [VOTE][Format] Bool8 Canonical Extension Type

2024-08-05 Thread Weston Pace
+1 (binding) Looked through the spec & C++/python PRs. On Mon, Aug 5, 2024 at 7:41 AM Ian Cook wrote: > +1 (non-binding) > > I reviewed the spec addition. > > On Mon, Aug 5, 2024 at 3:37 PM Antoine Pitrou wrote: > > > > > Binding +1 (but posted one minor comment on the format PR). > > > > Than

Re: [DISCUSS][Acero] Upgrading to 64-bit row offsets in row table

2024-08-05 Thread Weston Pace
+1 as well. 32 bit keys were chosen because the expectation was that hashtable spilling would come along soon. Since it didn't, I think it's a good idea to use 64-bit keys until spilling is added. On Mon, Aug 5, 2024 at 6:05 AM Antoine Pitrou wrote: > > I don't have any concrete data to test t
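
A small sketch of why 32-bit row offsets become a hard limit, assuming the row table stores each row's starting byte offset as an unsigned integer (so 32 bits can address at most 4 GiB - 1 bytes). `pack_offsets` is a hypothetical helper, not Acero code; it only packs the offset values, so nothing large is allocated.

```python
import struct


def pack_offsets(row_sizes, fmt):
    """Pack cumulative row offsets (start of each row, then the end).

    fmt is a struct format: "<I" = unsigned 32-bit, "<Q" = unsigned 64-bit.
    struct.pack raises struct.error when an offset does not fit.
    """
    offsets = [0]
    for size in row_sizes:
        offsets.append(offsets[-1] + size)
    return b"".join(struct.pack(fmt, off) for off in offsets)


rows = [2**31] * 3  # three 2 GiB rows: offsets reach 6 GiB
print(len(pack_offsets(rows, "<Q")))  # 32 (four 8-byte offsets)
try:
    pack_offsets(rows, "<I")
except struct.error as err:
    print("32-bit offsets overflow:", err)
```

Widening to 64 bits doubles the per-row offset cost, which is the trade-off against adding spilling.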

Re: [VOTE][Format] Opaque canonical extension type

2024-07-24 Thread Weston Pace
+1 (binding) On Wed, Jul 24, 2024 at 8:01 AM Dane Pitkin wrote: > +1 (non-binding) > > I reviewed the spec and Java implementation. > > On Wed, Jul 24, 2024 at 10:37 AM Ian Cook wrote: > > > +1 (non-binding) > > > > I reviewed the spec additions. > > > > Ian > > > > On Wed, Jul 24, 2024 at 10:2

Seattle Arrow Meetup: August 13th, 2024

2024-07-18 Thread Weston Pace
I'd like to announce an Arrow Meetup on August 13th, 2024 from 5:30PM to 7:30PM. Details can be found at [1]. All are welcome. We will be discussing what’s going on in the Arrow community and what community members have planned or would like to see in the coming years. If you think you can make

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Weston Pace
ng a coherent abstraction extremely difficult. > > > > > Iceberg also took a similar approach with its File IO abstraction [2]. > > > > > [1]: > > > https://docs.rs/object_store/latest/object_store/#why-not-a-filesystem-interface > > [2]: https://ta

Re: [DISCUSS][C++] Empty directory marker creation in S3FileSystem

2024-07-12 Thread Weston Pace
> The markers are necessary to offer file system semantics on top of object > stores. You will get a ton of subtle bugs otherwise. Yes, object stores and filesystems are different. If you expect your filesystem to act like a filesystem then these things need to be done in order to avoid these bug

Re: [DISCUSS] Approach to generic schema representation

2024-07-08 Thread Weston Pace
turn pa.list_(pa_type) > > elif isinstance(obj, dict): > > items = [] > > for k, child_obj in obj.items(): > > pa_type = _convert_to_arrow_type(k, child_obj) > > items.append((k, pa_type)) > > return pa.struct(

Re: [DISCUSS] Approach to generic schema representation

2024-07-08 Thread Weston Pace
+1 for empty stream/file as schema serialization. I have used this approach myself on more than one occasion and it works well. It can even be useful for transmitting schemas between different arrow-native libraries in the same language (e.g. rust->rust) since it allows the different libraries to

Re: [C++][Python] [Parquet] Parquet Reader C++ vs python benchmark

2024-06-13 Thread Weston Pace
pyarrow uses c++ code internally. With the large files I would guess that less than 0.1% of your pyarrow benchmark is spent in the python interpreter. Given this fact, my main advice is to not worry too much about the difference between pyarrow and carrow. A lot of work goes into pyarrow to make

Seattle Arrow meetup (adjacent to post::conf)

2024-05-29 Thread Weston Pace
I've noticed that a number of Arrow people will be in Seattle in August. I know there are a number of Arrow contributors that live in the Seattle area as well. I'd like to organize a face-to-face meetup for the Arrow community and have created an issue for discussion[1]. I welcome any input, fee

Re: [DISCUSS] Drop Java 8 support

2024-05-24 Thread Weston Pace
No vote is required from an ASF perspective (this is not a release) No vote is required from Arrow conventions (this is not a spec change and does not impact more than one implementation) I will send a message to the parquet ML to solicit feedback. On Fri, May 24, 2024 at 8:22 AM Laurent Goujon

Re: [DISCUSS] Statistics through the C data interface

2024-05-24 Thread Weston Pace
> I think what we are slowly converging on is the need for a spec to > describe the encoding of Arrow array statistics as Arrow arrays. This has been something that has always been desired for the Arrow IPC format too. My preference would be (apologies if this has been mentioned before): - Agree

Re: [VOTE] Release Apache Arrow ADBC 12 - RC4

2024-05-20 Thread Weston Pace
+1 (binding) I also tested on Ubuntu 22.04 with USE_CONDA=1 dev/release/verify-release-candidate.sh 12 4 On Mon, May 20, 2024 at 5:20 AM David Li wrote: > My vote: +1 (binding) > > Are any other PMC members able to take a look? > > On Fri, May 17, 2024, at 23:36, Dewey Dunnington wrote: > > +1

Re: [ANNOUNCE] New Arrow committer: Dane Pitkin

2024-05-07 Thread Weston Pace
Congrats Dane! On Tue, May 7, 2024, 7:30 AM Nic Crane wrote: > Congrats Dane, well deserved! > > On Tue, 7 May 2024 at 15:16, Gang Wu wrote: > > > > Congratulations Dane! > > > > Best, > > Gang > > > > On Tue, May 7, 2024 at 10:12 PM Ian Cook wrote: > > > > > Congratulations Dane! > > > > > >

Re: [Discuss] Extension types based on canonical extension types?

2024-04-30 Thread Weston Pace
I think "inheritance" and "composition" are more concerns for implementations than they are for spec (I could be wrong here). So it seems that it would be sufficient to write the HLLSKETCH's canonical definition as "this is an extension of the JSON logical type and supports all the same storage ty

Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Weston Pace
+1 (binding) On Tue, Apr 30, 2024 at 7:53 AM Rok Mihevc wrote: > Thanks for all the reviews and comments! I've included the big-endian > requirement so the proposed language is now as below. > I'll leave the vote open until after the May holiday. > > Rok > > UUID > > > * Extension name: `ar

Re: [VOTE][Format] JSON canonical extension type

2024-04-30 Thread Weston Pace
+1 (binding) I agree we should be explicit about RFC-8259 On Mon, Apr 29, 2024 at 4:46 PM David Li wrote: > +1 (binding) > > assuming we explicitly state RFC-8259 > > On Tue, Apr 30, 2024, at 08:02, Matt Topol wrote: > > +1 (binding) > > > > On Mon, Apr 29, 2024 at 5:36 PM Ian Cook wrote: > >

Re: [DISCUSSION] New Flags for Arrow C Interface Schema

2024-04-24 Thread Weston Pace
e from_arrow_device method which returns a cudf::table? > > Should > > > it error, or should it create a table with a single column? > > > > Presumably it should just error? I can see this being ambiguous if there > > were an API that dynamically returned either

Re: Fwd: PyArrow Using Parquet V2

2024-04-24 Thread Weston Pace
> *As per Apache Parquet Community Parquet V2 is not final yet so it is not > official . They are advising not to use Parquet V2 for writing (though code > is available ) .* This would be news to me. Parquet releases are listed (by the parquet community) at [1] The vote to release parquet 2.10 i

Re: [DISCUSSION] New Flags for Arrow C Interface Schema

2024-04-22 Thread Weston Pace
I tend to agree with Dewey. Using run-end-encoding to represent a scalar is clever and would keep the c data interface more compact. Also, a struct array is a superset of a record batch (assuming the metadata is kept in the schema). Consumers should always be able to deserialize into a struct ar
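
To make the run-end-encoding idea concrete: a scalar of logical length N can be written as a run-end encoded array with a single run, so only one run end and one value are stored. The Python sketch below (hypothetical helpers, not an Arrow API) decodes such an array and looks up a logical index with a binary search over the run ends.

```python
from bisect import bisect_right


def ree_decode(run_ends, values):
    """Expand a run-end encoded array; run k covers [run_ends[k-1], run_ends[k])."""
    out, start = [], 0
    for end, value in zip(run_ends, values):
        out.extend([value] * (end - start))
        start = end
    return out


def ree_value_at(run_ends, values, i):
    """Logical element i, found by binary search over the run ends."""
    if not 0 <= i < run_ends[-1]:
        raise IndexError(i)
    return values[bisect_right(run_ends, i)]


print(ree_decode([3, 5], ["x", "y"]))            # ['x', 'x', 'x', 'y', 'y']
# A "scalar" of length one million: a single run, two stored entries.
print(ree_value_at([1_000_000], [42], 999_999))  # 42
```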

Re: Unsupported/Other Type

2024-04-17 Thread Weston Pace
> people generally find use in Arrow schemas independently of concrete data. This makes sense. I think we do want to encourage use of Arrow as a "type system" even if there is no data involved. And, given that we cannot easily change a field's data type property to "optional" it makes sense to u

Re: Unsupported/Other Type

2024-04-17 Thread Weston Pace
> may want an Other type to signal that it would fail if asked to provide particular columns. I interpret "would fail" to mean we are still speaking in some kind of "planning stage" and not yet actually creating arrays. So I don't know that this needs to be a data type. In other words, I see thi

Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore

2024-04-11 Thread Weston Pace
Congratulations! On Thu, Apr 11, 2024 at 9:12 AM wish maple wrote: > Congrats! > > Best, > Xuwei Fu > > On Thu, Apr 11, 2024 at 23:22 Kevin Gurney wrote: > > > Congratulations, Sarah!! Well deserved! > > > > From: Jacob Wujciak > > Sent: Thursday, April 11, 2024 11:14 AM > >

Re: [DISCUSS] Versioning and releases for apache/arrow components

2024-04-08 Thread Weston Pace
> Probably major versions should match between C++ and PyArrow, but I guess > we could have diverging minor and patch versions. Or at least patch > versions given that > a new minor version is usually cut for bug fixes too. I believe even this would be difficult. Stable ABIs are very finicky in C

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-04-02 Thread Weston Pace
Forgot link: [1] https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Memory On Tue, Apr 2, 2024 at 11:38 AM Weston Pace wrote: > Thanks for taking the time to address my concerns. > > > I've split the S3/HTTP URI flight pieces out into a separate document

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-04-02 Thread Weston Pace
was easier for iterating on the > > protocol > > > specification than a markdown PR for the Arrow documentation as I could > > > more visually express things without a preview of the rendered > markdown. > > If > > > it would get people to be more li

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Weston Pace
Wouldn't support for ADT require expressing more than 1 type id per record? In other words, if `put` has type id 1, `delete` has type id 2, and `erase` has type id 3 then there is no way to express something is (for example) both type id 1 and type id 3 because you can only have one type id per re
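
The constraint is easier to see in a dense-union sketch: every slot carries exactly one type id plus an offset into that type's child, so a record tagged as both `put` and `erase` has nowhere to go. `build_dense_union` and the tag-to-id mapping are hypothetical.

```python
def build_dense_union(records, tag_ids):
    """records: (tag, value) pairs. Returns (type_ids, offsets, children)."""
    type_ids, offsets = [], []
    children = {tid: [] for tid in tag_ids.values()}
    for tag, value in records:
        tid = tag_ids[tag]        # exactly one type id per slot
        type_ids.append(tid)
        offsets.append(len(children[tid]))
        children[tid].append(value)
    return type_ids, offsets, children


tags = {"put": 1, "delete": 2, "erase": 3}
print(build_dense_union([("put", "k1"), ("erase", "k2"), ("put", "k3")], tags))
# ([1, 3, 1], [0, 0, 1], {1: ['k1', 'k3'], 2: [], 3: ['k2']})
```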

Re: [ANNOUNCE] New Committer Joel Lubinitsky

2024-04-01 Thread Weston Pace
Congratulations Joel! On Mon, Apr 1, 2024 at 1:16 PM Bryce Mecum wrote: > Congrats, Joel! > > On Mon, Apr 1, 2024 at 6:59 AM Matt Topol wrote: > > > > On behalf of the Arrow PMC, I'm happy to announce that Joel Lubinitsky > has > > accepted an invitation to become a committer on Apache Arrow. W

Re: [DISCUSS] Versioning and releases for apache/arrow components

2024-03-29 Thread Weston Pace
Thank you for bringing this up. I'm in favor of this. I think there are several motivations but the main ones are: 1. Decoupling the versions will allow components to have no release, or only a minor release, when there are no breaking changes 2. We do have some vote fatigue I think and we don

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-03-28 Thread Weston Pace
I'm sorry for the very late reply. Until yesterday I had no real concept of what this was talking about and so I had stayed out. I'm +0 only because it isn't clear what we are voting on. There is a word doc with no implementation or PR. I think there could be an implementation / PR. For exampl

Re: Apache Arrow Flight - From Rust to Javascript (FlightData)

2024-03-21 Thread Weston Pace
> I don't think there is currently a direct equivalent to > `FlightRecordBatchStream` in the arrow javascript library, but you should > be able to combine the data header + body and then read it using the > `fromIPC` functions since it's just the Arrow IPC format The RecordBatchReader[1] _should_

Re: [ANNOUNCE] New Arrow committer: Bryce Mecum

2024-03-17 Thread Weston Pace
Congratulations! On Sun, Mar 17, 2024, 8:01 PM Jacob Wujciak wrote: > Congrats, well deserved! > > On Mon, Mar 18, 2024, 03:24 Nic Crane wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that Bryce Mecum has > > accepted an invitation to become a committer on Apache Arrow. Welco

Re: [DISCUSS] Looking for feedback on my Rust library

2024-03-14 Thread Weston Pace
Felipe's points are good. I don't know that you need to adapt the entire ADBC, it sort of depends what you're after. I see what you've got right now as more of an SQL abstraction layer. For example, similar to things like [1][2][3] (though 3 is more of an ORM). If you like the SQL interface tha

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Weston Pace
+1 (binding) On Fri, Mar 1, 2024 at 3:33 AM Andrew Lamb wrote: > Hello, > > As we have discussed[1][2] I would like to vote on the proposal to > create a new Apache Top Level Project for DataFusion. The text of the > proposed resolution and background document is copy/pasted below > > If the com

Re: [ANNOUNCE] New Arrow committer: Jay Zhan

2024-02-16 Thread Weston Pace
Congrats! On Fri, Feb 16, 2024 at 3:07 AM Raúl Cumplido wrote: > Congratulations!! > > On Fri, Feb 16, 2024 at 12:02 Daniël Heres > () wrote: > > > > Congratulations! > > > > On Fri, Feb 16, 2024, 11:33 Metehan Yıldırım < > metehan.yildi...@synnada.ai> > > wrote: > > > > > Congrats! > > >

Re: [DISC] Improve Arrow Release verification process

2024-01-21 Thread Weston Pace
+1. There have been a few times I've attempted to run the verification scripts. They have failed, but I was pretty confident it was a problem with my environment mixing with the verification script and not a problem in the software itself and I didn't take the time to debug the verification scrip

Re: [DISCUSS] Semantics of extension types

2023-12-14 Thread Weston Pace
I agree engines can use their own strategy. Requiring explicit casts is probably ok as long as it is well documented, but I think I lean slightly towards implicitly falling back to the storage type. I do think people still shy away from extension types. Adding the extension type to an impli

Re: [VOTE] Flight SQL as experimental

2023-12-08 Thread Weston Pace
t; > The vote will be open for at least 72 hours. > > > > > > [ ] +1 > > > [ ] +0 > > > [ ] -1 Keep Flight SQL experimental because... > > > > > > On Fri, Dec 8, 2023, at 13:37, Weston Pace wrote: > > >> +1 > > &

Re: [DISCUSS] Flight SQL as experimental

2023-12-08 Thread Weston Pace
+1 On Fri, Dec 8, 2023 at 10:33 AM Micah Kornfield wrote: > +1 > > On Fri, Dec 8, 2023 at 10:29 AM Andrew Lamb wrote: > > > I agree it is time to "promote" ArrowFlightSQL to the same level as other > > standards in Arrow > > > > Now that it is used widely (we use and count on it too at InfluxDa

Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho

2023-12-07 Thread Weston Pace
Congratulations Felipe! On Thu, Dec 7, 2023 at 8:38 AM wish maple wrote: > Congrats Felipe!!! > > Best, > Xuwei Fu > > On Thu, Dec 7, 2023 at 23:42 Benjamin Kietzman wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that Felipe Oliveira > > Carvalho > > has accepted an invitation to become a co

Re: [ANNOUNCE] New Arrow PMC chair: Andy Grove

2023-11-27 Thread Weston Pace
Congrats Andy! On Mon, Nov 27, 2023, 7:31 PM wish maple wrote: > Congrats Andy! > > Best, > Xuwei Fu > > On Mon, Nov 27, 2023 at 20:47 Andrew Lamb wrote: > > > I am pleased to announce that the Arrow Project has a new PMC chair and > VP > > as per our tradition of rotating the chair once a year. I have resi

Re: [ANNOUNCE] New Arrow committer: James Duong

2023-11-17 Thread Weston Pace
Congratulations James On Fri, Nov 17, 2023 at 6:07 AM Metehan Yıldırım < metehan.yildi...@synnada.ai> wrote: > Congratulations! > > On Thu, Nov 16, 2023 at 10:45 AM Sutou Kouhei wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that James Duong > > has accepted an invitation to beco

Re: [ANNOUNCE] New Arrow PMC member: Raúl Cumplido

2023-11-13 Thread Weston Pace
Congratulations Raúl! On Mon, Nov 13, 2023 at 1:34 PM Ben Harkins wrote: > Congrats, Raúl!! > > On Mon, Nov 13, 2023 at 4:30 PM Bryce Mecum wrote: > > > Congrats, Raúl! > > > > On Mon, Nov 13, 2023 at 10:28 AM Andrew Lamb > > wrote: > > > > > > The Project Management Committee (PMC) for Apache

Re: [DISCUSS][Format] C data interface for Utf8View

2023-11-07 Thread Weston Pace
+1 for the original proposal as well. --- The (minor) problem I see with flags is that there isn't much point to this feature if you are gating on a flag. I'm assuming the goal is what Dewey originally mentioned which is making buffer calculations easier. However, if you're gating the feature w

Re: [DISCUSS][Format] C data interface for Utf8View

2023-10-26 Thread Weston Pace
Is this buffer lengths buffer only present if the array type is Utf8View? Or are you suggesting that other types might want to adopt this as well? On Thu, Oct 26, 2023 at 10:00 AM Dewey Dunnington wrote: > > I expect C code to not be much longer then this :-) > > nanoarrow's buffer-length-calcul
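
For context on why a buffer-lengths buffer helps: without it, a consumer of a Utf8View array must scan every view to learn how large each variadic data buffer has to be. The sketch below assumes the 16-byte view layout from the Arrow columnar spec (strings of at most 12 bytes stored inline; longer ones stored as length, 4-byte prefix, buffer index, offset) and computes a lower bound on each buffer's size. `make_view` and `min_buffer_sizes` are hypothetical helpers.

```python
import struct

INLINE_LIMIT = 12  # strings up to 12 bytes live entirely inside the view


def make_view(data, buffer_index=0, offset=0):
    """Build one 16-byte little-endian view for `data`."""
    if len(data) <= INLINE_LIMIT:
        # int32 length + 12 inline bytes (zero padded)
        return struct.pack("<i12s", len(data), data)
    # int32 length + 4-byte prefix + int32 buffer index + int32 offset
    return struct.pack("<i4sii", len(data), data[:4], buffer_index, offset)


def min_buffer_sizes(views, num_buffers):
    """Scan all views to find how many bytes each data buffer must hold."""
    sizes = [0] * num_buffers
    for pos in range(0, len(views), 16):
        (length,) = struct.unpack_from("<i", views, pos)
        if length > INLINE_LIMIT:
            _prefix, index, offset = struct.unpack_from("<4sii", views, pos + 4)
            sizes[index] = max(sizes[index], offset + length)
    return sizes


first = b"a string longer than twelve"
second = b"another long string value"
views = make_view(b"short") + make_view(first) + make_view(second, 0, len(first))
print(min_buffer_sizes(views, 1) == [len(first + second)])  # True
```

The scan is linear in the number of views, which is the cost a producer-supplied lengths buffer would avoid.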

Re: [ANNOUNCE] New Arrow committer: Xuwei Fu

2023-10-23 Thread Weston Pace
Congratulations Xuwei! On Mon, Oct 23, 2023 at 3:38 AM wish maple wrote: > Thanks kou and every nice person in arrow community! > > I've learned a lot during learning and contribution to arrow and > parquet. Thanks for everyone's help. > Hope we can bring more fancy features in the future! > > B

Re: Apache Arrow file format

2023-10-21 Thread Weston Pace
> Of course, what I'm really asking for is to see how Lance would compare ;-) > P.S. The second paper [2] also talks about ML workloads (in Section 5.8) > and GPU performance (in Section 5.9). It also cites Lance as one of the > future formats (in Section 5.6.2). Disclaimer: I work for LanceDb an

Re: [ANNOUNCE] New Arrow PMC member: Jonathan Keane

2023-10-15 Thread Weston Pace
Congratulations Jon! On Sun, Oct 15, 2023, 1:51 PM Neal Richardson wrote: > Congratulations! > > On Sun, Oct 15, 2023 at 1:35 PM Bryce Mecum wrote: > > > Congratulations, Jon! > > > > On Sat, Oct 14, 2023 at 9:24 AM Andrew Lamb > wrote: > > > > > > The Project Management Committee (PMC) for Ap

Re: [ANNOUNCE] New Arrow committer: Curt Hagenlocher

2023-10-15 Thread Weston Pace
Congratulations! On Sun, Oct 15, 2023, 8:51 AM Gang Wu wrote: > Congrats! > > On Sun, Oct 15, 2023 at 10:49 PM David Li wrote: > > > Congrats & welcome Curt! > > > > On Sun, Oct 15, 2023, at 09:03, wish maple wrote: > > > Congratulations! > > > > > > On Sun, Oct 15, 2023 at 20:48 Raúl Cumplido wrote: > > >

Re: [DISCUSS][C++] Raw pointer string views

2023-10-06 Thread Weston Pace
> I feel the broader question here is what is Arrow's intended use case - interchange or execution The line between interchange and execution is not always clear. For example, I think we would like Arrow to be considered as a standard for UDF libraries. On Fri, Oct 6, 2023 at 7:34 AM Mark Raasve

Re: [Discuss][C++] A framework for contextual/implicit/ambient vars

2023-08-24 Thread Weston Pace
In other languages I have seen this called "async local"[1][2][3]. I'm not sure of any C++ implementations. Folly's fibers claim to have fiber-local variables[4] but I can't find the actual code to use them. I can't seem to find reference to the concept in boost's asio or cppcoro. I've also see
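
Python's standard library offers one concrete instance of the "async local" idea: `contextvars`, where each asyncio task runs in its own copy of the context, so a value set in one task is invisible to its siblings. This only illustrates the concept in another language; it is not a proposal for the C++ design, and `request_id`, `handle` and `lookup` are hypothetical names.

```python
import asyncio
import contextvars

# Each asyncio task gets a copy of the current context when it is created.
request_id = contextvars.ContextVar("request_id", default=None)


async def lookup():
    # Reads the value for whichever task is running; no parameter passing.
    return request_id.get()


async def handle(rid):
    request_id.set(rid)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return await lookup()


async def main():
    # gather() wraps each coroutine in its own task (and its own context copy).
    return await asyncio.gather(handle("a"), handle("b"))


print(asyncio.run(main()))  # ['a', 'b']
```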

Re: [VOTE][Format] Add Utf8View Arrays to Arrow Format

2023-08-21 Thread Weston Pace
+1 Thanks to all for the discussion and thanks to Ben for all of the great work. On Mon, Aug 21, 2023 at 9:16 AM wish maple wrote: > +1 (non-binding) > > It would help a lot when processing UTF-8 related data! > > Xuwei > > On Tue, Aug 22, 2023 at 00:11 Andrew Lamb wrote: > > > +1 > > > > This is a great e

Re: Acero and Substrait: How to select struct field from a struct column?

2023-08-07 Thread Weston Pace
> But I can't figure out how to express "select struct field 0 from field 2 > of the original table where field 2 is a struct column" > > Any idea how the substrait message should look like for the above? I believe it would be: ``` "expression": { "selection": { "direct_reference": {
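
A small Python helper that builds the same nested `direct_reference` shape for an arbitrary path of struct field indices, e.g. `(2, 0)` for "field 0 of struct column 2". The field names follow Substrait's FieldReference/ReferenceSegment messages in the snake_case JSON form used above; the `root_reference` entry and the helper itself are assumptions for illustration, so check them against the Substrait version in use.

```python
import json


def struct_field_ref(*path):
    """Build a Substrait-style selection for a nested struct field.

    path[0] is the top-level field index; each later index selects a child
    field inside the previous struct.
    """
    segment = None
    for index in reversed(path):  # build from the innermost child outward
        node = {"field": index}
        if segment is not None:
            node["child"] = segment
        segment = {"struct_field": node}
    return {"selection": {"direct_reference": segment, "root_reference": {}}}


print(json.dumps(struct_field_ref(2, 0), indent=2))
```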

Re: [DISCUSS] Canonical alternative layout proposal

2023-08-02 Thread Weston Pace
> I would welcome a draft PR showcasing the changes necessary in the IPC > format definition, and in the C Data Interface specification (no need to > actually implement them for now :-)). I've proposed something at [1]. > One sketch of an idea: define sets of types that we can call “kinds”** > (e

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-31 Thread Weston Pace
size to 6/8/16, The system works fine. CPU is about > > 100%. like 2.1.1 > > 2.2.2 for bucket_size to 32, the bug comes back. CPU halts at 550%. > > > > 2.3 io_thread_count to 8 > > 2.3.1 for bucket_size to 16, it fails somehow. After transferring > > done, the m

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-28 Thread Weston Pace
goes up as well, to 800%. > 1. Sometimes, the writing queue can overcome, CPU will goes down after > the memory accumulated. The writing speed recoved and memory back to > normal. > 2. Sometimes, it can't. IOBPS goes down sharply, and CPU never goes > down after that. > > How

Re: dataset write stucks on ThrottledAsyncTaskSchedulerImpl

2023-07-27 Thread Weston Pace
You'll need to measure more but generally the bottleneck for writes is usually going to be the disk itself. Unfortunately, standard OS buffered I/O has some pretty negative behaviors in this case. First I'll describe what I generally see happen (the last time I profiled this was a while back but

Re: scheduler() and aync_scheduler() on QueryContext

2023-07-26 Thread Weston Pace
on! Very helpful explanation. > > On Tue, Jul 25, 2023 at 6:41 PM Weston Pace wrote: > > > 1) As a rule of thumb I would probably prefer `async_scheduler`. It's > more > > feature rich and simpler to use and is meant to handle "long running" > tasks > > (

Re: how to make acero output order by batch index

2023-07-26 Thread Weston Pace
above it is probably ok to assume an implicit ordering in many cases). On Wed, Jul 26, 2023 at 8:18 AM Weston Pace wrote: > > I think the key problem is that the input stream is unordered. The > > input stream is a ArrowArrayStream imported from python side, and then

Re: how to make acero output order by batch index

2023-07-26 Thread Weston Pace
rator. > > Also, I'd like to have a discuss on dataset scanner, is it produce a > > stable sequence of record batches (as an implicit ordering) when the > > underlying storage is not changed? For my situation, the downstream > > executor may crush, then it would request to con

Re: scheduler() and aync_scheduler() on QueryContext

2023-07-25 Thread Weston Pace
1) As a rule of thumb I would probably prefer `async_scheduler`. It's more feature rich and simpler to use and is meant to handle "long running" tasks (e.g. 10s-100s of ms or more). The scheduler is a bit more complex and is intended for very fine-grained scheduling. It's currently only used in

Re: how to make acero output order by batch index

2023-07-25 Thread Weston Pace
> Reading the source code of exec_plan.cc, DeclarationToReader called > DeclarationToRecordBatchGenerator, which ignores the sequence_output > parameter in SinkNodeOptions, also, it calls validate which should > fail if the SinkNodeOptions honors the sequence_output. Then it seems > that Declaratio
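
The thread is about getting output in batch-index order when the input arrives unordered. A common way to do this is a sequencer that buffers early batches and releases them in order. The sketch below is a generic version of that technique, not Acero's implementation. `IndexedBatch` and `Sequencer` are hypothetical names, and a string stands in for a real record batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch tagged with its position in the input.
struct IndexedBatch {
  int64_t index;
  std::string payload;
};

// Buffers batches that arrive early and releases them strictly in index order.
class Sequencer {
 public:
  // Accepts one batch and returns every batch that is now ready, in order.
  std::vector<IndexedBatch> Push(IndexedBatch batch) {
    const int64_t idx = batch.index;
    const bool inserted = pending_.emplace(idx, std::move(batch)).second;
    assert(inserted && "duplicate batch index");
    (void)inserted;
    std::vector<IndexedBatch> ready;
    // Drain the contiguous run starting at the next expected index.
    for (auto it = pending_.find(next_); it != pending_.end(); it = pending_.find(next_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_;
    }
    return ready;
  }

  // Number of batches held back, waiting for an earlier index to arrive.
  std::size_t buffered() const { return pending_.size(); }

 private:
  int64_t next_ = 0;
  std::map<int64_t, IndexedBatch> pending_;
};
```

Buffered memory grows with how far out of order the input is, so an unbounded gap (for example, a lost batch) keeps everything after it in memory.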

Re: hashing Arrow structures

2023-07-24 Thread Weston Pace
> Also, I don't understand why there are two versions of the hash table > ("hashing32" and "hashing64" apparently). What's the rationale? How is > the user meant to choose between them? Say a Substrait plan is being > executed: which hashing variant is chosen and why? It's not user-configurable.

Re: hashing Arrow structures

2023-07-21 Thread Weston Pace
Yes, those are the two main approaches to hashing in the code base that I am aware of as well. I haven't seen any real concrete comparison and benchmarks between the two. If collisions between NA and 0 are a problem it would probably be ok to tweak the hash value of NA to something unique. I susp
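
The suggested tweak is to give NA a reserved hash value instead of hashing whatever is stored in the null slot, which is often zero. The sketch below shows that idea for a nullable 64-bit integer. It is not Arrow's hashing code. `MixBits`, `kNullHash` and `HashNullable` are hypothetical names, and the mixer is a splitmix64-style finalizer chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// splitmix64-style finalizer. Adding the golden-ratio constant first means an
// input of 0 does not map to 0. Every step is a bijection on 64 bits, so
// distinct inputs give distinct outputs.
inline uint64_t MixBits(uint64_t x) {
  uint64_t z = x + 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

// Arbitrary constant reserved for nulls, so a null no longer hashes exactly
// like a zero-valued slot. It can still coincide with the hash of some other
// value, as any hash can.
constexpr uint64_t kNullHash = 0x5BD1E9955BD1E995ULL;

inline uint64_t HashNullable(const std::optional<int64_t>& v) {
  return v.has_value() ? MixBits(static_cast<uint64_t>(*v)) : kNullHash;
}
```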

Re: Need help on ArrayaSpan and writing C++ udf

2023-07-17 Thread Weston Pace
> I may be missing something, but why copy to *out_values++ instead of > *out_values and add 32 to out_values afterwards? Otherwise I agree this is > the way to go. I agree with Jin. You should probably be incrementing `out` by 32 each time `VisitValue` is called. On Mon, Jul 17, 2023 at 6:38 AM
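
The point in the reply is that each output value is 32 bytes wide, so the output pointer must advance by 32 after each value, not by one. The sketch below shows that pattern with a plain byte buffer. It does not use the `ArraySpan` API. `FakeDigest` is a hypothetical placeholder for whatever 32-byte result the UDF computes, and `WriteDigests` is a hypothetical helper.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>
#include <vector>

constexpr std::size_t kDigestSize = 32;

// Hypothetical placeholder for a real 32-byte result (e.g. a SHA-256 digest).
std::array<uint8_t, kDigestSize> FakeDigest(std::string_view value) {
  std::array<uint8_t, kDigestSize> d{};
  for (std::size_t i = 0; i < value.size(); ++i) {
    d[i % kDigestSize] ^= static_cast<uint8_t>(value[i]);
  }
  return d;
}

// Writes one 32-byte result per input value into a contiguous buffer.
std::vector<uint8_t> WriteDigests(const std::vector<std::string_view>& values) {
  std::vector<uint8_t> buffer(values.size() * kDigestSize);
  uint8_t* out = buffer.data();
  const uint8_t* const end = buffer.data() + buffer.size();
  auto visit_value = [&out, end](std::string_view v) {
    assert(out + kDigestSize <= end);  // never write past the preallocated buffer
    const auto digest = FakeDigest(v);
    std::memcpy(out, digest.data(), kDigestSize);
    out += kDigestSize;  // advance by one whole value, not by one byte
  };
  for (std::string_view v : values) visit_value(v);
  return buffer;
}
```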

Re: [DISCUSS][Format] Draft implementation of string view array format

2023-07-11 Thread Weston Pace
f challenges are in the forefront. > I agree 100% that this sort of interoperability is what makes Arrow so > compelling and something we should work very hard to preserve. This is > the crux of my concern with standardising alternative layouts. I > definitely hope that with time Arrow will penetr

Re: Confusion on substrait AggregateRel::groupings and Arrow consumer

2023-07-10 Thread Weston Pace
Yes, that is correct. What Substrait calls "groupings" is what is often referred to in SQL as "grouping sets". These allow you to compute the same aggregates but group by different criteria. Two very common ways of creating grouping sets are "group by cube" and "group by rollup". Snowflake's do
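
Each entry in `AggregateRel::groupings` is one grouping set, and the same aggregates are computed once per set. The sketch below only enumerates the grouping sets that `ROLLUP` and `CUBE` expand to. It is not the Arrow Substrait consumer. Grouping keys are referred to by index, and `GroupingSet`, `Rollup` and `Cube` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

using GroupingSet = std::vector<int>;  // indices into the list of grouping keys

// ROLLUP(k0, ..., kn-1): every prefix of the keys, longest first, ending with ().
std::vector<GroupingSet> Rollup(int num_keys) {
  assert(num_keys >= 0);
  std::vector<GroupingSet> sets;
  for (int len = num_keys; len >= 0; --len) {
    GroupingSet set;
    for (int k = 0; k < len; ++k) set.push_back(k);
    sets.push_back(set);
  }
  return sets;
}

// CUBE(k0, ..., kn-1): every subset of the keys, 2^n grouping sets in total.
std::vector<GroupingSet> Cube(int num_keys) {
  assert(num_keys >= 0 && num_keys < 32);  // keeps the bit shift well defined
  std::vector<GroupingSet> sets;
  // Walk the subset bitmasks from "all keys" down to the empty set.
  for (unsigned mask = (1u << num_keys); mask-- > 0;) {
    GroupingSet set;
    for (int k = 0; k < num_keys; ++k) {
      if (mask & (1u << k)) set.push_back(k);
    }
    sets.push_back(set);
  }
  return sets;
}
```

For keys `(a, b, c)`, `Rollup(3)` yields `(a, b, c)`, `(a, b)`, `(a)` and `()`, while `Cube(3)` yields all eight subsets.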

Re: [DISCUSS][Format] Draft implementation of string view array format

2023-07-10 Thread Weston Pace
jor bottleneck so can't speak authoritatively here. > >> > >> Which leads on to my major concern with this proposal, that it adds > >> complexity and cognitive load to the specification and implementations, > >> whilst not meaningfully improving the performance

Re: Do we need CODEOWNERS ?

2023-07-04 Thread Weston Pace
I agree the experiment isn't working very well. I've been meaning to change my listing from `compute` to `acero` for a while. I'd be +1 for just removing it though. On Tue, Jul 4, 2023, 6:44 AM Dewey Dunnington wrote: > Just a note that for me, the main problem is that I get automatic > review

Re: [ANNOUNCE] New Arrow committer: Kevin Gurney

2023-07-03 Thread Weston Pace
Congratulations Kevin! On Mon, Jul 3, 2023 at 5:18 PM Sutou Kouhei wrote: > On behalf of the Arrow PMC, I'm happy to announce that Kevin Gurney > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > -- > kou >

Re: Question about large exec batch in acero

2023-07-03 Thread Weston Pace
> is this overflow considered a bug? Or is large exec batch something that should be avoided? This is not a bug and it is something that should be avoided. Some of the hash-join internals expect small batches. I actually thought the limit was 32Ki and not 64Ki because I think there may be some p
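
Because some hash-join internals expect small batches, one workaround is to split a large batch before it reaches the join. The sketch below computes `(offset, length)` ranges of at most a given number of rows. It assumes a 32Ki-row limit, the smaller of the two values mentioned above. `SliceRanges` and `kMaxRowsPerBatch` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed limit: the reply is unsure whether it is 32Ki or 64Ki rows, so use the smaller.
constexpr int64_t kMaxRowsPerBatch = 32 * 1024;

// Returns (offset, length) pairs covering [0, num_rows) in chunks of at most max_rows.
std::vector<std::pair<int64_t, int64_t>> SliceRanges(int64_t num_rows,
                                                     int64_t max_rows = kMaxRowsPerBatch) {
  assert(max_rows > 0);
  std::vector<std::pair<int64_t, int64_t>> ranges;
  for (int64_t offset = 0; offset < num_rows; offset += max_rows) {
    ranges.emplace_back(offset, std::min(max_rows, num_rows - offset));
  }
  return ranges;
}
```

In Arrow C++, each pair could be passed to `RecordBatch::Slice(offset, length)`, which returns a zero-copy view of those rows.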

Re: Apache Arrow | Graph Algorithms & Data Structures

2023-06-29 Thread Weston Pace
Is your use case to operate on a batch of graphs? For example, do you have hundreds or thousands of graphs that you need to run these algorithms on at once? Or is your use case to operate on a single large graph? If it's the single-graph case then how many nodes do you have? If it's one graph a

Re: [C++] Dealing with third party method that raises exception

2023-06-29 Thread Weston Pace
We do this quite a bit in the Arrow<->Parquet bridge if IIUC. There are macros defined like this: ``` #define BEGIN_PARQUET_CATCH_EXCEPTIONS try { #define END_PARQUET_CATCH_EXCEPTIONS \ }\ catch (const ::parquet::ParquetStat
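
The macros quoted above wrap a throwing call in `try { ... }` and convert any exception into a returned status. The self-contained sketch below follows the same pattern. The `Status` struct is a hypothetical minimal stand-in for `arrow::Status`. `BEGIN_CATCH_EXCEPTIONS`, `END_CATCH_EXCEPTIONS`, `ThirdPartyParse` and `ParseWrapped` are hypothetical names, not the Parquet macros.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal status type standing in for arrow::Status.
struct Status {
  bool ok;
  std::string message;
  static Status OK() { return {true, ""}; }
  static Status Error(std::string msg) { return {false, std::move(msg)}; }
};

// Open a try block, then close it with handlers that turn exceptions into a Status.
#define BEGIN_CATCH_EXCEPTIONS try {
#define END_CATCH_EXCEPTIONS                   \
  }                                            \
  catch (const std::exception& e) {            \
    return Status::Error(e.what());            \
  }                                            \
  catch (...) {                                \
    return Status::Error("unknown exception"); \
  }

// Stand-in for a third-party function that reports errors by throwing.
int ThirdPartyParse(const std::string& s) {
  if (s.empty()) throw std::invalid_argument("empty input");
  return static_cast<int>(s.size());
}

// Status-returning wrapper: no exception escapes this function.
Status ParseWrapped(const std::string& s, int* out) {
  assert(out != nullptr);
  BEGIN_CATCH_EXCEPTIONS
  *out = ThirdPartyParse(s);
  END_CATCH_EXCEPTIONS
  return Status::OK();
}
```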
