Re: Transactional semantics in Acero

2022-09-09 Thread Jayjeet Chakraborty
Thanks a lot everyone for your comments. Sorry, I meant to say adding transaction/update/append functionalities in the Dataset API, but it seems like it would duplicate work already done in Apache Iceberg. The only problem with Iceberg/Delta Lake is that they are heavily locked into the JVM ecosystem,

Re: Question on handling API changes when upgrading Pyarrow

2022-09-09 Thread Weston Pace
Breaking changes should be documented in the release notes which are announced on the Arrow blog[1][2]. In addition, in pyarrow, changes to non-experimental APIs (and often also those made to experimental APIs) should go through a deprecation cycle where a warning is emitted for at least one relea
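The deprecation cycle described above can be sketched with the standard `warnings` module. This is a minimal illustration, not pyarrow's actual implementation: `read_schema_old`, `read_schema_new`, and `find_deprecated_calls` are hypothetical names, and the use of `FutureWarning` as the warning category is an assumption.

```python
import warnings


def read_schema_new(source):
    """Hypothetical replacement API (stands in for the new location of a function)."""
    return {"source": source}


def read_schema_old(source):
    """Hypothetical deprecated alias, kept for at least one release before removal."""
    warnings.warn(
        "read_schema_old is deprecated; use read_schema_new",
        FutureWarning,  # assumed category; a library may use DeprecationWarning instead
        stacklevel=2,
    )
    return read_schema_new(source)


def find_deprecated_calls(func, *args, **kwargs):
    """Call func and return the messages of any deprecation-style warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is suppressed
        func(*args, **kwargs)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (DeprecationWarning, FutureWarning))
    ]
```

On the user side, running an existing test suite with such warnings promoted to errors (for example `python -W error::FutureWarning`) before an upgrade is one way to surface calls that are scheduled for removal.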

Re: [VOTE] Substrait for Flight SQL

2022-09-09 Thread Wes McKinney
+1 (binding) On Thu, Sep 8, 2022 at 9:12 PM Jacques Nadeau wrote: > > My vote continues to be +1 > > On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson > wrote: > > > +1 > > > > Neal > > > > On Thu, Sep 8, 2022 at 2:15 PM Ashish wrote: > > > > > +1 (non-binding) > > > > > > On Thu, Sep 8, 2022 at

Re: Transactional semantics in Acero

2022-09-09 Thread Weston Pace
I'd agree with Micah. I'm also not aware of anyone working on this. The docs clarify a bit more on the details[1]. I think we'd need a bit more thinking around an "update/append" workflow too. That being said, updates, transactions, and appends are something that the Iceberg project has thought

Re: Transactional semantics in Acero

2022-09-09 Thread Micah Kornfield
I would think any transaction concerns would live at the peripheries? e.g. the Datasets? Or at least that is where compatibility would have to be built first. On Fri, Sep 9, 2022 at 12:01 PM Sasha Krassovsky wrote: > Hi Jayjeet, > Transactions are currently out of scope for Acero - Acero is on

Re: Transactional semantics in Acero

2022-09-09 Thread Sasha Krassovsky
Hi Jayjeet, Transactions are currently out of scope for Acero - Acero is only meant to be a query execution engine. That said, it can definitely be used as a component for building a full database engine, which could implement its own locking of rows while Acero executes on them. You could also

Transactional semantics in Acero

2022-09-09 Thread Jayjeet Chakraborty
Hi Arrow Community, Since Acero is developing very fast into a full-fledged compute engine, are there plans to add transaction semantics to Acero, so that it can also be used as a database layer over already supported storage backends? What I am referring to is like a Delta Lake/Iceberg kind of i

Re: Question on handling API changes when upgrading Pyarrow

2022-09-09 Thread Li Jin
After digging into the code a bit, it looks like: (1) pyarrow.read_schema should be changed to pyarrow.ipc.read_schema (2) chunksize should be changed to max_chunksize (it was passed in as a generic kwargs before and I am guessing it was wrong in the first place) These seem to be easy enough to fix b

Question on handling API changes when upgrading Pyarrow

2022-09-09 Thread Li Jin
Hi, I am trying to update Pyarrow from 7.0 to 9.0 and hit a couple of issues that I believe are because of some API changes. In particular, the two issues I saw seem to be (1) pyarrow.read_schema is removed (2) pa.Table.to_batches no longer takes a keyword argument (chunksize) What's the best way t