Thanks a lot everyone for your comments. Sorry, I meant to say
adding transaction/update/append functionality to the Dataset API, but it
seems like it would duplicate the work already done in Apache Iceberg. The only
problem with Iceberg/Delta Lake is that they are heavily locked into the JVM
ecosystem,
Breaking changes should be documented in the release notes, which are
announced on the Arrow blog [1][2]. In addition, in pyarrow, changes
to non-experimental APIs (and often also those made to experimental
APIs) should go through a deprecation cycle where a warning is emitted
for at least one release
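As a rough, unofficial sketch, one way to surface those deprecation warnings
while testing an upgrade is to escalate them to errors with Python's standard
warnings module (using FutureWarning as the category is an assumption here):

    import warnings
    import pyarrow as pa

    # Hypothetical upgrade-testing snippet: escalate FutureWarning into an
    # error so any API scheduled for removal fails loudly instead of
    # warning quietly.
    with warnings.catch_warnings():
        warnings.simplefilter("error", FutureWarning)
        table = pa.table({"x": [1, 2, 3]})
        # exercise the code paths you rely on here; deprecated calls raise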
+1 (binding)
On Thu, Sep 8, 2022 at 9:12 PM Jacques Nadeau wrote:
>
> My vote continues to be +1
>
> On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson
> wrote:
>
> > +1
> >
> > Neal
> >
> > On Thu, Sep 8, 2022 at 2:15 PM Ashish wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Thu, Sep 8, 2022 at
I'd agree with Micah. I'm also not aware of anyone working on this.
The docs clarify the details a bit more [1]. I think we'd need a
bit more thinking around an "update/append" workflow too.
That being said, updates, transactions, and appends are something that
the Iceberg project has thought
I would think any transaction concerns would live at the periphery, e.g.
in the Datasets? Or at least that is where compatibility would have to be
built first.
On Fri, Sep 9, 2022 at 12:01 PM Sasha Krassovsky
wrote:
> Hi Jayjeet,
> Transactions are currently out of scope for Acero - Acero is on
Hi Jayjeet,
Transactions are currently out of scope for Acero - Acero is only meant to be a
query execution engine. That said, it can definitely be used as a component for
building a full database engine, which could implement its own locking of rows
while Acero executes on them. You could also
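As a purely hypothetical sketch (none of this is an Acero API, just
pyarrow.dataset plus a threading.Lock), caller-side concurrency control
could look something like this:

    import threading
    import pyarrow as pa
    import pyarrow.dataset as ds

    # Hypothetical: the caller, not Acero, owns concurrency control.
    write_lock = threading.Lock()

    def read_filtered(path: str) -> pa.Table:
        dataset = ds.dataset(path, format="parquet")
        # The scan/filter is pushed down to the compute layer; the lock
        # only guards against a concurrent rewrite of the underlying files.
        with write_lock:
            return dataset.to_table(filter=ds.field("x") > 0)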
Hi Arrow Community,
Since Acero is developing very fast into a full-fledged compute engine, are
there plans to add transaction semantics to Acero, so that it can also be
used as a database layer over already supported storage backends? What I
am referring to is like a Delta Lake/Iceberg kind of i
After digging into the code a bit, it looks like:
(1) pyarrow.read_schema should be changed to pyarrow.ipc.read_schema
(2) chunksize should be changed to max_chunksize (it was passed in as a
generic kwarg before, and I am guessing it was wrong in the first place)
These seem to be easy enough to fix b
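For reference, a rough before/after sketch of the caller-side changes
(the file name and data here are placeholders, not from the original report):

    import pyarrow as pa
    import pyarrow.ipc

    # (1) read_schema now lives in the ipc submodule
    with open("example.arrow", "rb") as f:  # placeholder IPC file
        schema = pa.ipc.read_schema(f)

    # (2) to_batches takes max_chunksize rather than a generic
    #     chunksize keyword argument
    table = pa.table({"x": list(range(10))})
    batches = table.to_batches(max_chunksize=4)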
Hi,
I am trying to update Pyarrow from 7.0 to 9.0 and hit a couple of issues
that I believe are because of some API changes. In particular, two issues I
saw seem to be
(1) pyarrow.read_schema is removed
(2) pa.Table.to_batches no longer takes a keyword argument (chunksize)
What's the best way t