Delta Lake has Rust bindings, which should in theory be linkable into native code.
Iceberg is actively developing a Python library, and there has been talk of Rust/native bindings. I'd like to see a C++ implementation that can be incorporated into Datasets, but I don't currently have the bandwidth to work on it. The new REST-based catalog specification in Iceberg should make integration outside JVM-based ecosystems easier as it gains adoption (i.e. hopefully making HMS integration unnecessary).

As Weston said, I think the right thing to focus on at the Acero/Datasets level is abstractions that allow plugging in any storage subsystem.

On Friday, September 9, 2022, Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:

> Thanks a lot everyone for your comments. Sorry, I meant to say
> adding transaction/update/append functionalities in the Dataset API, but
> it seems like it would be a duplication of work as in Apache Iceberg. The
> only problem with Iceberg/Delta Lake is that it is heavily locked into the
> JVM ecosystem, making it difficult to integrate with backends with
> C++-based storage interfaces.
>
> On Sat, Sep 10, 2022 at 1:39 AM Weston Pace <weston.p...@gmail.com> wrote:
>
>> I'd agree with Micah. I'm also not aware of anyone working on this.
>> The docs clarify a bit more on the details[1]. I think we'd need a
>> bit more thinking around an "update/append" workflow too.
>>
>> That being said, updates, transactions, and appends are something that
>> the Iceberg project has thought a lot about. Rather than reinvent the
>> wheel I think it'd be interesting to see if Acero could be used on the
>> read path of an Iceberg workflow. I have not really planned out what
>> that would look like in great detail and, at a minimum, you'd maybe
>> want some kind of Iceberg -> Substrait planner.
>>
>> [1] https://arrow.apache.org/docs/python/dataset.html#a-note-on-transactions-acid-guarantees
>>
>> On Fri, Sep 9, 2022 at 12:06 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>> >
>> > I would think any transaction concerns would live at the peripheries? e.g.
>> > the Datasets? Or at least that is where compatibility would have to be
>> > built first.
>> >
>> > On Fri, Sep 9, 2022 at 12:01 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>> >
>> > > Hi Jayjeet,
>> > > Transactions are currently out of scope for Acero - Acero is only meant to
>> > > be a query execution engine. That said, it can definitely be used as a
>> > > component for building a full database engine, which could implement its
>> > > own locking of rows while Acero executes on them. You could also check out
>> > > DuckDB, which can operate on Arrow data and also supports transactions.
>> > >
>> > > Sasha
>> > >
>> > > > On Sep 9, 2022, at 11:54, Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:
>> > > >
>> > > > Hi Arrow Community,
>> > > >
>> > > > Since Acero is developing very fast into a full fledged compute engine, are
>> > > > there plans to add transaction semantics to acero, so that it can also be
>> > > > used as a database layer over already supported storage backends ? What I
>> > > > am referring to is like a Delta Lake/Iceberg kind of interface over Acero
>> > > > in C++. Thanks.
>> > > >
>> > > > --
>> > > > *Jayjeet Chakraborty*
>> > > > CS PhD student
>> > > > UC Santa Cruz
>> > > > California, USA
>
> --
> *Jayjeet Chakraborty*
> CS PhD student
> UC Santa Cruz
> California, USA