Thanks a lot, Micah

On Sun, Sep 11, 2022 at 10:11 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> Delta Lake has Rust bindings, which should in theory be linkable into
> native code.
>
> Iceberg is actively developing a Python library, and there has been talk of
> Rust/native bindings. I'd like to see a C++ implementation that can be
> incorporated into Datasets, but I don't currently have the bandwidth to work
> on it. The new REST-based catalog specification in Iceberg should make
> integration outside JVM-based ecosystems easier as it gains adoption (i.e.,
> hopefully making HMS integration unnecessary).
>
> As Weston said, I think that at the Acero/Datasets level the right thing to
> focus on is abstractions that allow plugging in any storage subsystem.
>
> On Friday, September 9, 2022, Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:
>
>> Thanks a lot, everyone, for your comments. Sorry, I meant adding
>> transaction/update/append functionality to the Dataset API, but it seems
>> like that would duplicate work already done in Apache Iceberg. The only
>> problem with Iceberg/Delta Lake is that they are heavily locked into the
>> JVM ecosystem, which makes them difficult to integrate with backends that
>> expose C++-based storage interfaces.
>>
>> On Sat, Sep 10, 2022 at 1:39 AM Weston Pace <weston.p...@gmail.com> wrote:
>>
>>> I'd agree with Micah. I'm also not aware of anyone working on this.
>>> The docs clarify the details a bit more [1]. I think we'd need some
>>> more thinking around an "update/append" workflow too.
>>>
>>> That being said, updates, transactions, and appends are something that
>>> the Iceberg project has thought a lot about. Rather than reinvent the
>>> wheel, I think it'd be interesting to see if Acero could be used on the
>>> read path of an Iceberg workflow. I have not really planned out what
>>> that would look like in great detail, but at a minimum you'd probably
>>> want some kind of Iceberg -> Substrait planner.
>>>
>>> [1]
>>> https://arrow.apache.org/docs/python/dataset.html#a-note-on-transactions-acid-guarantees
>>>
>>> On Fri, Sep 9, 2022 at 12:06 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>> >
>>> > I would think any transaction concerns would live at the peripheries,
>>> > e.g. in Datasets? Or at least that is where compatibility would have
>>> > to be built first.
>>> >
>>> > On Fri, Sep 9, 2022 at 12:01 PM Sasha Krassovsky <krassovskysa...@gmail.com> wrote:
>>> >
>>> > > Hi Jayjeet,
>>> > > Transactions are currently out of scope for Acero: Acero is only meant
>>> > > to be a query execution engine. That said, it can definitely be used as
>>> > > a component for building a full database engine, which could implement
>>> > > its own row-level locking while Acero executes on those rows. You could
>>> > > also check out DuckDB, which can operate on Arrow data and also supports
>>> > > transactions.
>>> > >
>>> > > Sasha
>>> > >
>>> > > > On Sep 9, 2022, at 11:54, Jayjeet Chakraborty <jayjeetchakrabort...@gmail.com> wrote:
>>> > > >
>>> > > > Hi Arrow Community,
>>> > > >
>>> > > > Since Acero is developing very fast into a full-fledged compute engine,
>>> > > > are there plans to add transaction semantics to Acero, so that it can
>>> > > > also be used as a database layer over already supported storage
>>> > > > backends? What I have in mind is a Delta Lake/Iceberg-style interface
>>> > > > over Acero in C++. Thanks.
>>> > > >
>>> > > > --
>>> > > > *Jayjeet Chakraborty*
>>> > > > CS PhD student
>>> > > > UC Santa Cruz
>>> > > > California, USA
>>> > >
>>>
>>