Thanks a lot, Micah

On Sun, Sep 11, 2022 at 10:11 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Delta Lake has rust bindings which should in theory be linkable into
> native code.
>
> Iceberg is actively developing a python library and there has been talk of
> Rust/native bindings.   I'd like to see a C++ implementation that can be
> incorporated into Datasets but don't currently have bandwidth to work on
> it.  The new REST based catalog specification in Iceberg should make
> integration outside JVM-based ecosystems easier as it gains adoption (i.e.
> hopefully making HMS integration unnecessary).
>
> As Weston said, at the Acero/Datasets level I think the right thing
> to focus on is abstractions that allow plugging in any storage subsystem.
>
> On Friday, September 9, 2022, Jayjeet Chakraborty <
> jayjeetchakrabort...@gmail.com> wrote:
>
>> Thanks a lot everyone for your comments. Sorry, I meant to say
>> adding transaction/update/append functionalities in the Dataset API, but it
>> seems like it would be a duplication of work as in Apache Iceberg. The only
>> problem with Iceberg/Delta Lake is that they are heavily locked into the JVM
>> ecosystem, making it difficult to integrate with backends with C++-based
>> storage interfaces.
>>
>> On Sat, Sep 10, 2022 at 1:39 AM Weston Pace <weston.p...@gmail.com>
>> wrote:
>>
>>> I'd agree with Micah.  I'm also not aware of anyone working on this.
>>> The docs clarify a bit more on the details[1].  I think we'd need a
>>> bit more thinking around an "update/append" workflow too.
>>>
>>> That being said, updates, transactions, and appends are something that
>>> the Iceberg project has thought a lot about.  Rather than reinvent the
>>> wheel I think it'd be interesting to see if Acero could be used on the
>>> read path of an Iceberg workflow.  I have not really planned out what
>>> that would look like in great detail and, at a minimum, you'd maybe
>>> want some kind of Iceberg -> Substrait planner.
>>>
>>> [1]
>>> https://arrow.apache.org/docs/python/dataset.html#a-note-on-transactions-acid-guarantees
>>>
>>> On Fri, Sep 9, 2022 at 12:06 PM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>> >
>>> > I would think any transaction concerns would live at the peripheries?
>>> e.g.
>>> > the Datasets?  Or at least that is where compatibility would have to be
>>> > built first.
>>> >
>>> > On Fri, Sep 9, 2022 at 12:01 PM Sasha Krassovsky <
>>> krassovskysa...@gmail.com>
>>> > wrote:
>>> >
>>> > > Hi Jayjeet,
>>> > > Transactions are currently out of scope for Acero - Acero is only
>>> meant to
>>> > > be a query execution engine. That said, it can definitely be used as
>>> a
>>> > > component for building a full database engine, which could implement
>>> its
>>> > > own locking of rows while Acero executes on them. You could also
>>> check out
>>> > > DuckDB, which can operate on Arrow data and also supports
>>> transactions.
>>> > >
>>> > > Sasha
>>> > >
>>> > > > On Sep 9, 2022, at 11:54, Jayjeet Chakraborty <
>>> > > jayjeetchakrabort...@gmail.com> wrote:
>>> > > >
>>> > > > Hi Arrow Community,
>>> > > >
>>> > > > Since Acero is developing very fast into a full-fledged compute
>>> engine,
>>> > > are
>>> > > > there plans to add transaction semantics to Acero, so that it can
>>> also be
>>> > > > used as a database layer over already supported storage backends?
>>> What I
>>> > > > am referring to is like a Delta Lake/Iceberg kind of interface
>>> over Acero
>>> > > > in C++. Thanks.
>>> > > >
>>> > > >
>>> > > > --
>>> > > > *Jayjeet Chakraborty*
>>> > > > CS PhD student
>>> > > > UC Santa Cruz
>>> > > > California, USA
>>> > >
>>>
>>
>>
>> --
>> *Jayjeet Chakraborty*
>> CS PhD student
>> UC Santa Cruz
>> California, USA
>>
>>

-- 
*Jayjeet Chakraborty*
CS PhD student
UC Santa Cruz
California, USA
