+1 for Rust

Best Regards,
LuNing Wang

On Wed, Jun 22, 2022 at 14:15 Nan Zhu <zhunanmcg...@gmail.com> wrote:

> +1 for using rust as the backbone for new language bindings
>
> On Sun, Jun 12, 2022 at 23:52 OpenInx <open...@gmail.com> wrote:
>
>> Thanks Kyle for sharing your context.
>>
>> Recently, I also spent some time practicing my Rust skills.  Generally,
>> I'm +1 for adding Rust SDK support for native language.
>>
>>
>> On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <k...@tabular.io> wrote:
>>
>>> Thanks for starting this discussion.
>>>
>>> I know I was the first to mention some of my concerns (which I still
>>> have and would apply to any new major change), but I also think that this
>>> is an avenue that should be explored.
>>>
>>> Specifically a native integration would have many benefits for
>>> read paths (in addition to others). I know that the Rust Avro reader is
>>> significantly faster, as are native readers for columnar formats.
>>>
>>> So while I do have some concerns about making sure we have enough people
>>> to support this endeavor, I do want to say I think it's a really good idea.
>>> My apologies if I gave the impression otherwise.
>>>
>>> I would personally be interested in contributing to and reviewing for a
>>> native Rust library (or CPP, but I think Rust is a much more elegant
>>> language and I'd personally prefer to work in that as it's easier to work
>>> with across systems than C++ imo though I would defer to others on that).
>>>
>>> I would also be happy to offer my help and perspective in moving this
>>> forward if need be. But I did want to express my practical concerns so that
>>> we don't have an area of the codebase where there aren't enough people to
>>> help maintain it etc.
>>>
>>> But in general I think this is an exciting opportunity, and results have
>>> shown time and time again that native readers / writers are much more
>>> performant.
>>>
>>> +1 to using Rust as well (which is a language I know better than C++
>>> these days, though I'd have to brush up on both).
>>>
>>> Best, Kyle
>>>
>>> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <open...@gmail.com> wrote:
>>>
>>>> Hi Tao Wu.
>>>>
>>>> I think the apache iceberg community is very consistent in providing
>>>> the Iceberg SDK for native languages.  I am very happy to offer my
>>>> perspective and help if needed when you try to move this thing forward.
>>>>
>>>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>>>
>>>>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
>>>>> system that is written in Rust.
>>>>>
>>>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
>>>>> have better integration with the existing Iceberg ecosystem. Initially I
>>>>> found https://github.com/oliverdaff/iceberg-rs, but it appears the
>>>>> author hasn't been active lately. So I'm looking to see if the Iceberg
>>>>> community has any consensus on a Rust/C++ SDK (Rust is preferable), and if
>>>>> there is, we'd love to contribute. I believe as Iceberg increases its
>>>>> popularity, there will eventually be more systems that want such
>>>>> libraries. There may even be some work already under way that has not
>>>>> been discussed with the community.
>>>>>
>>>>> Additionally, I think the initial Rust/C++ SDK can support only the
>>>>> reader and writer sides of Iceberg, because there are already plenty of
>>>>> JVM-based query engines out there taking charge of data maintenance. We
>>>>> don't have to rewrite every corner of Iceberg in Rust. That means less
>>>>> engineering work.
>>>>>
>>>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>>>> > As a cloud-native table format standard for the big-data ecosystem, I
>>>>> > believe supporting multiple languages is the correct direction so that
>>>>> > different languages can connect to the Apache Iceberg table format.
>>>>> >
>>>>> > But I can also see Kyle's point about lacking enough resources
>>>>> > (developers and reviewers) to accomplish this goal.  In my mind, Python,
>>>>> > Golang, C++, and Rust can all be regarded as native language support.
>>>>> > We may just need to support the Rust SDK, and then all of the other
>>>>> > languages can just wrap the Rust SDK to access the table format.
>>>>> >
>>>>> > Anyway, we will need to wait for the REST catalog to be finished before
>>>>> > we introduce support for another language, because we cannot access the
>>>>> > Iceberg table by invoking the JVM catalog interfaces.
>>>>> >
>>>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > >> There’s also the question of how useful this would be in practice given
>>>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>>>> > >> frameworks.
>>>>> > >>
>>>>> > >
>>>>> > > One place this would be useful is for Arrow's Dataset API [1].  An
>>>>> > > option the Arrow community might be open to is hosting parts of the code
>>>>> > > there (this is what is done for Apache Parquet C++).  This helps shape
>>>>> > > some of the answers to other questions posed (ORC and Parquet are already
>>>>> > > in the repo, it provides a Filesystem interface, etc).  The project
>>>>> > > doesn't currently consume Avro, and I think the preferred approach is to
>>>>> > > make a clean room Avro parser.  But I agree this is a non-trivial effort
>>>>> > > to get underway.
>>>>> > >
>>>>> > > Another area to consider is compatibility testing.  I think before a
>>>>> > > third officially supported community library is introduced it would be
>>>>> > > good to have a compatibility framework in place to make sure
>>>>> > > implementations are all interpreting the specification correctly.  If
>>>>> > > there isn't already an effort here, I'd like to start contributing
>>>>> > > something (I will probably have bandwidth sometime in Q3).
>>>>> > >
>>>>> > > Thanks,
>>>>> > > -Micah
>>>>> > >
>>>>> > >
>>>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>>> > >
>>>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>>> > >
>>>>> > >> Hi caneGuy,
>>>>> > >>
>>>>> > >> I personally don’t dislike this idea. I understand the performance
>>>>> > >> benefits.
>>>>> > >>
>>>>> > >> But this would be a huge undertaking for the community. We’d need to
>>>>> > >> ensure we had sufficient developer support for reviews (likely one of
>>>>> > >> the biggest issues), as well as a number of other things. Particularly
>>>>> > >> dependencies, package management, etc. We’d also need to scope support
>>>>> > >> down to specific OS / compilers etc.
>>>>> > >>
>>>>> > >> We’d also need to be sure we had adequate developer support from a wide
>>>>> > >> enough range of the community to support the project long term. One
>>>>> > >> issue in open source is that developers will work on something
>>>>> > >> tangential to their project in another repository, but nobody is
>>>>> > >> available to maintain it.
>>>>> > >>
>>>>> > >> There’s also the question of how useful this would be in practice given
>>>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>>>> > >> frameworks.
>>>>> > >>
>>>>> > >> Again, I’m not opposed to the idea but just trying to be realistic about
>>>>> > >> the realities of such an undertaking. It would need full community
>>>>> > >> support (or at least support from enough community members to be
>>>>> > >> sustainable).
>>>>> > >>
>>>>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
>>>>> > >> project has some that you might use as reference.
>>>>> > >>
>>>>> > >> *I highly suggest you come to the next community sync and bring this up
>>>>> > >> to the community then.*
>>>>> > >>
>>>>> > >> If you’re not already on the invite list for the monthly community sync,
>>>>> > >> you can get on it by joining the Google group. You’ll receive invites
>>>>> > >> when they go out:
>>>>> > >> https://groups.google.com/g/iceberg-sync
>>>>> > >>
>>>>> > >> Looking forward to seeing you at the next community sync.
>>>>> > >>
>>>>> > >> A design document and/or any prior art would be very helpful as the
>>>>> > >> community sync does discuss many topics (possibly there is existing C++
>>>>> > >> support in StarRocks for Iceberg V1?).
>>>>> > >>
>>>>> > >> Thank you,
>>>>> > >> Kyle Bendickson
>>>>> > >> GitHub: kbendick
>>>>> > >>
>>>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>>>> > >>
>>>>> > >>> Currently there is no existing effort to develop a C++ package. That
>>>>> > >>> being said I think it would be awesome to have one! If anyone is
>>>>> > >>> willing to start that development effort, I can help with some of the
>>>>> > >>> ground work to kickstart it.
>>>>> > >>>
>>>>> > >>> I would say the first step would be for someone to prepare a high-level
>>>>> > >>> proposal.
>>>>> > >>>
>>>>> > >>> -Sam
>>>>> > >>>
>>>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>>>> > >>>
>>>>> > >>>> Hi team,
>>>>> > >>>> I am a dev from the StarRocks community, and we already support the
>>>>> > >>>> Iceberg v1 format.
>>>>> > >>>> We are also planning to support the v2 format. If there were a C++
>>>>> > >>>> package, it would be very convenient for our implementation.
>>>>> > >>>> It would also let other C++ computing engines add v2 format support
>>>>> > >>>> faster.
>>>>> > >>>>
>>>>> > >>>> Do we have plans to support a C++ version of the SDK?
>>>>> > >>>> --
>>>>> > >>>> caneGuy
>>>>> > >>>>
>>>>> > >>> --
>>>>> > >>>
>>>>> > >>> Sam Redai <s...@tabular.io>
>>>>> > >>>
>>>>> > >>> Developer Advocate  |  Tabular <https://tabular.io/>
>>>>> > >>>
>>>>> > >>> c (267) 226-8606
>>>>> > >>>
>>>>> > >>
>>>>> >
>>>>>
>>>>
>>>
>>> --
>>>
>>> Kyle Bendickson
>>>
>>> OSS Developer  |  Tabular <https://tabular.io/>
>>>
>>> k...@tabular.io
>>>
>>
