+1 for Rust

Best Regards,
LuNing Wang
Nan Zhu <zhunanmcg...@gmail.com> wrote on Wed, Jun 22, 2022 at 14:15:

> +1 for using Rust as the backbone for new language bindings
>
> On Sun, Jun 12, 2022 at 23:52 OpenInx <open...@gmail.com> wrote:
>
>> Thanks Kyle for sharing your context.
>>
>> Recently, I also spent some time practicing my Rust skills. Generally,
>> I'm +1 for adding Rust SDK support for native language.
>>
>>
>> On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <k...@tabular.io> wrote:
>>
>>> Thanks for starting this discussion.
>>>
>>> I know I was the first to mention some of my concerns (which I still
>>> have and would apply to any new major change), but I also think that this
>>> is an avenue that should be explored.
>>>
>>> Specifically a native integration would have many benefits for
>>> read paths (in addition to others). I know that the Rust Avro reader is
>>> significantly faster, as well as native columnar formats.
>>>
>>> So while I do have some concerns about making sure we have enough people
>>> to support this endeavor, I do want to say I think it's a really good idea.
>>> My apologies if I gave the impression otherwise.
>>>
>>> I would personally be interested in contributing to and reviewing for a
>>> native Rust library (or CPP, but I think Rust is a much more elegant
>>> language and I'd personally prefer to work in that as it's easier to work
>>> with across systems than C++ imo, though I would defer to others on that).
>>>
>>> I would also be happy to offer my help and perspective in moving this
>>> forward if need be. But I did want to express my practical concerns so that
>>> we don't have an area of the codebase where there aren't enough people to
>>> help maintain it etc.
>>>
>>> But in general I think this is an exciting opportunity, and results have
>>> shown time and time again that native readers / writers are much more
>>> performant.
>>>
>>> +1 to using Rust as well (which is a language I know more of than C++
>>> these days - though for both I'd have to brush up my skillset).
>>>
>>> Best, Kyle
>>>
>>> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <open...@gmail.com> wrote:
>>>
>>>> Hi Tao Wu.
>>>>
>>>> I think the Apache Iceberg community is very consistent in providing
>>>> the Iceberg SDK for native languages. I am very happy to offer my
>>>> perspective and help if needed when you try to move this thing forward.
>>>>
>>>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>>>
>>>>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
>>>>> system that is written in Rust.
>>>>>
>>>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
>>>>> have better integration with the existing Iceberg ecosystem. Initially I
>>>>> found https://github.com/oliverdaff/iceberg-rs, but it appears the
>>>>> author hasn't been active lately. So I'm looking to see if the Iceberg
>>>>> community has any consensus on a Rust/C++ SDK (Rust is preferable), and if
>>>>> there is, we'd love to contribute. I believe as Iceberg increases its
>>>>> popularity, there will eventually be more systems that want such libraries.
>>>>> There may even be some ongoing work that started without consulting the
>>>>> community.
>>>>>
>>>>> Additionally, I think the initial Rust/C++ SDK can support only the
>>>>> reader and writer sides of Iceberg, because there are already plenty of
>>>>> JVM-based query engines out there taking charge of data maintenance. We
>>>>> don't have to rewrite every corner of Iceberg in Rust. That means less
>>>>> engineering work.
>>>>>
>>>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>>>> > As a cloud-native table format standard for the big-data ecosystem, I
>>>>> > believe supporting multiple languages is the correct direction so that
>>>>> > different languages can connect to the Apache Iceberg table format.
>>>>> >
>>>>> > But I also get Kyle's point about lacking enough resources (developers
>>>>> > and reviewers) to accomplish this goal.
>>>>> > In my mind, Python, Golang, C++, and Rust can all be regarded as
>>>>> > native language support. We may just need to support the Rust SDK, and
>>>>> > then all of the other languages can just wrap the Rust SDK to access the
>>>>> > table format.
>>>>> >
>>>>> > Anyway, we will need to wait for the REST catalog to be finished before
>>>>> > we introduce support for another language, because we cannot access the
>>>>> > Iceberg table by invoking the JVM catalog interfaces.
>>>>> >
>>>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com>
>>>>> > wrote:
>>>>> >
>>>>> > >> There’s also the question of how useful this would be in practice given
>>>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>>>> > >> frameworks.
>>>>> > >>
>>>>> > >
>>>>> > > One place this would be useful is for Arrow's DataSet API [1]. An
>>>>> > > option the Arrow community might be open to is hosting parts of the code
>>>>> > > there (this is what is done for Apache Parquet C++). This helps shape some
>>>>> > > of the answers to other questions posed (ORC and Parquet are already in the
>>>>> > > repo, it provides a Filesystem interface, etc). The project doesn't
>>>>> > > currently consume Avro, and I think the preferred approach is to make a
>>>>> > > clean room Avro parser. But I agree this is a non-trivial effort to get
>>>>> > > underway.
>>>>> > >
>>>>> > > Another area to consider is compatibility testing. I think before a third
>>>>> > > officially supported community library is introduced it would be good to
>>>>> > > have a compatibility framework in place to make sure implementations are
>>>>> > > all interpreting the specification correctly. If there isn't already an
>>>>> > > effort here, I'd like to start contributing something (I will probably have
>>>>> > > bandwidth sometime in Q3).
>>>>> > >
>>>>> > > Thanks,
>>>>> > > -Micah
>>>>> > >
>>>>> > >
>>>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>>> > >
>>>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>>> > >
>>>>> > >> Hi caneGuy,
>>>>> > >>
>>>>> > >> I personally don’t dislike this idea. I understand the performance
>>>>> > >> benefits.
>>>>> > >>
>>>>> > >> But this would be a huge undertaking for the community. We’d need to
>>>>> > >> ensure we had sufficient developer support for reviews (likely one of the
>>>>> > >> biggest issues), as well as a number of other things. Particularly
>>>>> > >> dependencies, package management, etc. We’d also need to scope support down
>>>>> > >> to specific OS / compilers etc.
>>>>> > >>
>>>>> > >> We’d also need to be sure we had adequate developer support from a wide
>>>>> > >> enough range of the community to support the project long term. One issue
>>>>> > >> in open source is that developers will work on something tangential to
>>>>> > >> their project in another repository, but nobody is available to maintain it.
>>>>> > >>
>>>>> > >> There’s also the question of how useful this would be in practice given
>>>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>>>> > >> frameworks.
>>>>> > >>
>>>>> > >> Again, I’m not opposed to the idea but just trying to be realistic about
>>>>> > >> the realities of such an undertaking. It would need full community support
>>>>> > >> (or at least support from enough community members to be sustainable).
>>>>> > >>
>>>>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
>>>>> > >> project has some that you might use as reference.
>>>>> > >>
>>>>> > >> *I highly suggest you come to the next community sync and bring this up
>>>>> > >> to the community then.*
>>>>> > >>
>>>>> > >> If you’re not already on the invite list for the monthly community sync,
>>>>> > >> you can get on it by joining the Google group. You’ll receive invites when
>>>>> > >> they go out:
>>>>> > >> https://groups.google.com/g/iceberg-sync
>>>>> > >>
>>>>> > >> Looking forward to seeing you at the next community sync.
>>>>> > >>
>>>>> > >> A design document and/or any prior art would be very helpful as the
>>>>> > >> community sync does discuss many topics (possibly there is existing C++
>>>>> > >> support in StarRocks for Iceberg V1?).
>>>>> > >>
>>>>> > >> Thank you,
>>>>> > >> Kyle Bendickson
>>>>> > >> GitHub: kbendick
>>>>> > >>
>>>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>>>> > >>
>>>>> > >>> Currently there is no existing effort to develop a C++ package. That
>>>>> > >>> being said I think it would be awesome to have one! If anyone is willing to
>>>>> > >>> start that development effort, I can help with some of the groundwork to
>>>>> > >>> kickstart it.
>>>>> > >>>
>>>>> > >>> I would say the first step would be for someone to prepare a high-level
>>>>> > >>> proposal.
>>>>> > >>>
>>>>> > >>> -Sam
>>>>> > >>>
>>>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>>>> > >>>
>>>>> > >>>> Hi team
>>>>> > >>>> I am a dev from the StarRocks community, and we have supported the
>>>>> > >>>> Iceberg v1 format.
>>>>> > >>>> We are also planning to support the v2 format. If there were a C++
>>>>> > >>>> package, it would be very convenient for our implementation.
>>>>> > >>>> At the same time, it would also be faster for other C++ computing
>>>>> > >>>> engines to support the v2 format.
>>>>> > >>>>
>>>>> > >>>> Do we have plans to support a C++ version of the SDK?
>>>>> > >>>> --
>>>>> > >>>> caneGuy
>>>>> > >>>>
>>>>> > >>> --
>>>>> > >>>
>>>>> > >>> Sam Redai <s...@tabular.io>
>>>>> > >>>
>>>>> > >>> Developer Advocate | Tabular <https://tabular.io/>
>>>>> > >>>
>>>>> > >>> c (267) 226-8606
>>>>> > >>>
>>>>> > >>
>>>>> >
>>>>>
>>>>
>>>
>>> --
>>>
>>> Kyle Bendickson
>>>
>>> OSS Developer | Tabular <https://tabular.io/>
>>>
>>> k...@tabular.io
>>>
>>