Thanks, Kyle, for sharing your context. I have also spent some time recently practicing my Rust skills. In general, I'm +1 on adding a Rust SDK for native-language support.

On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <k...@tabular.io> wrote:

> Thanks for starting this discussion.
>
> I know I was the first to mention some of my concerns (which I still have
> and would apply to any new major change), but I also think that this is an
> avenue that should be explored.
>
> Specifically a native integration would have many benefits for read paths
> (in addition to others). I know that the Rust Avro reader is
> significantly faster, as well as native columnar formats.
>
> So while I do have some concerns about making sure we have enough people
> to support this endeavor, I do want to say I think it's a really good idea.
> My apologies if I gave the impression otherwise.
>
> I would personally be interested in contributing to and reviewing for a
> native Rust library (or CPP, but I think Rust is a much more elegant
> language and I'd personally prefer to work in that as it's easier to work
> with across systems than C++ imo though I would defer to others on that).
>
> I would also be happy to offer my help and perspective in moving this
> forward if need be. But I did want to express my practical concerns so that
> we don't have an area of the codebase where there aren't enough people to
> help maintain it etc.
>
> But in general I think this is an exciting opportunity, and results have
> shown time and time again that native readers / writers are much more
> performant.
>
> +1 to using Rust as well (which is a language I know more of than C++
> these days - though both I'd have to brush off my skillset).
>
> Best, Kyle
>
> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <open...@gmail.com> wrote:
>
>> Hi Tao Wu.
>>
>> I think the apache iceberg community is very consistent in providing the
>> Iceberg SDK for native languages. I am very happy to offer my perspective
>> and help if needed when you try to move this thing forward.
>>
>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>
>>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
>>> system that is written in Rust.
>>>
>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
>>> have better integration with the existing Iceberg ecosystem. Initially I
>>> found https://github.com/oliverdaff/iceberg-rs, but it appears the
>>> author hasn't been active lately. So I'm looking to see if the Iceberg
>>> community has any consensus on a Rust/C++ SDK (Rust is preferable), and if
>>> there is, we'd love to contribute. I believe as Iceberg increases its
>>> popularity, there will eventually be more systems that want such libraries.
>>> There could have even been some ongoing works without consulting with the
>>> community.
>>>
>>> Additionally, I think the initial Rust/C++ SDK can only support the
>>> reader&writer sides of Iceberg. Because there have been plenty of JVM-based
>>> query engines out there taking charge of data maintenance. We don't have to
>>> rewrite every corner of Iceberg in Rust. That means less engineering work.
>>>
>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>> > As a cloud-native table format standard for the big-data ecosystem, I
>>> > believe supporting multiple languages is the correct direction so that
>>> > different languages can connect to the apache iceberg table format.
>>> >
>>> > But I can also get Kyle's point about lacking enough resources (developers
>>> > and reviewers) to accomplish this goal. In my mind, Python, Golang, C++,
>>> > Rust, all of them can be regarded as the native language support. We may
>>> > just need to support the Rust SDK and then all of the other languages can
>>> > just wrap the Rust SDK to access the table format.
>>> >
>>> > Anyway, we will need to wait for the REST catalog to be finished before we
>>> > introduce support for another language, because we cannot access the iceberg
>>> > table by invoking the JVM catalog interfaces.
>>> >
>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com>
>>> > wrote:
>>> >
>>> > >> There’s also the question of how useful this would be in practice given
>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>> > >> frameworks.
>>> > >>
>>> > >
>>> > > One place this would be useful is for the Arrow's DataSet API [1]. An
>>> > > option the Arrow community might be open to is hosting parts of the code
>>> > > there (this is what is done for Apache Parquet C++). This helps shape some
>>> > > of the answers to other questions posed (ORC and Parquet are already in the
>>> > > Repo, it provides a Filesystem interface, etc). The project doesn't
>>> > > currently consume Avro, and I think the preferred approach is to make a
>>> > > clean room Avro parser. But I agree this is a non-trivial effort to get
>>> > > underway.
>>> > >
>>> > > Another area to consider is compatibility testing. I think before a third
>>> > > officially supported community library is introduced it would be good to
>>> > > have a compatibility framework in place to make sure implementations are
>>> > > all interpreting the specification correctly. If there isn't already an
>>> > > effort here, I'd like to start contributing something (probably will have
>>> > > bandwidth sometime in Q3).
>>> > >
>>> > > Thanks,
>>> > > -Micah
>>> > >
>>> > >
>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>> > >
>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>> > >
>>> > >> Hi caneGuy,
>>> > >>
>>> > >> I personally don’t dislike this idea. I understand the performance
>>> > >> benefits.
>>> > >>
>>> > >> But this would be a huge undertaking for the community. We’d need to
>>> > >> ensure we had sufficient developer support for reviews (likely one of the
>>> > >> biggest issues), as well as a number of other things. Particularly
>>> > >> dependencies, package management, etc. We’d also need to scope support down
>>> > >> to specific OS / compilers etc.
>>> > >>
>>> > >> We’d also need to be sure we had adequate developer support from a wide
>>> > >> enough range of the community to support the project long term. One issue
>>> > >> in open source is that developers will work on something tangential to
>>> > >> their project in another repository, but nobody is available to maintain it.
>>> > >>
>>> > >> There’s also the question of how useful this would be in practice given
>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>> > >> frameworks.
>>> > >>
>>> > >> Again, I’m not opposed to the idea but just trying to be realistic about
>>> > >> the realities of such an undertaking. It would need full community support
>>> > >> (or at least support from enough community members to be sustainable).
>>> > >>
>>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
>>> > >> project has some that you might use as reference.
>>> > >>
>>> > >> *I highly suggest you come to the next community sync and bring this up
>>> > >> to the community then.*
>>> > >>
>>> > >> If you’re not already on the invite list for the monthly community sync,
>>> > >> you can get on it by joining the Google group. You’ll receive invites when
>>> > >> they go out:
>>> > >> https://groups.google.com/g/iceberg-sync
>>> > >>
>>> > >> Looking forward to seeing you at the next community sync.
>>> > >>
>>> > >> A design document and/or any prior art would be very helpful as the
>>> > >> community sync does discuss many topics (possibly there is existing C++
>>> > >> support in StarRocks for Iceberg V1?).
>>> > >>
>>> > >> Thank you,
>>> > >> Kyle Bendickson
>>> > >> GitHub: kbendick
>>> > >>
>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>> > >>
>>> > >>> Currently there is no existing effort to develop a C++ package. That
>>> > >>> being said I think it would be awesome to have one! If anyone is willing to
>>> > >>> start that development effort, I can help with some of the ground work to
>>> > >>> kickstart it.
>>> > >>>
>>> > >>> I would say the first step would be for someone to prepare a high-level
>>> > >>> proposal.
>>> > >>>
>>> > >>> -Sam
>>> > >>>
>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>> > >>>
>>> > >>>> Hi team
>>> > >>>> I am a dev from StarRocks community, and we have supported iceberg v1
>>> > >>>> format.
>>> > >>>> We are also planning to support v2 format. If there is a C++ package,
>>> > >>>> it will be very convenient for our implementation.
>>> > >>>> At the same time, other c++ computing engines support v2 format will
>>> > >>>> also be faster.
>>> > >>>>
>>> > >>>> Do we have plans to support c++ version sdk?
>>> > >>>> --
>>> > >>>> caneGuy
>>> > >>>>
>>> > >>> --
>>> > >>>
>>> > >>> Sam Redai <s...@tabular.io>
>>> > >>>
>>> > >>> Developer Advocate | Tabular <https://tabular.io/>
>>> > >>>
>>> > >>> c (267) 226-8606
>>> > >>>
>>> > >>
>>> >
>>
>
> --
>
> Kyle Bendickson
>
> OSS Developer | Tabular <https://tabular.io/>
>
> k...@tabular.io