Hi Tao Wu. I think the Apache Iceberg community is broadly in agreement on providing Iceberg SDKs for native languages. I am very happy to offer my perspective and help if needed when you try to move this forward.

On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:

> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
> system that is written in Rust.
>
> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
> have better integration with the existing Iceberg ecosystem. Initially I
> found https://github.com/oliverdaff/iceberg-rs, but it appears the author
> hasn't been active lately. So I'm looking to see if the Iceberg community
> has any consensus on a Rust/C++ SDK (Rust is preferable), and if there is,
> we'd love to contribute. I believe as Iceberg increases its popularity,
> there will eventually be more systems that want such libraries. There may
> even be ongoing work that has not been discussed with the community.
>
> Additionally, I think the initial Rust/C++ SDK could support only the
> reader and writer sides of Iceberg, because plenty of JVM-based query
> engines already take charge of data maintenance. We don't have to
> rewrite every corner of Iceberg in Rust. That means less engineering work.
>
> On 2022/06/08 10:16:05 OpenInx wrote:
> > Since Iceberg is a cloud-native table format standard for the big-data
> > ecosystem, I believe supporting multiple languages is the correct
> > direction so that different languages can connect to the Apache Iceberg
> > table format.
> >
> > But I can also see Kyle's point about lacking enough resources
> > (developers and reviewers) to accomplish this goal. In my mind, Python,
> > Golang, C++, and Rust can all be regarded as native language support. We
> > may just need to support the Rust SDK, and then all of the other
> > languages can just wrap the Rust SDK to access the table format.
> >
> > Anyway, we will need to wait for the REST catalog to be finished before
> > we introduce support for other languages, because we cannot access the
> > Iceberg table by invoking the JVM catalog interfaces.
> >
> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > >> There’s also the question of how useful this would be in practice given
> > >> the complexity of using C++ (or Rust etc) within some of the major
> > >> frameworks.
> > >>
> > >
> > > One place this would be useful is Arrow's Dataset API [1]. An option
> > > the Arrow community might be open to is hosting parts of the code
> > > there (this is what is done for Apache Parquet C++). This helps shape
> > > some of the answers to other questions posed (ORC and Parquet are
> > > already in the repo, it provides a Filesystem interface, etc). The
> > > project doesn't currently consume Avro, and I think the preferred
> > > approach is to make a clean room Avro parser. But I agree this is a
> > > non-trivial effort to get underway.
> > >
> > > Another area to consider is compatibility testing. I think before a
> > > third officially supported community library is introduced it would be
> > > good to have a compatibility framework in place to make sure
> > > implementations are all interpreting the specification correctly. If
> > > there isn't already an effort here, I'd like to start contributing
> > > something (I will probably have bandwidth sometime in Q3).
> > >
> > > Thanks,
> > > -Micah
> > >
> > >
> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
> > >
> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
> > >
> > >> Hi caneGuy,
> > >>
> > >> I personally don’t dislike this idea. I understand the performance
> > >> benefits.
> > >>
> > >> But this would be a huge undertaking for the community. We’d need to
> > >> ensure we had sufficient developer support for reviews (likely one of
> > >> the biggest issues), as well as a number of other things. Particularly
> > >> dependencies, package management, etc. We’d also need to scope support
> > >> down to specific OS / compilers etc.
> > >>
> > >> We’d also need to be sure we had adequate developer support from a wide
> > >> enough range of the community to support the project long term. One
> > >> issue in open source is that developers will work on something
> > >> tangential to their project in another repository, but nobody is
> > >> available to maintain it.
> > >>
> > >> There’s also the question of how useful this would be in practice given
> > >> the complexity of using C++ (or Rust etc) within some of the major
> > >> frameworks.
> > >>
> > >> Again, I’m not opposed to the idea but just trying to be realistic about
> > >> the realities of such an undertaking. It would need full community
> > >> support (or at least support from enough community members to be
> > >> sustainable).
> > >>
> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
> > >> project has some that you might use as reference.
> > >>
> > >> *I highly suggest you come to the next community sync and bring this up
> > >> to the community then.*
> > >>
> > >> If you’re not already on the invite list for the monthly community sync,
> > >> you can get on it by joining the Google group. You’ll receive invites
> > >> when they go out:
> > >> https://groups.google.com/g/iceberg-sync
> > >>
> > >> Looking forward to seeing you at the next community sync.
> > >>
> > >> A design document and/or any prior art would be very helpful as the
> > >> community sync does discuss many topics (possibly there is existing C++
> > >> support in StarRocks for Iceberg V1?).
> > >>
> > >> Thank you,
> > >> Kyle Bendickson
> > >> GitHub: kbendick
> > >>
> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
> > >>
> > >>> Currently there is no existing effort to develop a C++ package. That
> > >>> being said I think it would be awesome to have one! If anyone is
> > >>> willing to start that development effort, I can help with some of the
> > >>> groundwork to kickstart it.
> > >>>
> > >>> I would say the first step would be for someone to prepare a
> > >>> high-level proposal.
> > >>>
> > >>> -Sam
> > >>>
> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
> > >>>
> > >>>> Hi team
> > >>>> I am a dev from the StarRocks community, and we already support the
> > >>>> Iceberg v1 format.
> > >>>> We are also planning to support the v2 format. If there is a C++
> > >>>> package, it will be very convenient for our implementation.
> > >>>> At the same time, it would help other C++ compute engines support
> > >>>> the v2 format faster.
> > >>>>
> > >>>> Do we have plans to support a C++ version of the SDK?
> > >>>> --
> > >>>> caneGuy
> > >>>>
> > >>> --
> > >>>
> > >>> Sam Redai <s...@tabular.io>
> > >>>
> > >>> Developer Advocate | Tabular <https://tabular.io/>
> > >>>
> > >>> c (267) 226-8606
> > >>>
> > >>
> > >