As Micah said, this would be pretty cool to use in Arrow datasets. I can't make any promises about helping develop it, but if it were developed, I could help integrate it into Arrow datasets / Acero and provide a proof of concept.
On Wed, Jun 8, 2022, 6:35 AM Ryan Blue <b...@tabular.io> wrote:
> While I understand Kyle's concerns, I'm all for a C++ or Rust
> implementation.
>
> We know that this is going to help a lot of people that want to integrate
> Iceberg in engines that are outside the JVM ecosystem. I think it would be
> great to work with anyone that is interested and build up the community in
> this area!
>
> Ryan
>
> On Wed, Jun 8, 2022 at 3:16 AM OpenInx <open...@gmail.com> wrote:
>
>> As a cloud-native table format standard for the big-data ecosystem, I
>> believe supporting multiple languages is the correct direction so that
>> different languages can connect to the Apache Iceberg table format.
>>
>> But I can also see Kyle's point about lacking enough resources (developers
>> and reviewers) to accomplish this goal. In my mind, Python, Golang, C++,
>> and Rust can all be regarded as native language support. We may
>> just need to support the Rust SDK, and then all of the other languages can
>> just wrap the Rust SDK to access the table format.
>>
>> Anyway, we will need to wait for the REST catalog to be finished before we
>> introduce support for other languages, because we cannot access the Iceberg
>> table by invoking the JVM catalog interfaces.
>>
>> On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>>> There’s also the question of how useful this would be in practice given
>>>> the complexity of using C++ (or Rust etc) within some of the major
>>>> frameworks.
>>>>
>>>
>>> One place this would be useful is Arrow's Dataset API [1]. An
>>> option the Arrow community might be open to is hosting parts of the code
>>> there (this is what is done for Apache Parquet C++). This helps shape some
>>> of the answers to other questions posed (ORC and Parquet are already in the
>>> repo, it provides a Filesystem interface, etc).
>>> The project doesn't currently consume Avro, and I think the preferred
>>> approach is to make a clean-room Avro parser. But I agree this is a
>>> non-trivial effort to get underway.
>>>
>>> Another area to consider is compatibility testing. I think before a
>>> third officially supported community library is introduced it would be good
>>> to have a compatibility framework in place to make sure implementations are
>>> all interpreting the specification correctly. If there isn't already an
>>> effort here, I'd like to start contributing something (I will probably have
>>> bandwidth sometime in Q3).
>>>
>>> Thanks,
>>> -Micah
>>>
>>>
>>> [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>
>>> On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>
>>>> Hi caneGuy,
>>>>
>>>> I personally don’t dislike this idea. I understand the performance
>>>> benefits.
>>>>
>>>> But this would be a huge undertaking for the community. We’d need to
>>>> ensure we had sufficient developer support for reviews (likely one of the
>>>> biggest issues), as well as a number of other things, particularly
>>>> dependencies, package management, etc. We’d also need to scope support down
>>>> to specific OS / compilers etc.
>>>>
>>>> We’d also need to be sure we had adequate developer support from a wide
>>>> enough range of the community to support the project long term. One issue
>>>> in open source is that developers will work on something tangential to
>>>> their project in another repository, but nobody is available to maintain
>>>> it.
>>>>
>>>> There’s also the question of how useful this would be in practice given
>>>> the complexity of using C++ (or Rust etc) within some of the major
>>>> frameworks.
>>>>
>>>> Again, I’m not opposed to the idea but just trying to be realistic
>>>> about the demands of such an undertaking. It would need full community
>>>> support (or at least support from enough community members to be
>>>> sustainable).
>>>>
>>>> If you wanted to make a design doc, the milestones tab in the Iceberg
>>>> project has some that you might use as a reference.
>>>>
>>>> *I highly suggest you come to the next community sync and bring this up
>>>> to the community then.*
>>>>
>>>> If you’re not already on the invite list for the monthly community
>>>> sync, you can get on it by joining the Google group. You’ll receive invites
>>>> when they go out:
>>>> https://groups.google.com/g/iceberg-sync
>>>>
>>>> Looking forward to seeing you at the next community sync.
>>>>
>>>> A design document and/or any prior art would be very helpful, as the
>>>> community sync does discuss many topics (possibly there is existing C++
>>>> support in StarRocks for Iceberg V1?).
>>>>
>>>> Thank you,
>>>> Kyle Bendickson
>>>> GitHub: kbendick
>>>>
>>>> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>>>
>>>>> Currently there is no existing effort to develop a C++ package. That
>>>>> being said, I think it would be awesome to have one! If anyone is willing
>>>>> to start that development effort, I can help with some of the groundwork
>>>>> to kickstart it.
>>>>>
>>>>> I would say the first step would be for someone to prepare a
>>>>> high-level proposal.
>>>>>
>>>>> -Sam
>>>>>
>>>>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>>>>
>>>>>> Hi team
>>>>>> I am a dev from the StarRocks community, and we already support the
>>>>>> Iceberg v1 format.
>>>>>> We are also planning to support the v2 format. If there were a C++
>>>>>> package, it would be very convenient for our implementation.
>>>>>> At the same time, other C++ computing engines would also be able to
>>>>>> support the v2 format faster.
>>>>>>
>>>>>> Do we have plans to support a C++ SDK?
>>>>>> --
>>>>>> caneGuy
>>>>>>
>>>>> --
>>>>>
>>>>> Sam Redai <s...@tabular.io>
>>>>>
>>>>> Developer Advocate | Tabular <https://tabular.io/>
>>>>>
>>>>> c (267) 226-8606
>>>>>
>>>>
>
> --
> Ryan Blue
> Tabular
>