> There’s also the question of how useful this would be in practice given
> the complexity of using C++ (or Rust etc) within some of the major
> frameworks.

One place this would be useful is Arrow's Dataset API [1]. One option the Arrow community might be open to is hosting parts of the code there (this is what is done for Apache Parquet C++). This helps shape some of the answers to the other questions posed (ORC and Parquet are already in the repo, it provides a filesystem interface, etc.). The project doesn't currently consume Avro, and I think the preferred approach is to write a clean-room Avro parser. But I agree this is a non-trivial effort to get underway.

Another area to consider is compatibility testing. I think that before a third officially supported community library is introduced, it would be good to have a compatibility framework in place to make sure all implementations are interpreting the specification correctly. If there isn't already an effort here, I'd like to start contributing something (I will probably have bandwidth sometime in Q3).

Thanks,
-Micah

[1] https://arrow.apache.org/docs/cpp/dataset.html

On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:

> Hi caneGuy,
>
> I personally don’t dislike this idea. I understand the performance
> benefits.
>
> But this would be a huge undertaking for the community. We’d need to
> ensure we had sufficient developer support for reviews (likely one of the
> biggest issues), as well as a number of other things. Particularly
> dependencies, package management, etc. We’d also need to scope support down
> to specific OS / compilers etc.
>
> We’d also need to be sure we had adequate developer support from a wide
> enough range of the community to support the project long term. One issue
> in open source is that developers will work on something tangential to
> their project in another repository, but nobody is available to maintain it.
>
> There’s also the question of how useful this would be in practice given
> the complexity of using C++ (or Rust etc) within some of the major
> frameworks.
>
> Again, I’m not opposed to the idea but just trying to be realistic about
> the realities of such an undertaking. It would need full community support
> (or at least support from enough community members to be sustainable).
>
> If you wanted to make a design doc, the milestones tab in the Iceberg
> project has some that you might use as reference.
>
> *I highly suggest you come to the next community sync and bring this up to
> the community then.*
>
> If you’re not already on the invite list for the monthly community sync,
> you can get on it by joining the Google group. You’ll receive invites when
> they go out:
> https://groups.google.com/g/iceberg-sync
>
> Looking forward to seeing you at the next community sync.
>
> A design document and/or any prior art would be very helpful as the
> community sync does discuss many topics (possibly there is existing C++
> support in StarRocks for Iceberg V1?).
>
> Thank you,
> Kyle Bendickson
> GitHub: kbendick
>
> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>
>> Currently there is no existing effort to develop a C++ package. That
>> being said I think it would be awesome to have one! If anyone is willing to
>> start that development effort, I can help with some of the ground work to
>> kickstart it.
>>
>> I would say the first step would be for someone to prepare a high-level
>> proposal.
>>
>> -Sam
>>
>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>
>>> Hi team
>>> I am a dev from StarRocks community, and we have supported iceberg v1
>>> format.
>>> We are also planning to support v2 format. If there is a C++ package, it
>>> will be very convenient for our implementation.
>>> At the same time, other c++ computing engines support v2 format will
>>> also be faster.
>>>
>>> Do we have plans to support c++ version sdk?
>>> --
>>> caneGuy
>>>
>> --
>>
>> Sam Redai <s...@tabular.io>
>>
>> Developer Advocate | Tabular <https://tabular.io/>
>>