+1 for Rust.

On Wed, Jun 22, 2022 at 4:21 AM LuNing Wang <wang4lun...@gmail.com> wrote:
> +1 for Rust
>
> Best Regards,
> LuNing Wang
>
> On Wed, Jun 22, 2022 at 14:15, Nan Zhu <zhunanmcg...@gmail.com> wrote:
>
>> +1 for using Rust as the backbone for new language bindings
>>
>> On Sun, Jun 12, 2022 at 23:52 OpenInx <open...@gmail.com> wrote:
>>
>>> Thanks Kyle for sharing your context.
>>>
>>> Recently, I also spent some time practicing my Rust skills. Generally, I'm +1 for adding a Rust SDK as native language support.
>>>
>>> On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>
>>>> Thanks for starting this discussion.
>>>>
>>>> I know I was the first to mention some of my concerns (which I still have and would apply to any new major change), but I also think that this is an avenue that should be explored.
>>>>
>>>> Specifically, a native integration would have many benefits for read paths (in addition to others). I know that the Rust Avro reader is significantly faster, as are native columnar format readers.
>>>>
>>>> So while I do have some concerns about making sure we have enough people to support this endeavor, I do want to say I think it's a really good idea. My apologies if I gave the impression otherwise.
>>>>
>>>> I would personally be interested in contributing to and reviewing for a native Rust library (or C++, but I think Rust is a much more elegant language, and I'd personally prefer to work in it since, in my opinion, it's easier to work with across systems than C++, though I would defer to others on that).
>>>>
>>>> I would also be happy to offer my help and perspective in moving this forward if need be. But I did want to express my practical concerns so that we don't end up with an area of the codebase where there aren't enough people to help maintain it.
>>>>
>>>> But in general I think this is an exciting opportunity, and results have shown time and time again that native readers / writers are much more performant.
>>>>
>>>> +1 to using Rust as well (which is a language I know more of than C++ these days, though I'd have to brush up on both).
>>>>
>>>> Best, Kyle
>>>>
>>>> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <open...@gmail.com> wrote:
>>>>
>>>>> Hi Tao Wu.
>>>>>
>>>>> I think the Apache Iceberg community is broadly aligned on providing an Iceberg SDK for native languages. I am very happy to offer my perspective and help if needed when you try to move this forward.
>>>>>
>>>>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>>>>
>>>>>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming system that is written in Rust.
>>>>>>
>>>>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can have better integration with the existing Iceberg ecosystem. Initially I found https://github.com/oliverdaff/iceberg-rs, but it appears the author hasn't been active lately. So I'm looking to see if the Iceberg community has any consensus on a Rust/C++ SDK (Rust is preferable), and if there is, we'd love to contribute. I believe that as Iceberg grows in popularity, there will eventually be more systems that want such libraries. There may even be some ongoing work that has not been discussed with the community.
>>>>>>
>>>>>> Additionally, I think the initial Rust/C++ SDK can support only the reader and writer sides of Iceberg, because there are already plenty of JVM-based query engines out there taking charge of data maintenance. We don't have to rewrite every corner of Iceberg in Rust. That means less engineering work.
>>>>>>
>>>>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>>>>> > As a cloud-native table format standard for the big-data ecosystem, I believe supporting multiple languages is the correct direction so that different languages can connect to the Apache Iceberg table format.
>>>>>> >
>>>>>> > But I also get Kyle's point about lacking enough resources (developers and reviewers) to accomplish this goal. In my mind, Python, Golang, C++, and Rust can all be regarded as native language support. We may just need to support the Rust SDK, and then all of the other languages can just wrap the Rust SDK to access the table format.
>>>>>> >
>>>>>> > Anyway, we will need to wait for the REST catalog to be finished before we introduce support for another language, because we cannot access the Iceberg table by invoking the JVM catalog interfaces.
>>>>>> >
>>>>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>>> >
>>>>>> > >> There’s also the question of how useful this would be in practice given the complexity of using C++ (or Rust etc) within some of the major frameworks.
>>>>>> > >
>>>>>> > > One place this would be useful is Arrow's Dataset API [1]. An option the Arrow community might be open to is hosting parts of the code there (this is what is done for Apache Parquet C++). This helps shape some of the answers to other questions posed (ORC and Parquet are already in the repo, it provides a Filesystem interface, etc). The project doesn't currently consume Avro, and I think the preferred approach is to make a clean-room Avro parser. But I agree this is a non-trivial effort to get underway.
>>>>>> > >
>>>>>> > > Another area to consider is compatibility testing. I think before a third officially supported community library is introduced it would be good to have a compatibility framework in place to make sure implementations are all interpreting the specification correctly. If there isn't already an effort here, I'd like to start contributing something (I will probably have bandwidth sometime in Q3).
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > -Micah
>>>>>> > >
>>>>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>>>> > >
>>>>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <k...@tabular.io> wrote:
>>>>>> > >
>>>>>> > >> Hi caneGuy,
>>>>>> > >>
>>>>>> > >> I personally don’t dislike this idea. I understand the performance benefits.
>>>>>> > >>
>>>>>> > >> But this would be a huge undertaking for the community. We’d need to ensure we had sufficient developer support for reviews (likely one of the biggest issues), as well as a number of other things, particularly dependencies, package management, etc. We’d also need to scope support down to specific OS / compilers etc.
>>>>>> > >>
>>>>>> > >> We’d also need to be sure we had adequate developer support from a wide enough range of the community to support the project long term. One issue in open source is that developers will work on something tangential to their project in another repository, but then nobody is available to maintain it.
>>>>>> > >>
>>>>>> > >> There’s also the question of how useful this would be in practice given the complexity of using C++ (or Rust etc) within some of the major frameworks.
>>>>>> > >>
>>>>>> > >> Again, I’m not opposed to the idea but just trying to be realistic about what such an undertaking involves. It would need full community support (or at least support from enough community members to be sustainable).
>>>>>> > >>
>>>>>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg project has some that you might use as reference.
>>>>>> > >>
>>>>>> > >> *I highly suggest you come to the next community sync and bring this up to the community then.*
>>>>>> > >>
>>>>>> > >> If you’re not already on the invite list for the monthly community sync, you can get on it by joining the Google group. You’ll receive invites when they go out:
>>>>>> > >> https://groups.google.com/g/iceberg-sync
>>>>>> > >>
>>>>>> > >> Looking forward to seeing you at the next community sync.
>>>>>> > >>
>>>>>> > >> A design document and/or any prior art would be very helpful, as the community sync does discuss many topics (possibly there is existing C++ support in StarRocks for Iceberg V1?).
>>>>>> > >>
>>>>>> > >> Thank you,
>>>>>> > >> Kyle Bendickson
>>>>>> > >> GitHub: kbendick
>>>>>> > >>
>>>>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <s...@tabular.io> wrote:
>>>>>> > >>
>>>>>> > >>> Currently there is no existing effort to develop a C++ package. That being said, I think it would be awesome to have one! If anyone is willing to start that development effort, I can help with some of the groundwork to kickstart it.
>>>>>> > >>>
>>>>>> > >>> I would say the first step would be for someone to prepare a high-level proposal.
>>>>>> > >>>
>>>>>> > >>> -Sam
>>>>>> > >>>
>>>>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zhoukang199...@gmail.com> wrote:
>>>>>> > >>>
>>>>>> > >>>> Hi team
>>>>>> > >>>> I am a dev from the StarRocks community, and we already support the Iceberg v1 format.
>>>>>> > >>>> We are also planning to support the v2 format. If there were a C++ package, it would be very convenient for our implementation.
>>>>>> > >>>> At the same time, it would also let other C++ compute engines support the v2 format faster.
>>>>>> > >>>>
>>>>>> > >>>> Do we have plans to support a C++ version of the SDK?
>>>>>> > >>>> --
>>>>>> > >>>> caneGuy
>>>>>> > >>>>
>>>>>> > >>> --
>>>>>> > >>>
>>>>>> > >>> Sam Redai <s...@tabular.io>
>>>>>> > >>>
>>>>>> > >>> Developer Advocate | Tabular <https://tabular.io/>
>>>>>> > >>>
>>>>>> > >>> c (267) 226-8606
>>>>>> > >>>
>>>>>> > >>
>>>>>> >
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Kyle Bendickson
>>>>
>>>> OSS Developer | Tabular <https://tabular.io/>
>>>>
>>>> k...@tabular.io
>>>>
>>>

--
Josh Howard