I like the vision of building multi-language SDKs on top of a Rust kernel. Apache OpenDAL <https://opendal.apache.org/> is another success story of this approach.
On Fri, Aug 16, 2024 at 11:35 AM Zheng Hu <open...@gmail.com> wrote:

> From my understanding, the most abstracted approach for implementing a
> multi-language SDK is: building another language SDK (Python, Ruby, Go,
> etc) on top of the Iceberg-Rust SDK. In this case, we can make our
> community resources focus on the rust native kernel, and all of the other
> language bindings will have the same progress and support.
>
> This may put higher requirements on our iceberg-rust:
> 1. The community needs more people who understand both iceberg and rust;
> 2. The API of iceberg-rust needs to be more abstract and universal, so
> that we can more easily export it as a native API and use it in other
> language bindings.
>
> Anyway, I personally think that abstracting the problem of multi-language
> and focusing on iceberg-rust will be the right direction. Just like what we
> see lancedb[1] doing, its kernel is very focused on the rust kernel, based
> on which it has successfully built a multi-language ecosystem of python and
> nodejs. In the future, I think it is also very reasonable and easy to
> expand to a big data ecosystem like Java.
>
> 1. https://github.com/lancedb/lancedb
>
> On Thu, Aug 8, 2024 at 7:09 AM Chris Atkins <chri...@buildkite.com.invalid>
> wrote:
>
>> > Do you know how big the Ruby data community is? I think the most
>> > important part is that it gets some traction and will continue to be
>> > maintained.
>>
>> It's a great question, Fokko! I'd say that the data community in Ruby is
>> nascent, but definitely exists. There are some prolific folks like
>> Andrew Kane (the fellow who created pgvector)
>> https://ankane.org/opensource who have released a lot of data-related
>> gems, and there is a healthy set of bindings for Apache Arrow, Avro and
>> friends.
>>
>> In my experience, data technology in Ruby often shows up for very
>> particular use-cases within a larger Ruby on Rails application.
>> An example would be using the DuckDB, Arrow or Polars binding gems to do
>> data export or reverse-ETL with Parquet files in object storage, where
>> parts of the process work with the core domain objects. Another use-case
>> is for user or administrator-facing reporting features. My own use-case is
>> wanting to (eventually) perform some reads/writes from some iceberg tables
>> directly from our Rails monolith without needing to call out to Trino.
>>
>> Thanks,
>>
>> Chris Atkins
>> Principal Engineer
>> Buildkite
>>
>> On Tue, 6 Aug 2024 at 17:14, Fokko Driesprong <fo...@apache.org> wrote:
>>
>>> Hi Chris,
>>>
>>> Thanks for raising this. Do you know how big the Ruby data community is?
>>> I think the most important part is that it gets some traction and will
>>> continue to be maintained.
>>>
>>> I fully agree that building on top of iceberg-rust makes a lot of sense,
>>> since also with PyIceberg we're running into limitations when it comes to
>>> performance and limited parallelism.
>>>
>>> Kind regards,
>>> Fokko
>>>
>>> On Mon, 5 Aug 2024 at 14:14, Xuanwo <xua...@apache.org> wrote:
>>>
>>>> Hi, Chris
>>>>
>>>> I love this idea. One of the main reasons I started working on
>>>> iceberg-rust is due to the potential that a rust-powered iceberg core can
>>>> offer.
>>>>
>>>> I'm not an experienced ruby developer, but I'm willing to help with
>>>> some CI setup or docs since I have some experience in the opendal community
>>>> with ruby bindings.
>>>>
>>>> On Mon, Aug 5, 2024, at 20:03, Renjie Liu wrote:
>>>>
>>>> Hi, Chris:
>>>>
>>>> Thanks for raising this. Generally I'm +1 on building ruby bindings
>>>> on top of the rust implementation, which would help introduce iceberg
>>>> into the ruby ecosystem.
>>>>
>>>> On Mon, Aug 5, 2024 at 7:30 PM Chris Atkins
>>>> <chri...@buildkite.com.invalid> wrote:
>>>>
>>>> Hi there,
>>>>
>>>> I'm following up on a discussion
>>>> <https://apache-iceberg.slack.com/archives/C05HTENMJG4/p1722750831522969>
>>>> from the #rust channel on the Iceberg community slack, so starting a
>>>> thread here too.
>>>>
>>>> After seeing Xuanwo's and Song's recent proposals around leveraging
>>>> iceberg-rust to power parts of PyIceberg, I was thinking it could be
>>>> valuable to follow a similar pattern to build out Ruby bindings for
>>>> Iceberg. Being able to stand on the shoulders of iceberg-rust could really
>>>> help build out a robust Ruby interface, and also offer some opportunities
>>>> for interop with things like datafusion and opendal.
>>>>
>>>> Recently in the Ruby ecosystem, writing native extensions in Rust has
>>>> become more popular, and tools like rb-sys and magnus provide a lot of the
>>>> required infrastructure. A good example is ruby-polars, which provides an
>>>> interface that is idiomatic Ruby but retains good symmetry with the APIs
>>>> exposed by py-polars. I wonder if we could eventually aim for a similar
>>>> type of symmetry between PyIceberg and a Ruby gem?
>>>>
>>>> Is there much interest in this? I've started playing around with some
>>>> of the basics, and started out with a plain native Ruby implementation of
>>>> some of the basic metadata APIs, but quickly realised that building on
>>>> iceberg-rust could be more productive than writing it all from scratch.
>>>>
>>>> *References*
>>>>
>>>> https://lists.apache.org/thread/5570vbdkrk7mdswt4jqy45lv7y58pz4b
>>>> https://lists.apache.org/thread/33c0nkc3k6646lvro1lv22pvhwlp50ss
>>>> https://github.com/apache/iceberg-rust/pull/518
>>>>
>>>> *Prior Art in Ruby*
>>>>
>>>> https://github.com/matsadler/magnus
>>>> https://github.com/oxidize-rb/rb-sys
>>>> https://github.com/ankane/ruby-polars
>>>> https://github.com/apache/opendal/tree/main/bindings/ruby
>>>>
>>>> Thanks,
>>>> Chris Atkins
>>>>
>>>> Xuanwo
>>>>
>>>> https://xuanwo.io/