One more thing,

> About this idea, would you have a more detailed design? For example, where
> should the pyo3 code live, in iceberg-rust or in pyiceberg? What kind of
> interface should we provide to pyiceberg, FileIO or OpenDAL?

Do you have any experience with this? I see many projects having Rust and
Python code in a single repository. There are some exceptions like Pydantic
(pydantic <https://github.com/pydantic/pydantic>, pydantic-core
<https://github.com/pydantic/pydantic-core>).

Kind regards,
Fokko

On Fri, Aug 2, 2024 at 20:11, Fokko Driesprong <fo...@apache.org> wrote:

> Thanks for driving this, Xuanwo,
>
> I already suggested this in my talk back at the Spark Summit to see if we
> can spark some interest, and it is exciting to see this materialize.
>
> For the IO abstraction, I think the FileIO is the best option. We already
> have the interface
> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/__init__.py#L239>
> in PyIceberg, and also a PyArrowFileIO
> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L327>.
> I must admit that the abstraction is less clear in PyIceberg since we rely
> so much on Arrow for reading/writing data that it is tightly coupled. I
> would love to see if we can use OpenDAL for reading/writing data, and
> Iceberg-rust for pushing down the low-level logic. A while ago I did some
> profiling on the code, and one of the major issues is that Arrow doesn't
> support proper field-ID projection. Therefore we have to read the Parquet
> file, and do the schema-evolution and type promotion afterwards in Python
> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L1444-L1458>,
> which causes a lot of congestion on the GIL.
>
> Kind regards,
> Fokko
>
> On Fri, Aug 2, 2024 at 17:46, Jack Ye <yezhao...@gmail.com> wrote:
>
>> +1 for an OpenDALFileIO
>>
>> -Jack
>>
>> On Fri, Aug 2, 2024 at 8:32 AM Xuanwo <xua...@apache.org> wrote:
>>
>>> Hi, Renjie
>>>
>>> Thank you for your support. I'll delve into the details and first build
>>> a PoC PR to make it clear.
>>>
>>> On Fri, Aug 2, 2024, at 22:51, Renjie Liu wrote:
>>>
>>> Hi:
>>>
>>> Thanks Xuanwo for raising this.
>>>
>>> As mentioned in another thread, I think using iceberg-rust in pyiceberg
>>> is a good idea.
>>>
>>> About this idea, would you have a more detailed design? For example,
>>> where should the pyo3 code live, in iceberg-rust or in pyiceberg? What
>>> kind of interface should we provide to pyiceberg, FileIO or OpenDAL?
>>>
>>> I think this is a good first step moving forward to make pyiceberg
>>> backed by iceberg-rust. In the future we can replace components gradually.
>>>
>>> On Fri, Aug 2, 2024 at 5:58 PM Xuanwo <xua...@apache.org> wrote:
>>>
>>> > Xuanwo, would PyIceberg and iceberg-rust share the underlying OpenDAL
>>> > implementations via pyo3 / fsspec bindings
>>> > <https://github.com/apache/opendal/issues/4511>?
>>>
>>> Hi, Raschkowski, good question!
>>>
>>> It's possible. There is an ongoing project developing fsspec bindings
>>> for opendal at https://github.com/fsspec/opendalfs. Once complete, we
>>> can directly use opendal through fsspec.
>>>
>>> This work is unrelated to PyIceberg or iceberg-rust. Ideally, users
>>> should be able to use opendalfs as an alternative implementation of the
>>> fsspec AbstractFileSystem class.
>>>
>>> On Fri, Aug 2, 2024, at 17:44, Will Raschkowski wrote:
>>>
>>> Xuanwo, would PyIceberg and iceberg-rust share the underlying OpenDAL
>>> implementations via pyo3 / fsspec bindings
>>> <https://github.com/apache/opendal/issues/4511>?
>>>
>>> ------------------------------
>>>
>>> *From:* Joe Stein <crypt...@gmail.com>
>>> *Sent:* Thursday, August 1, 2024 3:37 AM
>>> *To:* dev@iceberg.apache.org <dev@iceberg.apache.org>
>>> *Subject:* Re: [DISCUSS] Use iceberg-rust as pyiceberg file io
>>>
>>> Kafka did this with librdkafka and it was wildly successful. Having the
>>> underlying bindings in Rust with a layer for access in Python is great. +1
>>>
>>> ~ Joe Stein
>>>
>>> On Wed, Jul 31, 2024 at 10:29 PM Xuanwo <xua...@apache.org> wrote:
>>>
>>> Hello everyone
>>>
>>> I am starting this thread to discuss the idea of using iceberg-rust as
>>> pyiceberg file io.
>>>
>>> The idea is written up at https://hackmd.io/@xuanwo/iceberg_rust_as_file_io
>>>
>>> In summary, we can leverage the work from iceberg-rust to help pyiceberg
>>> in developing a fast and compact file IO system that benefits users with
>>> specific constraints.
>>>
>>> Everyone is welcome to join the discussion.
>>>
>>> Xuanwo
>>>
>>> https://xuanwo.io/