> For the FileIO part, just curious—since Rust's FileIO currently also uses > OpenDAL, will there be any functional differences in terms of supported > storage services or configurations (like profile_name, signer, etc.) compared > to using opendalfs directly in Python in the future? Will Rust's FileIO > introduce any customizations/optimizations/extensions beyond what OpenDAL > supports?
Hi Honah,

I believe there should be no functional differences: we can implement exactly the same behavior for both the pyiceberg_core FileIO and the opendalfs fsspec FileIO. The main difference I've noticed is where the configuration parsing happens. The pyiceberg_core FileIO directly exposes the FileIO class, which natively understands Iceberg properties, so we can pass those properties straight through to initialize the file IO without any additional effort on the PyIceberg side. For the opendalfs fsspec FileIO, however, we need to parse the properties and convert them into the appropriate opendalfs options before it can work.

On Mon, Aug 5, 2024, at 15:04, Honah J. wrote: > Thanks Xuanwo for driving this and everyone for discussing, > > I like the idea of pushing down low-level logic to Iceberg-rust > (pyiceberg_core). It’s great to have another option besides PyArrow for > reading and writing data in PyIceberg. Thanks, Xuanwo, for moving this > forward with the initial PR to add pyiceberg_core. > > For the FileIO part, just curious: since Rust's FileIO currently also uses > OpenDAL, will there be any functional differences in terms of supported > storage services or configurations (like profile_name, signer, etc.) compared > to using opendalfs directly in Python in the future? Will Rust's FileIO > introduce any customizations/optimizations/extensions beyond what OpenDAL > supports? > > Best regards, > Honah > > > > On Sat, Aug 3, 2024 at 4:12 PM timog...@proton.me.INVALID > <timog...@proton.me.invalid> wrote: >> Fantastic work! I think this is a great direction, and this provides a good >> base to start iterating. >> >> It makes the most sense to me for the Python bindings (and others) to live >> in the same repo as iceberg-rust, especially at this early stage. >> >> - Tim O'Guin >> >> >> >> -------- Original Message -------- >> On 8/3/24 12:33 AM, Xuanwo wrote: >>> Let's rock!
Reviews are welcome: >>> https://github.com/apache/iceberg-rust/pull/518 >>> >>> On Sat, Aug 3, 2024, at 12:13, Xuanwo wrote: >>>> I also support integrating iceberg-rust with pyiceberg rather than >>>> building something new on OpenDAL. >>>> >>>> OpenDAL-backed FileIO will be usable in Python once opendalfs[1], the >>>> native fsspec support for OpenDAL, is ready. Users can use opendalfs as a >>>> FileIO class directly in pure Python. It's not an action item for our >>>> community to take. >>>> >>>> The consensus we've reached is that iceberg-rust will be the core of >>>> PyIceberg. The main question now is "How?" How can we implement it without >>>> disrupting our valued users? This is my top priority. >>>> >>>> *Naming is so hard! Let's refer to the new iceberg-rust-based pyiceberg >>>> core as `pyiceberg-core` until we decide on a project name.* >>>> >>>> First, we need to establish a workflow that allows us to gradually >>>> integrate new features into pyiceberg-core. Additionally, pyiceberg should >>>> be able to import and optionally use classes from pyiceberg-core in an >>>> additive manner. While developing this workflow, our community will learn >>>> how to collaborate, manage releases, and more. >>>> >>>> We will then incorporate additional Rust-backed features into >>>> pyiceberg-core. Eventually, we may make pyiceberg-core our default >>>> implementation. >>>> >>>> My current plan is to implement pyiceberg-core in the iceberg-rust >>>> repo under `bindings/python`. >>>> >>>> - Iceberg-rust is currently under active development. I plan to release >>>> pyiceberg-core independently of iceberg-rust's releases, as they have >>>> distinct public APIs (and languages!). >>>> - Most of the work is on the Rust side; the Python side involves >>>> maintaining only a few stubs and classes. >>>> - The Python integration is just a start: we can expect `bindings/nodejs` >>>> to happen here too. >>>> >>>> The setup work has already started.
I will update my PR here once >>>> it's ready to review. >>>> >>>> [1]: https://github.com/fsspec/opendalfs >>>> >>>> On Sat, Aug 3, 2024, at 09:57, Renjie Liu wrote: >>>>> Hi: >>>>> >>>>> I lean towards implementing pyiceberg's FileIO backed by iceberg-rust's >>>>> FileIO, rather than directly using OpenDAL. The motivation is that we can >>>>> use this as a starting point for providing iceberg-rust-backed components >>>>> for pyiceberg, and due to its simplicity, it's a good first case. I believe >>>>> there will be more cases, like the transforms Sung mentioned in another >>>>> thread, and the table scan mentioned by Fokko. >>>>> >>>>> If we want to use OpenDAL directly, we don't need iceberg-rust, since >>>>> OpenDAL already has a Python binding: >>>>> https://opendal.apache.org/docs/python/opendal.html >>>>> >>>>>> Do you have any experience with this? I see many projects having Rust >>>>>> and Python code in a single repository. There are some exceptions like >>>>>> Pydantic (pydantic <https://github.com/pydantic/pydantic>, pydantic-core >>>>>> <https://github.com/pydantic/pydantic-core>). >>>>> >>>>> Well, first I want to say that providing a Python binding for a library >>>>> written in Rust is quite a common practice. Just to name a few: opendal >>>>> <https://github.com/apache/opendal>, polars >>>>> <https://github.com/pola-rs/polars>, datafusion >>>>> <https://github.com/apache/datafusion>, delta-rs >>>>> <https://github.com/delta-io/delta-rs>. As far as I know, most of them >>>>> keep the Python binding in the same repo as the Rust code; only >>>>> datafusion-python <https://github.com/apache/datafusion-python> lives in >>>>> a separate one. I'm not sure about the reason; maybe it's too large? >>>>> >>>>> I haven't tried to implement one before, but pyo3 >>>>> <https://github.com/PyO3> has great documentation, and there are many >>>>> existing open-source examples we can learn from.
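Xuanwo's plan has PyIceberg import and optionally use classes from pyiceberg-core in an additive manner, and Renjie suggests FileIO as the first such component. A minimal sketch of that additive pattern, assuming the Rust-backed class would be importable as `pyiceberg_core.FileIO`; that import path, the helper `resolve_class`, and the stand-in `PurePythonFileIO` are hypothetical names for illustration, not the actual pyiceberg or pyiceberg-core API.

```python
import importlib
from typing import Any, Mapping


def resolve_class(module_name: str, attr_name: str, fallback: Any) -> Any:
    """Return `module_name.attr_name` if it can be imported, otherwise `fallback`.

    Keeping the Rust-backed implementation behind this check makes it strictly
    additive: if the optional package (or the attribute) is missing, callers
    get the existing pure-Python implementation unchanged.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return fallback
    return getattr(module, attr_name, fallback)


class PurePythonFileIO:
    """Hypothetical stand-in for an existing pure-Python FileIO implementation."""

    def __init__(self, properties: Mapping[str, str]) -> None:
        # Keep a private copy of the Iceberg catalog/table properties.
        self.properties = dict(properties)


# Hypothetical import path: assumes the Rust-backed class would be exposed as
# pyiceberg_core.FileIO; the real package layout may differ.
FileIOImpl = resolve_class("pyiceberg_core", "FileIO", PurePythonFileIO)
```

Falling back on `ImportError` keeps the compiled extension optional: users who have not installed it see no change in behavior.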
>>>>> >>>>> On Sat, Aug 3, 2024 at 2:23 AM Fokko Driesprong <fo...@apache.org> wrote: >>>>>> One more thing, >>>>>> >>>>>>> About this idea, would you have a more detailed design? For example, >>>>>>> where should the pyo3 code live, in iceberg-rust or in pyiceberg? What >>>>>>> kind of interface should we provide to pyiceberg, FileIO or OpenDAL? >>>>>> >>>>>> Do you have any experience with this? I see many projects having Rust >>>>>> and Python code in a single repository. There are some exceptions like >>>>>> Pydantic (pydantic <https://github.com/pydantic/pydantic>, pydantic-core >>>>>> <https://github.com/pydantic/pydantic-core>). >>>>>> >>>>>> Kind regards, >>>>>> Fokko >>>>>> >>>>>> >>>>>> >>>>>> On Fri, Aug 2, 2024 at 20:11, Fokko Driesprong <fo...@apache.org> wrote: >>>>>>> Thanks for driving this Xuanwo, >>>>>>> >>>>>>> I already suggested this in my talk back at the Spark Summit to see if >>>>>>> we can spark some interest, and it is exciting to see this materialize. >>>>>>> >>>>>>> For the IO abstraction, I think the FileIO is the best option. We >>>>>>> already have the interface >>>>>>> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/__init__.py#L239> >>>>>>> in PyIceberg, and also a PyArrowFileIO >>>>>>> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L327>. >>>>>>> I must admit that the abstraction is less clear in PyIceberg because we >>>>>>> rely so heavily on Arrow for reading/writing data that the two are tightly >>>>>>> coupled. I would love to see if we can use OpenDAL for reading/writing >>>>>>> data, and Iceberg-rust for pushing down the low-level logic. A while >>>>>>> ago I did some profiling on the code, and one of the major issues is >>>>>>> that Arrow doesn't support proper field-ID projection.
Therefore we >>>>>>> have to read the Parquet file, and do the schema evolution and type >>>>>>> promotion afterwards in Python >>>>>>> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L1444-L1458>, >>>>>>> which causes a lot of congestion on the GIL. >>>>>>> >>>>>>> Kind regards, >>>>>>> Fokko >>>>>>> >>>>>>> On Fri, Aug 2, 2024 at 17:46, Jack Ye <yezhao...@gmail.com> wrote: >>>>>>>> +1 for an OpenDALFileIO >>>>>>>> >>>>>>>> -Jack >>>>>>>> >>>>>>>> On Fri, Aug 2, 2024 at 8:32 AM Xuanwo <xua...@apache.org> wrote: >>>>>>>>> Hi Renjie, >>>>>>>>> >>>>>>>>> Thank you for your support. I'll delve into the details and first >>>>>>>>> build a PoC PR to make it clear. >>>>>>>>> >>>>>>>>> On Fri, Aug 2, 2024, at 22:51, Renjie Liu wrote: >>>>>>>>>> Hi: >>>>>>>>>> >>>>>>>>>> Thanks Xuanwo for raising this. >>>>>>>>>> >>>>>>>>>> As mentioned in another thread, I think using iceberg-rust in >>>>>>>>>> pyiceberg is a good idea. >>>>>>>>>> >>>>>>>>>> About this idea, would you have a more detailed design? For example, >>>>>>>>>> where should the pyo3 code live, in iceberg-rust or in pyiceberg? >>>>>>>>>> What kind of interface should we provide to pyiceberg, FileIO or >>>>>>>>>> OpenDAL? >>>>>>>>>> >>>>>>>>>> I think this is a good first step toward making pyiceberg >>>>>>>>>> backed by iceberg-rust. In the future we can replace components >>>>>>>>>> gradually. >>>>>>>>>> >>>>>>>>>> On Fri, Aug 2, 2024 at 5:58 PM Xuanwo <xua...@apache.org> wrote: >>>>>>>>>>> > Xuanwo, would PyIceberg and iceberg-rust share the underlying >>>>>>>>>>> > OpenDAL implementations via pyo3 / fsspec bindings >>>>>>>>>>> > <https://github.com/apache/opendal/issues/4511>? >>>>>>>>>>> >>>>>>>>>>> Hi, Raschkowski, good question! >>>>>>>>>>> >>>>>>>>>>> It's possible. There is an ongoing project developing fsspec >>>>>>>>>>> bindings for OpenDAL at https://github.com/fsspec/opendalfs.
Once >>>>>>>>>>> complete, we can use OpenDAL directly through fsspec. >>>>>>>>>>> >>>>>>>>>>> This work is unrelated to PyIceberg or iceberg-rust. Ideally, users >>>>>>>>>>> should be able to use opendalfs as an alternative implementation of >>>>>>>>>>> the fsspec AbstractFileSystem class. >>>>>>>>>>> >>>>>>>>>>> On Fri, Aug 2, 2024, at 17:44, Will Raschkowski wrote: >>>>>>>>>>>> Xuanwo, would PyIceberg and iceberg-rust share the underlying >>>>>>>>>>>> OpenDAL implementations via pyo3 / fsspec bindings >>>>>>>>>>>> <https://github.com/apache/opendal/issues/4511>? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> *From:* Joe Stein <crypt...@gmail.com> >>>>>>>>>>>> *Sent:* Thursday, August 1, 2024 3:37 AM >>>>>>>>>>>> *To:* dev@iceberg.apache.org <dev@iceberg.apache.org> >>>>>>>>>>>> *Subject:* Re: [DISCUSS] Use iceberg-rust as pyiceberg file io >>>>>>>>>>>> >>>>>>>>>>>> Kafka did this with librdkafka and was wildly successful. Having the >>>>>>>>>>>> underlying implementation in Rust, with a layer for >>>>>>>>>>>> access in Python, is great. +1 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> ~ Joe Stein >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Jul 31, 2024 at 10:29 PM Xuanwo <xua...@apache.org> wrote: >>>>>>>>>>>>> Hello everyone >>>>>>>>>>>>> >>>>>>>>>>>>> I'm starting this thread to discuss the idea of using iceberg-rust >>>>>>>>>>>>> as the pyiceberg file IO.
>>>>>>>>>>>>> >>>>>>>>>>>>> The idea is written up at >>>>>>>>>>>>> https://hackmd.io/@xuanwo/iceberg_rust_as_file_io >>>>>>>>>>>>> >>>>>>>>>>>>> In summary, we can leverage the work from iceberg-rust to help >>>>>>>>>>>>> pyiceberg develop a fast and compact file IO system that >>>>>>>>>>>>> benefits users with specific constraints. >>>>>>>>>>>>> >>>>>>>>>>>>> Everyone is welcome to join the discussion. >>>>>>>>>>>>> >>>>>>>>>>>>> Xuanwo >>>>>>>>>>>>> >>>>>>>>>>>>> https://xuanwo.io/ >>>>>>>>>>> Xuanwo >>>>>>>>>>> >>>>>>>>>>> https://xuanwo.io/ >>>>>>>>>>> >>>>>>>>> Xuanwo >>>>>>>>> >>>>>>>>> https://xuanwo.io/ >>>>>>>>> >>>> Xuanwo >>>> >>>> https://xuanwo.io/ >>>> >>> Xuanwo >>> >>> https://xuanwo.io/ >>> Xuanwo https://xuanwo.io/
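Xuanwo's reply at the top of this thread notes that the pyiceberg_core FileIO can consume Iceberg properties directly, whereas an opendalfs-based FileIO would require PyIceberg to translate those properties into opendalfs options first. A minimal sketch of that translation step, for S3 only: the `s3.*` keys follow PyIceberg's property naming, while the OpenDAL-side option names and the helper `to_opendal_s3_options` are assumptions for illustration, not the opendalfs API (which, per the thread, was still in development).

```python
from typing import Dict, Mapping
from urllib.parse import urlparse

# Assumed mapping from PyIceberg-style S3 property keys to OpenDAL S3 option
# names. Verify both sides against the PyIceberg and OpenDAL docs before use.
ICEBERG_TO_OPENDAL_S3: Dict[str, str] = {
    "s3.endpoint": "endpoint",
    "s3.region": "region",
    "s3.access-key-id": "access_key_id",
    "s3.secret-access-key": "secret_access_key",
    "s3.session-token": "session_token",
}


def to_opendal_s3_options(location: str, properties: Mapping[str, str]) -> Dict[str, str]:
    """Translate Iceberg properties into options for a (hypothetical) OpenDAL S3 backend."""
    parsed = urlparse(location)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"not an S3 location: {location!r}")
    # The bucket comes from the URI authority, e.g. "warehouse" in s3://warehouse/...
    options = {"bucket": parsed.netloc}
    for iceberg_key, opendal_key in ICEBERG_TO_OPENDAL_S3.items():
        value = properties.get(iceberg_key)
        if value is not None:
            options[opendal_key] = value
    # Keys with no entry in the mapping are ignored by this sketch.
    return options
```

Settings such as the profile_name and signer options Honah asks about have no entry in this hypothetical mapping, so the sketch silently drops them. Handling every such key by hand is the extra work on the opendalfs path that, according to Xuanwo's reply, the pyiceberg_core FileIO would not need.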