Fantastic work! I think this is a great direction, and this provides a good 
base to start iterating.

It makes the most sense to me for the Python bindings (and others) to live in 
the same repo as iceberg-rust, especially at this early stage.

- Tim O'Guin

-------- Original Message --------
On 8/3/24 12:33 AM, Xuanwo wrote:

> Let's rock! Reviews are welcome: 
> https://github.com/apache/iceberg-rust/pull/518
>
> On Sat, Aug 3, 2024, at 12:13, Xuanwo wrote:
>
>> I also support integrating iceberg-rust with pyiceberg rather than building 
>> something new on OpenDAL.
>>
>> An OpenDAL-backed FileIO will be usable in Python once opendalfs[1], the 
>> native fsspec support for OpenDAL, is ready. Users can then use opendalfs as 
>> a FileIO class directly in pure Python. That is not an action item for our 
>> community.
>>
>> The consensus we've reached is that iceberg-rust will be the core of 
>> PyIceberg. The main question now is "How?" How can we implement it without 
>> disrupting our valued users? This is my top priority.
>>
>> Naming is so hard! Let's refer to the new iceberg-rust based pyiceberg core 
>> as `pyiceberg-core` until we decide on a project name.
>>
>> First, we need to establish a workflow that allows us to gradually integrate 
>> new features into pyiceberg-core. Additionally, pyiceberg should be able to 
>> import and optionally use classes from pyiceberg-core in an additive manner. 
>> While developing this workflow, our community will learn how to collaborate, 
>> manage releases, and more.
>>
>> We will then incorporate additional Rust-backed features into 
>> pyiceberg-core. Eventually, we may make pyiceberg-core our default 
>> implementation.
>>
>> My current plan is to implement pyiceberg-core in the iceberg-rust repo 
>> under `bindings/python`.
>>
>> - iceberg-rust is currently under active development. I plan to release 
>> pyiceberg-core independently of iceberg-rust's releases, as they expose 
>> distinct public APIs (and languages!).
>> - Most of the work involves maintaining a few Python stubs and classes, with 
>> the majority of the code being in Rust.
>> - The Python integration is just a start: we can expect `bindings/nodejs` to 
>> happen here too.
>>
>> The setup work has already started. I will post my PR here once it's ready 
>> for review.
>>
>> [1]: https://github.com/fsspec/opendalfs
>>
>> On Sat, Aug 3, 2024, at 09:57, Renjie Liu wrote:
>>
>>> Hi:
>>>
>>> I lean towards implementing pyiceberg's FileIO backed by iceberg-rust's 
>>> FileIO, rather than directly using OpenDAL. The motivation is that we can 
>>> use this as a starting point for providing iceberg-rust-backed components 
>>> for pyiceberg, and its simplicity makes it a good first case. I believe 
>>> there will be more cases, such as the transforms Sung mentioned in another 
>>> thread and the table scan mentioned by Fokko.
>>>
>>> If we want to use OpenDAL directly, we don't need iceberg-rust, since 
>>> OpenDAL already has a Python binding: 
>>> https://opendal.apache.org/docs/python/opendal.html
>>>
>>>> Do you have any experience with this? I see many projects having Rust and 
>>>> Python code in a single repository. There are some exceptions like 
>>>> Pydantic ([pydantic](https://github.com/pydantic/pydantic), 
>>>> [pydantic-core](https://github.com/pydantic/pydantic-core)).
>>>
>>> Well, first I want to say that providing a Python binding for a library 
>>> written in Rust is quite a common practice. Just to name a few: 
>>> [opendal](https://github.com/apache/opendal), 
>>> [polars](https://github.com/pola-rs/polars), 
>>> [datafusion](https://github.com/apache/datafusion), 
>>> [delta-rs](https://github.com/delta-io/delta-rs). As far as I know, most of 
>>> them keep the Python binding in the same repo as the Rust code; only 
>>> [datafusion-python](https://github.com/apache/datafusion-python) lives in 
>>> a separate one. I'm not sure about the reason; maybe it's too large?
>>>
>>> I haven't tried to implement one before, but 
>>> [pyo3](https://github.com/PyO3) has great documentation, and there are many 
>>> existing open source examples we can learn from.
>>>
>>> On Sat, Aug 3, 2024 at 2:23 AM Fokko Driesprong <fo...@apache.org> wrote:
>>>
>>>> One more thing,
>>>>
>>>>> About this idea, do you have a more detailed design? For example, 
>>>>> where should the pyo3 code live, in iceberg-rust or in pyiceberg? What 
>>>>> kind of interface should we provide to pyiceberg, FileIO or OpenDAL?
>>>>
>>>> Do you have any experience with this? I see many projects having Rust and 
>>>> Python code in a single repository. There are some exceptions like 
>>>> Pydantic ([pydantic](https://github.com/pydantic/pydantic), 
>>>> [pydantic-core](https://github.com/pydantic/pydantic-core)).
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> Op vr 2 aug 2024 om 20:11 schreef Fokko Driesprong <fo...@apache.org>:
>>>>
>>>>> Thanks for driving this Xuanwo,
>>>>>
>>>>> I already suggested this in my talk back at the Spark Summit to see if we 
>>>>> can spark some interest, and it is exciting to see this materialize.
>>>>>
>>>>> For the IO abstraction, I think the FileIO is the best option. We already 
>>>>> have the 
>>>>> [interface](https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/__init__.py#L239)
>>>>>  in PyIceberg, and also a 
>>>>> [PyArrowFileIO](https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L327).
>>>>>  I must admit that the abstraction is less clear in PyIceberg since we 
>>>>> rely so much on Arrow for reading/writing data that it is tightly 
>>>>> coupled. I would love to see if we can use OpenDAL for reading/writing 
>>>>> data, and Iceberg-rust for pushing down the low-level logic. A while ago 
>>>>> I did some profiling on the code, and one of the major issues is that 
>>>>> Arrow doesn't support proper field-ID projection. Therefore we have to 
>>>>> read the Parquet file, and do the schema-evolution and type promotion 
>>>>> afterwards [in 
>>>>> Python](https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L1444-L1458),
>>>>>  which causes a lot of congestion on the GIL.
>>>>>
>>>>> Kind regards,
>>>>> Fokko
>>>>>
>>>>> Op vr 2 aug 2024 om 17:46 schreef Jack Ye <yezhao...@gmail.com>:
>>>>>
>>>>>> +1 for an OpenDALFileIO
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>> On Fri, Aug 2, 2024 at 8:32 AM Xuanwo <xua...@apache.org> wrote:
>>>>>>
>>>>>>> Hi, Renjie
>>>>>>>
>>>>>>> Thank you for your support. I'll delve into the details and first build 
>>>>>>> a PoC PR to make it clear.
>>>>>>>
>>>>>>> On Fri, Aug 2, 2024, at 22:51, Renjie Liu wrote:
>>>>>>>
>>>>>>>> Hi:
>>>>>>>>
>>>>>>>> Thanks Xuanwo for raising this.
>>>>>>>>
>>>>>>>> As mentioned in another thread, I think using iceberg-rust in 
>>>>>>>> pyiceberg is a good idea.
>>>>>>>>
>>>>>>>> About this idea, do you have a more detailed design? For example, 
>>>>>>>> where should the pyo3 code live, in iceberg-rust or in pyiceberg? 
>>>>>>>> What kind of interface should we provide to pyiceberg, FileIO or 
>>>>>>>> OpenDAL?
>>>>>>>>
>>>>>>>> I think this is a good first step towards making pyiceberg backed by 
>>>>>>>> iceberg-rust. In the future we can replace components gradually.
>>>>>>>>
>>>>>>>> On Fri, Aug 2, 2024 at 5:58 PM Xuanwo <xua...@apache.org> wrote:
>>>>>>>>
>>>>>>>>>> Xuanwo, would PyIceberg and iceberg-rust share the underlying 
>>>>>>>>>> OpenDAL implementations via pyo3 / [fsspec 
>>>>>>>>>> bindings](https://github.com/apache/opendal/issues/4511)?
>>>>>>>>>
>>>>>>>>> Hi, Raschkowski, good question!
>>>>>>>>>
>>>>>>>>> It's possible. There is an ongoing project developing fsspec bindings 
>>>>>>>>> for opendal at https://github.com/fsspec/opendalfs. Once complete, we 
>>>>>>>>> can directly use opendal through fsspec.
>>>>>>>>>
>>>>>>>>> This work is unrelated to PyIceberg or iceberg-rust. Ideally, users 
>>>>>>>>> should be able to use opendalfs as an alternative implementation of 
>>>>>>>>> the fsspec AbstractFileSystem class.
>>>>>>>>>
>>>>>>>>> On Fri, Aug 2, 2024, at 17:44, Will Raschkowski wrote:
>>>>>>>>>
>>>>>>>>>> Xuanwo, would PyIceberg and iceberg-rust share the underlying 
>>>>>>>>>> OpenDAL implementations via pyo3 / [fsspec 
>>>>>>>>>> bindings](https://github.com/apache/opendal/issues/4511)?
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> From: Joe Stein <crypt...@gmail.com>
>>>>>>>>>> Sent: Thursday, August 1, 2024 3:37 AM
>>>>>>>>>> To: dev@iceberg.apache.org <dev@iceberg.apache.org>
>>>>>>>>>> Subject: Re: [DISCUSS] Use iceberg-rust as pyiceberg file io
>>>>>>>>>>
>>>>>>>>>> Kafka did this with librdkafka and was wildly successful. The 
>>>>>>>>>> underlying bindings being in rust are great with a layer for access 
>>>>>>>>>> in Python +1
>>>>>>>>>>
>>>>>>>>>> ~ Joe Stein
>>>>>>>>>>
>>>>>>>>>> On Wed, Jul 31, 2024 at 10:29 PM Xuanwo <xua...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello everyone
>>>>>>>>>>>
>>>>>>>>>>> I'm starting this thread to discuss the idea of using iceberg-rust 
>>>>>>>>>>> as pyiceberg's file IO.
>>>>>>>>>>>
>>>>>>>>>>> The idea is described at 
>>>>>>>>>>> https://hackmd.io/@xuanwo/iceberg_rust_as_file_io
>>>>>>>>>>>
>>>>>>>>>>> In summary, we can leverage the work from iceberg-rust to help 
>>>>>>>>>>> pyiceberg in developing a fast and compact file IO system that 
>>>>>>>>>>> benefits users with specific constraints.
>>>>>>>>>>>
>>>>>>>>>>> Everyone is welcome to join the discussion.
>>>>>>>>>>>
>>>>>>>>>>> Xuanwo
>>>>>>>>>>>
>>>>>>>>>>> https://xuanwo.io/
>>>>>>>>>
>>>>>>>>> Xuanwo
>>>>>>>>>
>>>>>>>>> https://xuanwo.io/
>>>>>>>
>>>>>>> Xuanwo
>>>>>>>
>>>>>>> https://xuanwo.io/
>>
>> Xuanwo
>>
>> https://xuanwo.io/
>
> Xuanwo
>
> https://xuanwo.io/
