Let's rock! Reviews are welcome:
https://github.com/apache/iceberg-rust/pull/518

On Sat, Aug 3, 2024, at 12:13, Xuanwo wrote:
> I also support integrating iceberg-rust with pyiceberg rather than building 
> something new on OpenDAL.
> 
> OpenDAL backed FileIO will be usable in Python once opendalfs[1], the native 
> fsspec support for OpenDAL, is ready. Users can use opendalfs as a FileIO 
> class directly in pure python. It's not an action item for our community to 
> take.
> 
> The consensus we've reached is that iceberg-rust will be the core of 
> PyIceberg. The main question now is "How?" How can we implement it without 
> disrupting our valued users? This is my top priority.
> 
> *Naming is so hard! Let's refer to the new iceberg-rust based pyiceberg core 
> as `pyiceberg-core` until we decide on a project name.*
> 
> First, we need to establish a workflow that allows us to gradually integrate 
> new features into pyiceberg-core. Additionally, pyiceberg should be able to 
> import and optionally use classes from pyiceberg-core in an additive manner. 
> While developing this workflow, our community will learn how to collaborate, 
> manage releases, and more.
> 
> We will then incorporate additional Rust-backed features into pyiceberg-core. 
> Eventually, we may make pyiceberg-core our default implementation.
> 
> My current plan is to implement pyiceberg-core in the iceberg-rust repo, 
> under `bindings/python`.
> 
> - Iceberg-rust is currently under active development. I plan to release 
> pyiceberg-core independently of iceberg-rust's release, as they feature 
> distinct public APIs (and languages!).
> - Most of the work involves maintaining a few Python stubs and classes, with 
> the majority related to Rust.
> - The python integration is just a start: we can expect `bindings/nodejs` to 
> happen here too.
> 
> The setup work has already been started. I will update my PR here once it's 
> ready to review.
> 
> [1]: https://github.com/fsspec/opendalfs
> 
> On Sat, Aug 3, 2024, at 09:57, Renjie Liu wrote:
>> Hi:
>> 
>> I lean towards implementing pyiceberg's FileIO backed by iceberg-rust's 
>> FileIO, rather than directly using OpenDAL. The motivation is that we can 
>> use this as a starting point for providing iceberg-rust backed components 
>> to pyiceberg, and its simplicity makes it a good first case. I believe 
>> there will be more cases, such as the transforms Sung mentioned in another 
>> thread and the table scan Fokko mentioned.
>> 
>> If we want to use OpenDAL directly, we don't need iceberg-rust, since 
>> OpenDAL already has a Python binding: 
>> https://opendal.apache.org/docs/python/opendal.html
>> 
>>> Do you have any experience with this? I see many projects having Rust and 
>>> Python code in a single repository. There are some exceptions like Pydantic 
>>> (pydantic <https://github.com/pydantic/pydantic>, pydantic-core 
>>> <https://github.com/pydantic/pydantic-core>).
>> 
>> First, providing a Python binding for a library written in Rust is quite 
>> common practice. To name a few: opendal 
>> <https://github.com/apache/opendal>, polars 
>> <https://github.com/pola-rs/polars>, datafusion 
>> <https://github.com/apache/datafusion>, delta-rs 
>> <https://github.com/delta-io/delta-rs>. As far as I know, most of them 
>> keep the Python binding in the same repo as the Rust code. Only 
>> datafusion-python <https://github.com/apache/datafusion-python> lives in 
>> a separate repo. I'm not sure why; maybe it's too large?
>> 
>> I haven't tried to implement one before, but pyo3 <https://github.com/PyO3> 
>> has great documentation, and there are many existing open source examples 
>> we can learn from.
>> 
>> On Sat, Aug 3, 2024 at 2:23 AM Fokko Driesprong <fo...@apache.org> wrote:
>>> One more thing,
>>> 
>>>> About this idea, would you have a more detailed design? For example,  
>>>> where should the pyo3 code live, in iceberg-rust or in pyiceberg? What 
>>>> kind of interface should we provide to pyiceberg, FileIO or OpenDAL?
>>> 
>>> Do you have any experience with this? I see many projects having Rust and 
>>> Python code in a single repository. There are some exceptions like Pydantic 
>>> (pydantic <https://github.com/pydantic/pydantic>, pydantic-core 
>>> <https://github.com/pydantic/pydantic-core>).
>>> 
>>> Kind regards,
>>> Fokko
>>> 
>>>  
>>> 
>>> Op vr 2 aug 2024 om 20:11 schreef Fokko Driesprong <fo...@apache.org>:
>>>> Thanks for driving this Xuanwo,
>>>> 
>>>> I already suggested this in my talk back at the Spark Summit to see if we 
>>>> can spark some interest, and it is exciting to see this materialize.
>>>> 
>>>> For the IO abstraction, I think the FileIO is the best option. We already 
>>>> have the interface 
>>>> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/__init__.py#L239>
>>>>  in PyIceberg, and also a PyArrowFileIO 
>>>> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L327>.
>>>>  I must admit that the abstraction is less clear in PyIceberg since we 
>>>> rely so much on Arrow for reading/writing data that it is tightly coupled. 
>>>> I would love to see if we can use OpenDAL for reading/writing data, and 
>>>> Iceberg-rust for pushing down the low-level logic. A while ago I did some 
>>>> profiling on the code, and one of the major issues is that Arrow doesn't 
>>>> support proper field-ID projection. Therefore we have to read the Parquet file, 
>>>> and do the schema-evolution and type promotion afterwards in Python 
>>>> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L1444-L1458>,
>>>>  which causes a lot of congestion on the GIL.
>>>> 
>>>> Kind regards,
>>>> Fokko
>>>> 
>>>> Op vr 2 aug 2024 om 17:46 schreef Jack Ye <yezhao...@gmail.com>:
>>>>> +1 for an OpenDALFileIO
>>>>> 
>>>>> -Jack
>>>>> 
>>>>> On Fri, Aug 2, 2024 at 8:32 AM Xuanwo <xua...@apache.org> wrote:
>>>>>> Hi, Renjie
>>>>>> 
>>>>>> Thank you for your support. I'll delve into the details and first build 
>>>>>> a PoC PR to make it clear.
>>>>>> 
>>>>>> On Fri, Aug 2, 2024, at 22:51, Renjie Liu wrote:
>>>>>>> Hi:
>>>>>>> 
>>>>>>> Thanks Xuanwo for raising this.
>>>>>>> 
>>>>>>> As mentioned in another thread, I think using iceberg-rust in pyiceberg 
>>>>>>> is a good idea.
>>>>>>> 
>>>>>>> About this idea, would you have a more detailed design? For example,  
>>>>>>> where should the pyo3 code live, in iceberg-rust or in pyiceberg? What 
>>>>>>> kind of interface should we provide to pyiceberg, FileIO or OpenDAL?
>>>>>>> 
>>>>>>> I think this is a good first step towards making pyiceberg backed by 
>>>>>>> iceberg-rust. In the future we can replace components gradually.
>>>>>>> 
>>>>>>> On Fri, Aug 2, 2024 at 5:58 PM Xuanwo <xua...@apache.org> wrote:
>>>>>>>> > Xuanwo, would PyIceberg and iceberg-rust share the underlying 
>>>>>>>> > OpenDAL implementations via pyo3 / fsspec bindings 
>>>>>>>> > <https://github.com/apache/opendal/issues/4511>?
>>>>>>>> 
>>>>>>>> Hi, Raschkowski, good question!
>>>>>>>> 
>>>>>>>> It's possible. There is an ongoing project developing fsspec bindings 
>>>>>>>> for opendal at https://github.com/fsspec/opendalfs. Once complete, we 
>>>>>>>> can directly use opendal through fsspec.
>>>>>>>> 
>>>>>>>> This work is unrelated to PyIceberg or Iceberg-rust. Ideally, users 
>>>>>>>> should be able to use opendalfs as an alternative implementation of 
>>>>>>>> the fsspec AbstractFileSystem class.
>>>>>>>> 
>>>>>>>> On Fri, Aug 2, 2024, at 17:44, Will Raschkowski wrote:
>>>>>>>>> Xuanwo, would PyIceberg and iceberg-rust share the underlying OpenDAL 
>>>>>>>>> implementations via pyo3 / fsspec bindings 
>>>>>>>>> <https://github.com/apache/opendal/issues/4511>?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> *From:* Joe Stein <crypt...@gmail.com>
>>>>>>>>> *Sent:* Thursday, August 1, 2024 3:37 AM
>>>>>>>>> *To:* dev@iceberg.apache.org <dev@iceberg.apache.org>
>>>>>>>>> *Subject:* Re: [DISCUSS] Use iceberg-rust as pyiceberg file io
>>>>>>>>>  
>>>>>>>>> Kafka did this with librdkafka, and it was wildly successful. Having 
>>>>>>>>> the underlying implementation in Rust, with a layer for access from 
>>>>>>>>> Python, is great. +1
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ~ Joe Stein
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Wed, Jul 31, 2024 at 10:29 PM Xuanwo <xua...@apache.org> wrote:
>>>>>>>>>> Hello everyone
>>>>>>>>>> 
>>>>>>>>>> I'm starting this thread to discuss the idea of using iceberg-rust 
>>>>>>>>>> as pyiceberg's file IO.
>>>>>>>>>> 
>>>>>>>>>> The idea is written up at 
>>>>>>>>>> https://hackmd.io/@xuanwo/iceberg_rust_as_file_io
>>>>>>>>>> 
>>>>>>>>>> In summary, we can leverage the work from iceberg-rust to help 
>>>>>>>>>> pyiceberg in developing a fast and compact file IO system that 
>>>>>>>>>> benefits users with specific constraints.
>>>>>>>>>> 
>>>>>>>>>> Everyone is welcome to join the discussion.
>>>>>>>>>> 
>>>>>>>>>> Xuanwo
>>>>>>>>>> 
>>>>>>>>>> https://xuanwo.io/
>>>>>>>> Xuanwo
>>>>>>>> 
>>>>>>>> https://xuanwo.io/
>>>>>>>> 
>>>>>> Xuanwo
>>>>>> 
>>>>>> https://xuanwo.io/
>>>>>> 
> Xuanwo
> 
> https://xuanwo.io/
> 
Xuanwo

https://xuanwo.io/
