One more thing,

> About this idea, would you have a more detailed design? For example, where
> should the pyo3 codes live, in iceberg-rust or in pyiceberg? What kind of
> interface should we provide to pyiceberg, FileIO or OpenDAL?
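
To make the FileIO option a bit more concrete, here is a rough sketch of what a narrow, FileIO-shaped boundary could look like from the Python side. The names `SimpleFileIO` and `InMemoryFileIO` are made up for illustration and are not PyIceberg or iceberg-rust APIs; the in-memory dict only stands in for whatever a pyo3-backed implementation would do in Rust.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SimpleFileIO(ABC):
    """Hypothetical, trimmed-down FileIO-style interface (illustration only)."""

    @abstractmethod
    def read(self, location: str) -> bytes:
        """Return the full contents stored at `location`."""

    @abstractmethod
    def write(self, location: str, data: bytes) -> None:
        """Store `data` at `location`, replacing any existing contents."""

    @abstractmethod
    def delete(self, location: str) -> None:
        """Remove the object stored at `location`."""


class InMemoryFileIO(SimpleFileIO):
    """Stand-in backend; a Rust-backed class would only need the same three methods."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def read(self, location: str) -> bytes:
        try:
            return self._objects[location]
        except KeyError:
            raise FileNotFoundError(location) from None

    def write(self, location: str, data: bytes) -> None:
        self._objects[location] = bytes(data)

    def delete(self, location: str) -> None:
        if location not in self._objects:
            raise FileNotFoundError(location)
        del self._objects[location]


io = InMemoryFileIO()
io.write("mem://warehouse/metadata.json", b"{}")
payload = io.read("mem://warehouse/metadata.json")
```

The point of the sketch is only that callers depend on the small interface, so the backend behind it can be swapped without touching them.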


Do you have any experience with this? I see many projects having Rust and
Python code in a single repository. There are some exceptions like
Pydantic (pydantic <https://github.com/pydantic/pydantic>, pydantic-core
<https://github.com/pydantic/pydantic-core>).

Kind regards,
Fokko



On Fri, Aug 2, 2024 at 20:11, Fokko Driesprong <fo...@apache.org> wrote:

> Thanks for driving this Xuanwo,
>
> I already suggested this in my talk back at the Spark Summit to see if we
> can spark some interest, and it is exciting to see this materialize.
>
> For the IO abstraction, I think the FileIO is the best option. We already
> have the interface
> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/__init__.py#L239>
> in PyIceberg, and also a PyArrowFileIO
> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L327>.
> I must admit that the abstraction is less clear in PyIceberg since we rely
> so much on Arrow for reading/writing data that it is tightly coupled. I
> would love to see if we can use OpenDAL for reading/writing data, and
> Iceberg-rust for pushing down the low-level logic. A while ago I did some
> profiling on the code, and one of the major issues is that Arrow doesn't
> support proper field-ID projection. Therefore we have to read the Parquet file,
> and do the schema-evolution and type promotion afterwards in Python
> <https://github.com/apache/iceberg-python/blob/6c0d307032608967ccd00cfe72d8815e6e7e01cc/pyiceberg/io/pyarrow.py#L1444-L1458>,
> which causes a lot of congestion on the GIL.
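
As a small illustration of what field-ID projection means here, the sketch below resolves columns by ID rather than by name, so a renamed column still maps correctly and a column missing from the file is filled with `None`. It uses plain Python dicts and a made-up helper, `project_by_field_id`; it is not Arrow or PyIceberg code, and type promotion is left out.

```python
from typing import Dict, List, Optional


def project_by_field_id(
    file_fields: Dict[int, str],
    table_fields: Dict[int, str],
    rows: List[Dict[str, object]],
) -> List[Dict[str, Optional[object]]]:
    """Project rows written with the file schema onto the table schema by field ID.

    Both schemas map field ID -> column name (hypothetical representation).
    """
    projected = []
    for row in rows:
        out: Dict[str, Optional[object]] = {}
        for field_id, table_name in table_fields.items():
            file_name = file_fields.get(field_id)
            # A field ID absent from the file was added later: fill with None.
            out[table_name] = row.get(file_name) if file_name is not None else None
        projected.append(out)
    return projected


# The file was written when column 2 was called "name"; the table has since
# renamed it to "full_name" and added column 3, "email".
file_fields = {1: "id", 2: "name"}
table_fields = {1: "id", 2: "full_name", 3: "email"}
rows = [{"id": 1, "name": "Ada"}]

result = project_by_field_id(file_fields, table_fields, rows)
```

Doing this per row in Python is exactly the kind of work that holds the GIL, which is why pushing it down into the reader would help.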
>
> Kind regards,
> Fokko
>
> On Fri, Aug 2, 2024 at 17:46, Jack Ye <yezhao...@gmail.com> wrote:
>
>> +1 for an OpenDALFileIO
>>
>> -Jack
>>
>> On Fri, Aug 2, 2024 at 8:32 AM Xuanwo <xua...@apache.org> wrote:
>>
>>> Hi, Renjie
>>>
>>> Thank you for your support. I'll delve into the details and first build
>>> a PoC PR to make it clear.
>>>
>>> On Fri, Aug 2, 2024, at 22:51, Renjie Liu wrote:
>>>
>>> Hi:
>>>
>>> Thanks Xuanwo for raising this.
>>>
>>> As mentioned in another thread, I think using iceberg-rust in pyiceberg
>>> is a good idea.
>>>
>>> About this idea, would you have a more detailed design? For example,
>>> where should the pyo3 codes live, in iceberg-rust or in pyiceberg? What
>>> kind of interface should we provide to pyiceberg, FileIO or OpenDAL?
>>>
>>> I think this is a good first step toward making pyiceberg backed by
>>> iceberg-rust. In the future we can replace components gradually.
>>>
>>> On Fri, Aug 2, 2024 at 5:58 PM Xuanwo <xua...@apache.org> wrote:
>>>
>>>
>>> > Xuanwo, would PyIceberg and iceberg-rust share the underlying OpenDAL
>>> implementations via pyo3 / fsspec bindings
>>> <https://github.com/apache/opendal/issues/4511>?
>>>
>>> Hi, Raschkowski, good question!
>>>
>>> It's possible. There is an ongoing project developing fsspec bindings
>>> for opendal at https://github.com/fsspec/opendalfs. Once complete, we
>>> can directly use opendal through fsspec.
>>>
>>> This work is unrelated to PyIceberg or iceberg-rust. Ideally, users
>>> should be able to use opendalfs as an alternative implementation of the
>>> fsspec AbstractFileSystem class.
>>>
>>> On Fri, Aug 2, 2024, at 17:44, Will Raschkowski wrote:
>>>
>>> Xuanwo, would PyIceberg and iceberg-rust share the underlying OpenDAL
>>> implementations via pyo3 / fsspec bindings
>>> <https://github.com/apache/opendal/issues/4511>?
>>>
>>>
>>> ------------------------------
>>>
>>> *From:* Joe Stein <crypt...@gmail.com>
>>> *Sent:* Thursday, August 1, 2024 3:37 AM
>>> *To:* dev@iceberg.apache.org <dev@iceberg.apache.org>
>>> *Subject:* Re: [DISCUSS] Use iceberg-rust as pyiceberg file io
>>>
>>> Kafka did this with librdkafka and was wildly successful. The underlying
>>> bindings being in rust are great with a layer for access in Python +1
>>>
>>>
>>> ~ Joe Stein
>>>
>>>
>>> On Wed, Jul 31, 2024 at 10:29 PM Xuanwo <xua...@apache.org> wrote:
>>>
>>> Hello everyone
>>>
>>> I am starting this thread to discuss the idea of using iceberg-rust as
>>> pyiceberg file io.
>>>
>>> The idea lives at https://hackmd.io/@xuanwo/iceberg_rust_as_file_io
>>>
>>> In summary, we can leverage the work from iceberg-rust to help pyiceberg
>>> in developing a fast and compact file IO system that benefits users with
>>> specific constraints.
>>>
>>> Everyone is welcome to join the discussion.
>>>
>>> Xuanwo
>>>
>>> https://xuanwo.io/
>>>
