I'm +1 on getting 0.7.1 out fast with patches, and on targeting the
Arrow 17.0.0 upgrade for the 0.8.0 release.

Thanks,
Kevin Liu

On Tue, Aug 6, 2024 at 11:43 AM Fokko Driesprong <fo...@apache.org> wrote:

>> The issues above sound like patches to me, fixing issues discovered during
>> the 0.7.0 release. Is there a reason to move to 0.8.0?
>>
>
> That would also allow us to add new features :) I'm also okay with a 0.7.1
> release
>
>> +1 on this concern. Is it possible to make the Arrow 17.0.0 upgrade
>> optional first? So that folks who want the upgrade can test it out.
>
> If we go with a 0.7.1, how about targeting Arrow 17.0.0 for PyIceberg 0.8.0?
>
> Kind regards,
> Fokko
>
> On Tue, Aug 6, 2024 at 20:30, Kevin Liu <kevin.jq....@gmail.com> wrote:
>
>> > Typically we only push patches into the minor versions, we could also
>> > go to version 0.8.0 immediately.
>>
>> The issues above sound like patches to me, fixing issues discovered
>> during the 0.7.0 release. Is there a reason to move to 0.8.0?
>>
>> > I'm still on the fence regarding 17.0.0 upgrade. There are clear
>> > functional upsides, but I feel that constraining PyIceberg to just one
>> > published version would make the adoption of PyIceberg difficult for our
>> > users.
>>
>> +1 on this concern. Is it possible to make the Arrow 17.0.0 upgrade
>> optional first? So that folks who want the upgrade can test it out.
>>
>> Thanks,
>> Kevin Liu
>>
>>
>>
>> On Fri, Aug 2, 2024 at 11:33 AM Sung Yun <sun...@apache.org> wrote:
>>
>>> Hi Fokko,
>>>
>>> That makes sense, thank you for the suggestion! The issue was severe
>>> enough for us that we had to fork the repo and apply a fix ourselves in
>>> order to run PyIceberg without our applications going OOM. So I think there
>>> will be value in getting the proposed config property out as early as
>>> possible for the larger community.
>>>
>>> I'm still on the fence regarding 17.0.0 upgrade. There are clear
>>> functional upsides, but I feel that constraining PyIceberg to just one
>>> published version would make the adoption of PyIceberg difficult for our
>>> users. Users writing new applications won't have trouble with it, but users
>>> intending to use PyIceberg in an existing application may have to upgrade
>>> their PyArrow versions which could be a deterrent (or a welcome nudge).
>>> Would it be worth starting that discussion on a separate thread?
>>>
>>> Sung
>>>
>>> On 2024/08/02 17:57:17 Fokko Driesprong wrote:
>>> > Hey Sung,
>>> >
>>> > Typically we only push patches into the minor versions, we could also
>>> > go to version 0.8.0 immediately.
>>> >
>>> > Regarding the memory consumption, thanks for putting those numbers
>>> > together! I would also love to get #929
>>> > <https://github.com/apache/iceberg-python/pull/929> in, so we can push
>>> > down the large/small type to PyArrow (only for to_arrow), and apply #986
>>> > <https://github.com/apache/iceberg-python/pull/986> on top if you want
>>> > to force it to either small or large types.
>>> >
>>> > WDYT?
>>> >
>>> > Kind regards,
>>> > Fokko
>>> >
>>> >
>>> > On Fri, Aug 2, 2024 at 19:46, Sung Yun <sun...@apache.org> wrote:
>>> >
>>> > > Hi folks,
>>> > >
>>> > > We identified memory usage spikes caused by the current way of
>>> > > upcasting PyArrow types to large_<type> on read, when reading tables
>>> > > with certain characteristics. A detailed set of example benchmarks of
>>> > > this issue is in the Google document linked from PR #986:
>>> > > https://github.com/apache/iceberg-python/pull/986
>>> > >
>>> > > The proposed solution introduces a config to override this behavior
>>> > > to use small types instead, and I'd like to add this into the patch
>>> > > release to give users better control over their memory usage.
>>> > >
>>> > > Also, this is just a gentle reminder that this DISCUSS thread is
>>> > > still open for any new issues identified from the 0.7.0 release that
>>> > > we should fix in the patch release.
>>> > >
>>> > > Thank you,
>>> > > Sung
>>> > >
>>> > > On 2024/07/30 23:57:04 Sung Yun wrote:
>>> > > > Hi folks,
>>> > > >
>>> > > > We are starting to compile the list of issues to fix and port into
>>> > > > the 0.7.1 release.
>>> > > >
>>> > > > The current list of known issues is as follows:
>>> > > >
>>> > > > Fix pydantic warning on table commit: #972
>>> > > > <https://github.com/apache/iceberg-python/pull/972> (thanks for the
>>> > > > quick fix ndrluis!)
>>> > > > Issue when rewriting an unpartitioned table: #979
>>> > > > <https://github.com/apache/iceberg-python/issues/979>
>>> > > > Issue when evolving and writing in the same transaction: #980
>>> > > > <https://github.com/apache/iceberg-python/issues/980>
>>> > > >
>>> > > > Please feel free to respond to this thread with any issues that
>>> > > > should be tracked for the patch release.
>>> > > >
>>> > > > Thank you!
>>> > > > Sung
>>> > > >
>>> > >
>>> >
>>>
>>
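
For readers following the large_<type> versus small type discussion in the
thread, here is a rough back-of-the-envelope sketch of one known difference
between the two families. In the Arrow columnar format, string and binary
columns store 32-bit offsets, while large_string and large_binary store
64-bit offsets. The function names below are hypothetical and made up for
illustration. This only models the offsets buffer; it does not reproduce the
benchmarks linked from PR #986, and the memory growth reported there may
well have other causes.

```python
# Offset widths per the Arrow columnar format:
# string/binary use 32-bit offsets, large_string/large_binary use 64-bit.
SMALL_OFFSET_BYTES = 4
LARGE_OFFSET_BYTES = 8

# Largest total data size (in bytes) addressable with signed 32-bit offsets.
MAX_SMALL_DATA_BYTES = 2**31 - 1


def offsets_buffer_bytes(num_values: int, large: bool) -> int:
    """Size of the offsets buffer: a column of n values stores n + 1 offsets."""
    width = LARGE_OFFSET_BYTES if large else SMALL_OFFSET_BYTES
    return (num_values + 1) * width


def upcast_overhead_bytes(num_values: int) -> int:
    """Extra offsets-buffer bytes when a small type is upcast to its large_ variant."""
    return offsets_buffer_bytes(num_values, True) - offsets_buffer_bytes(num_values, False)


def fits_small_offsets(total_data_bytes: int) -> bool:
    """Whether a single array's character data can be indexed by 32-bit offsets."""
    return total_data_bytes <= MAX_SMALL_DATA_BYTES


if __name__ == "__main__":
    n = 10_000_000
    print("small offsets:", offsets_buffer_bytes(n, large=False), "bytes")
    print("large offsets:", offsets_buffer_bytes(n, large=True), "bytes")
    print("overhead:     ", upcast_overhead_bytes(n), "bytes")
```

The per-value overhead is fixed at 4 bytes, so it matters most for columns
with many short values. The large variants exist because a single small-typed
array cannot address more than about 2 GiB of data.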
