Sounds good, folks! Thank you for sharing your thoughts. We'll work on getting
the patch release out and continue the discussion on upgrading PyArrow to
17.0.0 in time for the 0.8.0 release.

I'm also adding two more fixes that I think we should pull into the patch
release. These were added to the GitHub milestone for 0.7.1, but I'm
cross-posting here for awareness:

- Table scan fails when the result is empty:
https://github.com/apache/iceberg-python/pull/997
- Fix RestCatalog ListNamespace to correctly use the expected REST Catalog
response: https://github.com/apache/iceberg-python/pull/997

Sung

On 2024/08/06 18:29:50 Kevin Liu wrote:
> > Typically we only push patches into the minor versions; we could also go
> > to version 0.8.0 immediately.
> 
> The issues above sound like patches to me, fixing issues discovered during
> the 0.7.0 release. Is there a reason to move to 0.8.0?
> 
> > I'm still on the fence regarding the 17.0.0 upgrade. There are clear
> > functional upsides, but I feel that constraining PyIceberg to just one
> > published version would make the adoption of PyIceberg difficult for our
> > users.
> 
> +1 on this concern. Is it possible to make the Arrow 17.0.0 upgrade
> optional first, so that folks who want the upgrade can test it out?
> 
> Thanks,
> Kevin Liu
> 
> 
> 
> On Fri, Aug 2, 2024 at 11:33 AM Sung Yun <sun...@apache.org> wrote:
> 
> > Hi Fokko,
> >
> > That makes sense, thank you for the suggestion! The issue was severe enough
> > for us that we had to fork the repo and apply a fix ourselves in order to
> > run PyIceberg without our applications going OOM. So I think there will be
> > value in getting the proposed config property out to the larger community
> > as early as possible.
> >
> > I'm still on the fence regarding the 17.0.0 upgrade. There are clear
> > functional upsides, but I feel that constraining PyIceberg to just one
> > published version would make the adoption of PyIceberg difficult for our
> > users. Users writing new applications won't have trouble with it, but users
> > intending to use PyIceberg in an existing application may have to upgrade
> > their PyArrow versions, which could be a deterrent (or a welcome nudge).
> > Would it be worth starting that discussion on a separate thread?
> >
> > Sung
> >
> > On 2024/08/02 17:57:17 Fokko Driesprong wrote:
> > > Hey Sung,
> > >
> > > Typically we only push patches into the minor versions; we could also go
> > > to version 0.8.0 immediately.
> > >
> > > Regarding the memory consumption, thanks for putting those numbers
> > > together! I would also love to get #929
> > > <https://github.com/apache/iceberg-python/pull/929> in, so we can push down
> > > the large/small type to PyArrow (only for to_arrow), and apply #986
> > > <https://github.com/apache/iceberg-python/pull/986> on top if you want to
> > > force it to either small or large types.
> > >
> > > WDYT?
> > >
> > > Kind regards,
> > > Fokko
> > >
> > >
> > > On Fri, Aug 2, 2024 at 19:46, Sung Yun <sun...@apache.org> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > We identified inefficient memory usage spikes caused by the current
> > > > approach of upcasting PyArrow types to large_<type> on read, when
> > > > reading tables with certain characteristics. A detailed set of example
> > > > benchmarks of this issue is in the Google document linked on PR #986:
> > > > https://github.com/apache/iceberg-python/pull/986
> > > >
> > > > The proposed solution introduces a config property to override this
> > > > behavior and use small types instead, and I'd like to add this to the
> > > > patch release to give users better control over their memory usage.
> > > >
> > > > Also, this is just a gentle reminder that this DISCUSS thread is still
> > > > open for any new issues identified in the 0.7.0 release that we should
> > > > fix in the patch release.
> > > >
> > > > Thank you,
> > > > Sung
> > > >
> > > > On 2024/07/30 23:57:04 Sung Yun wrote:
> > > > > Hi folks,
> > > > >
> > > > > We are starting to compile the list of issues to fix and port into
> > > > > the 0.7.1 release.
> > > > >
> > > > > The current list of known issues is as follows:
> > > > >
> > > > > Fix pydantic warning on table commit: #972
> > > > > <https://github.com/apache/iceberg-python/pull/972> (thanks for the
> > > > > quick fix, ndrluis!)
> > > > > Issue when rewriting an unpartitioned table: #979
> > > > > <https://github.com/apache/iceberg-python/issues/979>
> > > > > Issue when evolving and writing in the same transaction: #980
> > > > > <https://github.com/apache/iceberg-python/issues/980>
> > > > >
> > > > > Please feel free to respond to this thread with any issues that
> > > > > should be tracked for the patch release.
> > > > >
> > > > > Thank you!
> > > > > Sung
> > > > >
> > > >
> > >
> >
> 
