Hey everyone,

After an issue on GitHub <https://github.com/apache/iceberg/issues/8515>, I noticed a bug in PyIceberg where the filesystem isn't being reused <https://github.com/apache/iceberg/pull/8549>. I think there is more room for improvement (in both the short and the long term), but I don't think we should block the release on that, since 0.5.0 is already much faster thanks to the improved Avro parsing, the improved IO, and the previously mentioned bugfix (plus one that was merged earlier today <https://github.com/apache/iceberg/pull/8548>).
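To illustrate why reusing the filesystem matters: setting up a filesystem client can be expensive (credential lookup, connection setup), so doing it once per file read adds up. Below is a minimal sketch of the caching idea, assuming a filesystem can be keyed on the URI scheme and network location. `ExpensiveFileSystem` and `get_filesystem` are hypothetical names for illustration, not PyIceberg's API; the actual fix is the one in #8549.

```python
from functools import lru_cache
from typing import Optional


class ExpensiveFileSystem:
    """Hypothetical stand-in for a filesystem client (e.g. S3 or HDFS)."""

    instances_created = 0

    def __init__(self, scheme: str, netloc: Optional[str]) -> None:
        # In a real client, credentials and connection pools would be set up here.
        ExpensiveFileSystem.instances_created += 1
        self.scheme = scheme
        self.netloc = netloc


@lru_cache(maxsize=None)
def get_filesystem(scheme: str, netloc: Optional[str]) -> ExpensiveFileSystem:
    # lru_cache memoizes on (scheme, netloc), so all files on the same
    # bucket/host share one instance instead of building a new one per file.
    return ExpensiveFileSystem(scheme, netloc)


paths = [("s3", "bucket"), ("s3", "bucket"), ("s3", "other-bucket")]
filesystems = [get_filesystem(scheme, netloc) for scheme, netloc in paths]
print(filesystems[0] is filesystems[1])  # True
print(ExpensiveFileSystem.instances_created)  # 2
```

An unbounded `lru_cache` is fine when a process only talks to a handful of buckets or hosts; a long-running process touching many of them might prefer a bounded `maxsize`.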
I'll cut another RC as soon as #8549 is in. Thanks, everyone, for your patience!

Cheers,
Fokko

On Mon, 11 Sep 2023 at 14:22, Fokko Driesprong <fo...@apache.org> wrote:

> Hi Everyone,
>
> I propose that we release the following RC as the official PyIceberg 0.5.0
> release. A summary of what's included in 0.5.0:
>
> - Add gzip metadata support <https://github.com/apache/iceberg/pull/7984>
> - PyArrow HDFS support <https://github.com/apache/iceberg/pull/7997>
> - Support serverless environments (AWS Lambda)
>   <https://github.com/apache/iceberg/pull/8061>
> - Many fixes around Avro performance (PRs
>   <https://github.com/apache/iceberg/pull/8074>,
>   <https://github.com/apache/iceberg/pull/8075>,
>   <https://github.com/apache/iceberg/pull/8082>,
>   <https://github.com/apache/iceberg/pull/8084>)
> - Remove the upper bound of the PyParsing dependency
>   <https://github.com/apache/iceberg/pull/8116> (it was blocking a PR in
>   Airflow <https://github.com/apache/airflow/pull/32786>)
> - Move the reading of Avro to Cython
>   <https://github.com/apache/iceberg/pull/8134> (10x speed improvement!)
> - Support for the SQLCatalog <https://github.com/apache/iceberg/pull/7921>
>   (the counterpart of the JDBC catalog in Java)
> - Fix support for UUID columns <https://github.com/apache/iceberg/pull/8267>
> - Support for adding columns <https://github.com/apache/iceberg/pull/8174>
> - Optimize concurrency <https://github.com/apache/iceberg/pull/8104>
>   (a follow-up on the serverless environments support)
> - Bump Pydantic to v2 <https://github.com/apache/iceberg/pull/7782>
>   (improved JSON (de)serialization performance)
> - A lot of bugfixes!
>
> The commit ID is 3323281045a72f1156d58c261067469e383fb26d
>
> * This corresponds to the tag: pyiceberg-0.5.0rc2
>   (92600935834bdf77ba37ac361338712713549a77)
> * https://github.com/apache/iceberg/releases/tag/pyiceberg-0.5.0rc2
> * https://github.com/apache/iceberg/tree/3323281045a72f1156d58c261067469e383fb26d
>
> The release tarball, signature, and checksums are here:
>
> * https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.0rc2/
>
> You can find the KEYS file here:
>
> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>
> Convenience binary artifacts are staged on PyPI at
> https://pypi.org/project/pyiceberg/0.5.0rc2/ and can be installed using:
>
> pip3 install pyiceberg==0.5.0rc2
>
> Since a lot has changed with the introduction of wheels (binary Python
> packages), I've included the following steps to verify the release:
>
> curl https://dist.apache.org/repos/dist/dev/iceberg/KEYS -o KEYS
> gpg --import KEYS
>
> svn checkout https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.5.0rc2/ /tmp/pyiceberg/
>
> # Verify the GPG signature of every wheel and of the source tarball
> for name in /tmp/pyiceberg/pyiceberg-*.whl /tmp/pyiceberg/pyiceberg-*.tar.gz
> do
>     gpg --verify "${name}.asc" "${name}"
> done
>
> # Verify the SHA-512 checksums
> cd /tmp/pyiceberg/
> for name in /tmp/pyiceberg/pyiceberg-*.whl.asc.sha512 /tmp/pyiceberg/pyiceberg-*.tar.gz.asc.sha512
> do
>     shasum -a 512 --check "${name}"
> done
>
> # Unpack the source release and run the license check
> tar xzf pyiceberg-0.5.0.tar.gz
> cd pyiceberg-0.5.0
>
> ./dev/check-license
>
> Please download, verify, and test.
>
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as PyIceberg 0.5.0
> [ ] +0
> [ ] -1 Do not release this because...
>
> Please consider this my +1. I've checked it against the docker-spark-iceberg
> <https://github.com/tabular-io/docker-spark-iceberg/pull/92> notebook
> and ran some additional checks.
>
> Kind regards,
> Fokko Driesprong
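For anyone verifying on a machine without `shasum`, the checksum loop in the quoted steps can be approximated in Python. This is a minimal sketch, assuming each `.sha512` file uses the `shasum` output format (`<hex digest>  <file name>`, with the file name relative to the checksum file's directory); `verify_sha512` is a hypothetical helper, not part of the release tooling, and it does not replace the `gpg --verify` signature check.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large wheels don't have to fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha512(checksum_file: Path) -> bool:
    """Check every '<hex>  <name>' line, similar to `shasum -a 512 --check`."""
    ok = True
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        # shasum marks binary mode with a '*' before the name; strip it if present.
        target = checksum_file.parent / name.lstrip("*")
        if sha512_of(target) != expected.lower():
            print(f"FAILED: {target.name}")
            ok = False
    return ok
```

Usage would be along the lines of `[verify_sha512(p) for p in Path("/tmp/pyiceberg").glob("*.sha512")]`, pointed at the same checkout directory as above.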