Hi folks,

While testing the REST Catalog Adapter Docker image that Ajantha has been working on, I ran into an issue when parsing the TableResponse of a staged table. Although metadata-location is an optional field according to the Iceberg REST Catalog Spec, PyIceberg handles it as a required field, because of a peculiarity in how the pydantic model has to be defined for the field to be truly optional in the response.
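To illustrate the pydantic behaviour, here is a minimal sketch, assuming pydantic v2. The model names and the `parses` helper are hypothetical and are not PyIceberg's actual classes, and the second model is only one plausible way to make the field optional, not necessarily the exact change in the PR.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class StrictTableResponse(BaseModel):
    # Hypothetical model. In pydantic v2, Optional[str] without a default
    # means "the key must be present, but its value may be null".
    metadata_location: Optional[str] = Field(alias="metadata-location")


class LenientTableResponse(BaseModel):
    # Hypothetical model. An explicit default lets the key be omitted.
    metadata_location: Optional[str] = Field(alias="metadata-location", default=None)


def parses(model: type[BaseModel], payload: dict) -> bool:
    """Return True if the payload validates against the model."""
    try:
        model.model_validate(payload)
        return True
    except ValidationError:
        return False


# A staged-table response that omits metadata-location entirely.
staged_payload: dict = {}

print(parses(StrictTableResponse, staged_payload))   # rejected: key is missing
print(parses(LenientTableResponse, staged_payload))  # accepted: falls back to None
```

With the strict definition, a response that omits the key fails validation even though the type annotation says Optional.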
If this is not fixed, PyIceberg 0.8.0 will not be able to support staged table transactions against REST Catalog servers that omit the `metadata-location` field in the TableResponse.

@Kevin - what are your thoughts on cutting a second RC that includes this fix? Here's the PR that resolves the issue and explains it in more detail: https://github.com/apache/iceberg-python/pull/1321

On 2024/11/13 13:11:46 Jean-Baptiste Onofré wrote:
> +1 (non binding)
>
> I checked:
> - Signature and hash are OK
> - ASF header present
> - LICENSE and NOTICE look good
>
> Thanks !
> Regards
> JB
>
> On Thu, Nov 7, 2024 at 10:57 PM Kevin Liu <kevin.jq....@gmail.com> wrote:
> >
> > Hi Everyone,
> >
> > I propose that we release the following RC as the official PyIceberg 0.8.0
> > release.
> >
> > The commit ID is 0eaadb9
> >
> > This corresponds to the tag: pyiceberg-0.8.0rc1
> > (ac00f5354c2c12ed8f465295a3a626e0db9c1689)
> > https://github.com/apache/iceberg-python/releases/tag/pyiceberg-0.8.0rc1
> > https://github.com/apache/iceberg-python/tree/0eaadb9e61c7c9373eddaafd723c3be9fd66ab42
> >
> > The release tarball, signature, and checksums are here:
> >
> > https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.8.0rc1/
> >
> > You can find the KEYS file here:
> >
> > https://dist.apache.org/repos/dist/dev/iceberg/KEYS
> >
> > Convenience binary artifacts are staged on pypi:
> >
> > https://pypi.org/project/pyiceberg/0.8.0rc1/
> >
> > And can be installed using: pip3 install pyiceberg==0.8.0rc1
> >
> > Instructions for verifying a release can be found here:
> >
> > https://py.iceberg.apache.org/verify-release/
> >
> > Please download, verify, and test.
> >
> > High-level Summary
> >
> > 176 new commits
> > 18 new first-time contributors
> > Deprecation Notice
> >
> > Deprecated configuration properties: profile_name, region_name,
> > aws_access_key_id, aws_secret_access_key, and aws_session_token
> > Deprecated functions: to_requested_schema in pyiceberg/io/pyarrow.py and
> > add_snapshot and set_ref_snapshot in pyiceberg/table/__init__.py
> >
> > Find a detailed list of PRs at
> > https://github.com/apache/iceberg-python/releases/tag/pyiceberg-0.8.0rc1
> > Highlights
> >
> > Documentation improvements
> >
> > Improve docstrings, configuration, etc
> > Improve the release process; updated “How to Release” and “Verify Release”
> > documentation
> >
> > General
> >
> > Add support for Python 3.12; drop support for Python 3.8; exclude Python
> > 3.9.7
> > Bump PyArrow to 18.0.0, remove numpy as a hard dependency
> > Bump up Iceberg version to 1.6.0 in integration tests
> >
> > Features
> >
> > Add metadata tables for data_files and delete_files
> > Add list_views and drop_view to Rest catalog
> > Add partition MonthTransform
> > Support manifest file caching
> > Support Hive Metastore High Availability mode
> > Add properties to allow configuring small/large pyarrow type on read
> > Deprecate redundant catalog identifiers in TableIdentifier and row_filter
> > expressions
> > Update metadata-log for non-rest catalogs
> > Add support for boolean expressions and quoted columns in row_filter
> > expressions
> > Support setting ARN Role and Session name in S3 and Glue
> > Support bi-directional union of types (int <> long, float <> double)
> > Support passing table-token to commit endpoint
> > Allow setting write.parquet.row-group-limit and write.parquet.page-row-limit
> > Deprecate rest.authorization-url in favor of oauth2-server-uri
> > Support s3.signer.endpoint
> > Add support to configure access delegation header,
> > X-Iceberg-Access-Delegation
> > Remove initial_change usage in TableUpdates
> > Prevent adding duplicate files in the add_files API
> > Support fields with . in name
> >
> > Bug Fix
> >
> > Abort the whole table transaction if any updates in the transaction have
> > failed
> > Use appropriate partition spec for delete
> > Use self.table_metadata when in transaction
> > Accept empty arrays in struct field lookup
> > List namespace response in rest catalog with fully qualified namespace
> > list_tables method in glue catalog now only returns tables, instead of
> > views+tables
> > Glue and Hive catalog return only Iceberg tables, instead of hive+iceberg
> > tables
> > Invert case_sensitive logic in StructType
> > Fix table_exists behavior in the REST catalog
> > Fix bug where reading with to_arrow_batch_reader return more than the limit
> > PyArrow: Pass in null-mask for StructField
> > Fix overwrite when filtering all the data
> > Use the correct spec when rewriting existing manifests
> > Use historical partition field name
> > Fix Position Deletes + row_filter yields less data when the DataFile is
> > large
> > Allow for missing operation in Snapshot metadata
> > Fix tracing existing entries when there are deletes
> > Handle Empty RecordBatch within _task_to_record_batches
> >
> >
> > Please vote in the next 72 hours.
> > [ ] +1 Release this as PyIceberg 0.8.0
> > [ ] +0
> >
> > [ ] -1 Do not release this because...
> >
> >
> >
> > Best,
> >
> > Kevin Liu
>