Hi Everyone,

I propose that we release the following RC as the official PyIceberg 0.8.0 release.

The commit ID is 3ccdc44735d70bd3ef6ed18b60b3eba43c4b3b44 <https://github.com/apache/iceberg-python/commit/3ccdc44735d70bd3ef6ed18b60b3eba43c4b3b44>
- This corresponds to the tag: pyiceberg-0.8.0rc2 (4a7abd0478996547ee68a5ee1847130bc0a45c10)
- https://github.com/apache/iceberg-python/releases/tag/pyiceberg-0.8.0rc2
- https://github.com/apache/iceberg-python/tree/3ccdc44735d70bd3ef6ed18b60b3eba43c4b3b44

The release tarball, signature, and checksums are here:
- https://dist.apache.org/repos/dist/dev/iceberg/pyiceberg-0.8.0rc2/

You can find the KEYS file here:
- https://downloads.apache.org/iceberg/KEYS

Convenience binary artifacts are staged on PyPI:
https://pypi.org/project/pyiceberg/0.8.0rc2/

And can be installed using:
pip3 install pyiceberg==0.8.0rc2

Instructions for verifying a release can be found here:
- https://py.iceberg.apache.org/verify-release/

Please download, verify, and test.

High-level Summary
- 185 <https://github.com/apache/iceberg-python/compare/pyiceberg-0.7.1...pyiceberg-0.8.0rc2> new commits
- 18 new first-time contributors
- Deprecation Notice
  - Deprecated configuration properties: profile_name, region_name, aws_access_key_id, aws_secret_access_key, and aws_session_token
  - Deprecated functions: to_requested_schema in pyiceberg/io/pyarrow.py and add_snapshot and set_ref_snapshot in pyiceberg/table/__init__.py
- Find a detailed list of PRs at https://github.com/apache/iceberg-python/releases/tag/pyiceberg-0.8.0rc2
- Highlights
  - Documentation improvements
    - Improve docstrings, configuration, etc
    - Improve the release process; updated “How to Release” and “Verify Release” documentation
  - General
    - Add support for Python 3.12; drop support for Python 3.8; exclude Python 3.9.7
    - Bump PyArrow to 18.0.0, remove numpy as a hard dependency
    - Bump up Iceberg version to 1.6.0 in integration tests
    - Updated release and verify release to use KEYS from Apache’s `dist/release` repo
  - Features
    - Add metadata tables for data_files and delete_files
    - Add list_views and drop_view to REST catalog
    - Add partition MonthTransform
    - Support manifest file caching
    - Support Hive Metastore High Availability mode
    - Add properties to allow configuring small/large pyarrow type on read
    - Deprecate redundant catalog identifiers in TableIdentifier and row_filter expressions
    - Update metadata-log for non-REST catalogs
    - Add support for boolean expressions and quoted columns in row_filter expressions
    - Support setting ARN Role and Session name in S3 and Glue
    - Support bi-directional union of types (int <> long, float <> double)
    - Support passing table-token to commit endpoint
    - Allow setting write.parquet.row-group-limit and write.parquet.page-row-limit
    - Deprecate rest.authorization-url in favor of oauth2-server-uri
    - Support s3.signer.endpoint
    - Add support to configure access delegation header, X-Iceberg-Access-Delegation
    - Remove initial_change usage in TableUpdates
    - Prevent adding duplicate files in the add_files API
    - Support fields with . in name
  - Bug Fix
    - TableResponse metadata_location can be optional
    - Abort the whole table transaction if any updates in the transaction have failed
    - Use appropriate partition spec for delete
    - Use self.table_metadata when in transaction
    - Accept empty arrays in struct field lookup
    - List namespace response in REST catalog with fully qualified namespace
    - list_tables method in Glue catalog now only returns tables, instead of views+tables
    - Glue and Hive catalog return only Iceberg tables, instead of Hive+Iceberg tables
    - Invert case_sensitive logic in StructType
    - Fix table_exists behavior in the REST catalog
    - Fix bug where reading with to_arrow_batch_reader returns more than the limit
    - PyArrow: Pass in null-mask for StructField
    - Fix overwrite when filtering all the data
    - Use the correct spec when rewriting existing manifests
    - Use historical partition field name
    - Fix Position Deletes + row_filter yields less data when the DataFile is large
    - Allow for missing operation in Snapshot metadata
    - Fix tracing existing entries when there are deletes
    - Handle Empty RecordBatch within _task_to_record_batches

Please vote in the next 72 hours.

[ ] +1 Release this as PyIceberg 0.8.0
[ ] +0
[ ] -1 Do not release this because...

Best,
Kevin Liu