The feature sounds reasonable to me. If a schema or partition spec is no
longer referenced and is not needed for any time travel purpose, then it
seems to me that it could be safely pruned through a utility action. If the
schema changes frequently and has many columns, this could help reduce
metadata size.
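As a rough illustration of that check for schemas, here is a minimal Python sketch that works on the parsed table metadata JSON. It assumes format v2 metadata, using the `schemas`, `current-schema-id`, and per-snapshot `schema-id` fields from the table spec. The function name `unreferenced_schema_ids` is hypothetical. It only identifies pruning candidates: it does not rewrite or commit metadata. It also does not cover partition specs, because those are referenced from manifest list entries rather than from the snapshot objects in the metadata file. As a conservative design choice, if any snapshot lacks the optional `schema-id` field, it reports nothing as prunable.

```python
import json


def unreferenced_schema_ids(metadata: dict) -> set:
    """Return schema IDs that neither the current schema nor any snapshot uses.

    Hypothetical helper for illustration; assumes format v2 table metadata.
    """
    all_ids = {schema["schema-id"] for schema in metadata.get("schemas", [])}
    # The current schema must always be kept.
    referenced = {metadata["current-schema-id"]}
    for snapshot in metadata.get("snapshots", []):
        # "schema-id" is optional on snapshots; without it we cannot tell
        # which schema the snapshot needs, so be conservative and prune nothing.
        if "schema-id" not in snapshot:
            return set()
        referenced.add(snapshot["schema-id"])
    return all_ids - referenced


if __name__ == "__main__":
    example = json.loads("""
    {
      "current-schema-id": 2,
      "schemas": [{"schema-id": 0}, {"schema-id": 1}, {"schema-id": 2}],
      "snapshots": [{"snapshot-id": 10, "schema-id": 1}]
    }
    """)
    print(sorted(unreferenced_schema_ids(example)))  # prints [0]
```

Snapshots reachable through branches and tags are still listed under `snapshots`, so checking that list also accounts for schemas that time travel to a tagged snapshot would need.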

+1 for reopening the issue to discuss further. We should probably also make
the title more specific than "The metadata file is too large".

Best,
Jack Ye

On Tue, Feb 20, 2024 at 9:55 AM Sung Yun (BLOOMBERG/ 120 PARK) <syu...@bloomberg.net> wrote:

> Hi Barron, we've noticed the same issue as well, ever since this PR
> introducing schema versions was merged:
> https://github.com/apache/iceberg/pull/2096
>
> There's a closed issue where folks were discussing options for remediating
> this problem; it also links to other related PRs and issues:
> https://github.com/apache/iceberg/issues/5219. Should we reopen this
> issue and converge our discussion points on the main discussion thread?
>
> Sung
>
> From: barron....@twosigma.com At: 02/20/24 12:34:57 UTC-5:00
> To: dev@iceberg.apache.org
> Subject: Table Schema History Pruning
>
> Hi folks,
>
> I have a few questions regarding the schema history of an Iceberg table.
>
> The table metadata file keeps track of every schema version of the table
> (at least in format v2). Depending on the size of the schema, this history
> can grow large in terms of byte size.
>
>
> 1. Is removing a schema from the history correct when the table does not
> have a snapshot referencing that schema version?
> 2. Does the Iceberg spec guarantee that the history is not pruned?
> 3. Are there any plans for Iceberg to support pruning the schema history?
>
> Thanks,
> Barron Wei