Hey everyone, we put in a feature request / proposal on this topic a few days
ago, with the idea of storing the schemas in files that are external to
metadata.json: https://github.com/apache/iceberg/issues/9734. We would be really
interested in getting some feedback on it and seeing if folks think it's a
viable solution!


On Feb 20, 2024, at 4:53 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:


I believe I actually wrote a PR to do some of this a long time ago.
Specifically, I wrote a tool for reducing partition specs:
https://github.com/apache/iceberg/pull/3462


> On Feb 20, 2024, at 3:26 PM, Jack Ye <yezhao...@gmail.com> wrote:
> 
> The feature sounds reasonable to me. If a schema or partition spec is no
> longer referenced or used for any time travel purpose, then it seems to me
> that it could be safely pruned through some utility action. If the schema
> changes frequently and there are many columns, it might be helpful in
> reducing metadata size.
> 
> +1 for reopening the issue to discuss further. We should probably also make
> the title more specific than "The metadata file is too large".
> 
> Best,
> Jack Ye
> 
> On Tue, Feb 20, 2024 at 9:55 AM Sung Yun (BLOOMBERG/ 120 PARK)
> <syu...@bloomberg.net> wrote:
>> Hi Barron, we've noticed the same issue as well, ever since this PR was
>> merged to introduce schema versions: https://github.com/apache/iceberg/pull/2096
>> 
>> There's a closed issue where folks were discussing options for remediating
>> this problem, which also has links to other related PRs and issues:
>> https://github.com/apache/iceberg/issues/5219. Should we reopen this issue and
>> converge our discussion points on the main discussion thread?
>> 
>> Sung
>> 
>> From: barron....@twosigma.com At: 02/20/24 12:34:57 UTC-5:00
>> To: dev@iceberg.apache.org
>> Subject: Table Schema History Pruning
>> 
>> Hi folks,
>> 
>> I have a few questions regarding the schema history of an Iceberg table.
>> 
>> The table metadata file keeps track of every table schema version (at least
>> in v2). Depending on the size of the schema, this history can become large in
>> terms of byte size.
>> 
>> 
>>   1.  Is removing a schema from the history correct when the table does not 
>> have a snapshot referencing that schema version?
>>   2.  Does the Iceberg spec guarantee that the history is not pruned?
>>   3.  Are there any plans for Iceberg to support pruning the schema history?
>> 
>> Thanks,
>> Barron Wei
>> 
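
As an illustration of question 1 in Barron's message, here is a minimal sketch of how candidate schemas for pruning could be identified from a parsed metadata.json. It assumes the table metadata field names `schemas`, `current-schema-id`, `snapshots`, and the optional per-snapshot `schema-id`. The function name `unreferenced_schema_ids` and the sample data are hypothetical and are not part of Iceberg. The sketch only lists candidates; whether removing them is actually safe is the question this thread is asking.

```python
def unreferenced_schema_ids(metadata: dict) -> set:
    """Return schema IDs referenced by neither the current schema nor any snapshot."""
    all_ids = {schema["schema-id"] for schema in metadata.get("schemas", [])}

    # The current schema is always considered referenced.
    referenced = {metadata["current-schema-id"]}

    for snapshot in metadata.get("snapshots", []):
        # "schema-id" is optional on a snapshot, so skip snapshots without it.
        if "schema-id" in snapshot:
            referenced.add(snapshot["schema-id"])

    return all_ids - referenced


# Hypothetical, heavily trimmed metadata used only for illustration.
example = {
    "current-schema-id": 2,
    "schemas": [{"schema-id": 0}, {"schema-id": 1}, {"schema-id": 2}],
    "snapshots": [
        {"snapshot-id": 100, "schema-id": 0},
        {"snapshot-id": 101, "schema-id": 2},
    ],
}

print(sorted(unreferenced_schema_ids(example)))  # prints [1]
```

In this sample, schema 1 is the only one that no snapshot and no current-schema pointer refers to, so it is the only pruning candidate.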

