Hey Haizhou, thanks for working on that proposal. My main concern with the current proposal is that it adds quite a lot of complexity in several places, since you'd need to partially update *TableMetadata*. Additionally, it requires a new endpoint.

An alternative to that would be to do something similar to what we already have in *TableMetadata*, where we lazily load *snapshots* when needed. We could expand that approach to lazily load the full *TableMetadata* from the server when necessary and always only show a slim version of *TableMetadata*. I did such a POC a while ago, which can be seen in https://github.com/nastra/iceberg/commit/ae2c7768c6f37be2f86b575bfc4fe84429b22a0e. That POC would need to be expanded so that it doesn't only do this for snapshots, but also for other fields. I believe the main fields that can get quite large over time are *snapshots / metadata-log / snapshot-log / schemas*.

Might be worth checking how much we could gain by using a lazy table metadata supplier in this scenario, as that would reduce the required complexity.

Thanks,
Eduard

On Thu, Oct 10, 2024 at 2:05 AM Haizhou Zhao <zhaohaizhou940...@gmail.com> wrote:

> Hello Dev List,
>
> I want to bring this proposal to discussion:
>
> https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit#heading=h.uad1lm906wz4
>
> It proposes a new LoadTable API (branded LoadTableV2 at the moment) on
> REST spec that allows partially loading table metadata. The motivation is
> to stabilize and optimize Spark write workloads, especially on Iceberg
> tables with big metadata (e.g. due to huge list of snapshot/metadata log,
> complicated schema, etc.). We want to leverage this proposal to reduce
> operational and monetary cost of Iceberg & REST catalog usages, and achieve
> higher commit frequencies (DDL & DML included) on top of Iceberg tables
> through REST catalog.
>
> Looking forward to hearing feedback and discussions.
>
> Thank you,
> Haizhou
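
To make the lazy-loading idea concrete, here is a rough sketch of what a lazy table metadata supplier could look like. This is only an illustration: the class and method names (`FullMetadata`, `LazyTableMetadata`, `isFullyLoaded`) are made up for this example and are not taken from the POC or from Iceberg's actual *TableMetadata*, and the supplier stands in for whatever call fetches the full metadata from the server.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for the large parts of table metadata.
// This is NOT Iceberg's TableMetadata class.
final class FullMetadata {
  final List<String> snapshots;
  final List<String> metadataLog;

  FullMetadata(List<String> snapshots, List<String> metadataLog) {
    this.snapshots = snapshots;
    this.metadataLog = metadataLog;
  }
}

// Slim view of the metadata: small fields are held directly, while the
// potentially large fields are fetched at most once, on first access.
final class LazyTableMetadata {
  private final String location;
  private final Supplier<FullMetadata> loader; // assumed to hit the existing load endpoint
  private volatile FullMetadata full;          // stays null until a large field is needed

  LazyTableMetadata(String location, Supplier<FullMetadata> loader) {
    this.location = location;
    this.loader = loader;
  }

  // Cheap accessor: never triggers a remote load.
  String location() {
    return location;
  }

  // Accessors for large fields: trigger the full load on first use.
  List<String> snapshots() {
    return full().snapshots;
  }

  List<String> metadataLog() {
    return full().metadataLog;
  }

  boolean isFullyLoaded() {
    return full != null;
  }

  // Double-checked locking so concurrent callers invoke the loader only once.
  private FullMetadata full() {
    FullMetadata result = full;
    if (result == null) {
      synchronized (this) {
        result = full;
        if (result == null) {
          result = loader.get();
          full = result;
        }
      }
    }
    return result;
  }
}
```

With something along these lines, callers that only touch the small fields never pay for the large ones, and no new endpoint or partial-update logic is needed on the client side. How much this saves in practice would still need to be measured.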