Hey Haizhou,

Thanks for working on this proposal. My main concern with the current
proposal is that it adds quite a lot of complexity in several places, since
you'd need to partially update *TableMetadata*. Additionally, it requires a
new endpoint.

An alternative would be to do something similar to what we already
have in *TableMetadata*, where we lazily load *snapshots* when needed. We
could expand that approach to lazily load the full *TableMetadata* from the
server only when necessary, and otherwise expose just a slim version of
*TableMetadata*.
I did such a POC a while ago, which can be seen in
That POC would need to be expanded so that it doesn't only do this for
snapshots, but also for other fields.
I believe the main fields that can get quite large over time are *snapshots
/ metadata-log / snapshot-log / schemas*.
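
To make the idea a bit more concrete, here is a rough Python sketch of the
lazy-loading pattern. It is not the POC and not the actual Iceberg API: the
class name, the `load_full` callback, and the dict-based metadata are all
made up for illustration. The only thing taken from this thread is the list of
fields that tend to grow large.

```python
from typing import Any, Callable, Dict, Optional

# Fields named in this thread as the ones that can grow large over time.
HEAVY_FIELDS = ("snapshots", "metadata-log", "snapshot-log", "schemas")


class LazyTableMetadata:
    """Hypothetical wrapper: serves slim metadata, fetches the rest on demand."""

    def __init__(
        self,
        slim: Dict[str, Any],
        load_full: Callable[[], Dict[str, Any]],
    ) -> None:
        self._slim = slim            # small fields returned by the initial load
        self._load_full = load_full  # assumed callback that fetches full metadata
        self._full: Optional[Dict[str, Any]] = None

    def get(self, field: str) -> Any:
        # Cheap fields are answered from the slim copy without a server call.
        if field not in HEAVY_FIELDS:
            return self._slim[field]
        # Heavy fields trigger a single full load, which is then cached.
        if self._full is None:
            self._full = self._load_full()
        return self._full[field]
```

With something like this, a caller that only touches small fields never pays
for the full load, and a caller that needs e.g. *snapshots* pays for it once.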

It might be worth checking how much we could gain by using a lazy table
metadata supplier in this scenario, as that would reduce the required


On Thu, Oct 10, 2024 at 2:05 AM Haizhou Zhao <zhaohaizhou940...@gmail.com> wrote:

> Hello Dev List,
> I want to bring this proposal to discussion:
> https://docs.google.com/document/d/1eXnT0ZiFvdm_Zvk6fLGT_UxVWO-HsiqVywqu1Uk8s7E/edit#heading=h.uad1lm906wz4
> It proposes a new LoadTable API (branded LoadTableV2 at the moment) on
> REST spec that allows partially loading table metadata. The motivation is
> to stabilize and optimize Spark write workloads, especially on Iceberg
> tables with big metadata (e.g. due to huge list of snapshot/metadata log,
> complicated schema, etc.). We want to leverage this proposal to reduce
> operational and monetary cost of Iceberg & REST catalog usages, and achieve
> higher commit frequencies (DDL & DML included) on top of Iceberg tables
> through REST catalog.
> Looking forward to hearing feedback and discussions.
> Thank you,
> Haizhou
