Hi Gabor,

I think this is somewhat related to the discussion we are having about "partial metadata retrieval" (as you said). We don't have a consensus on that discussion yet, and it's a pretty large proposal.
I have a preference for isLatest(), as it doesn't overlap semantically with filtering table metadata (which we can already do).

Regards
JB

On Tue, Nov 12, 2024 at 1:12 PM Gabor Kaszab <gaborkas...@apache.org> wrote:
>
> Hey Iceberg Community,
>
> Background:
> Impala is designed in a way to cache the Iceberg table metadata (BaseTable
> objects in practice) for faster access. Currently, Impala is tightly coupled
> with HMS and in turn with the HiveCatalog, and in order to keep the cached
> table objects up-to-date there is a notification mechanism driven by HMS to
> notify Impala about any changes in the table metadata.
> The Impala community is actively looking for ways to decouple HMS from Impala
> and provide a way to use Impala without the need for HMS, and get the Iceberg
> table metadata from other catalog Implementations mainly focusing now on REST
> catalogs.
>
> Problem to solve:
> We identified a particular missing functionality in the current REST spec:
> For engines that cache table metadata currently there is no way to check if
> that table metadata is up-to-date or not, and whether the engine should
> reload the metadata for that table or not without getting a whole table
> object from the catalog. For this I think the REST catalog (but in fact I
> think this could apply to any other catalogs) should be able to answer a
> question like:
> "Hi Catalog, I have this version of this table, is it up-to-date?"
>
> Proposal:
> I've been following the discussion about partial metadata loading that could
> be also used to answer the above question, but I have the impression now that
> the conversation stopped making any progress.
> So instead of waiting for partial metadata loading I propose to have an
> addition to the REST spec now to answer the question I raised above:
>
> a) boolean isLatest(TableIdentifier ident, String metadataLocation);
> b) String metadataLocation(TableIdentifier ident);
>
> Any of the above 2 approaches could help engines to decide if they have to
> invalidate/reload particular table metadata in the cache. I personally would
> go for option a) but would be open to hear other opinions.
>
> I'd like to know if the community could support me extending the REST spec
> with any of the 2 options.
>
> Regards,
> Gabor
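
To make the two quoted options concrete, here is a minimal sketch of how an engine-side cache might use them. Everything in it is an assumption for illustration: neither call exists in the REST spec today, the FreshnessCatalog, InMemoryCatalog and EngineMetadataCache names are made up, a plain String stands in for TableIdentifier, the metadata location string stands in for the cached BaseTable, and loadTable() is only a stand-in for a full table load.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical catalog surface exposing the two proposed calls.
interface FreshnessCatalog {
    // Option a): is the given metadata location the catalog's current one?
    boolean isLatest(String tableIdent, String metadataLocation);

    // Option b): return the catalog's current metadata location.
    String metadataLocation(String tableIdent);

    // Stand-in for a full (expensive) table load; returns the location it loaded.
    String loadTable(String tableIdent);
}

// Toy in-memory catalog used only to exercise the cache logic below.
final class InMemoryCatalog implements FreshnessCatalog {
    private final Map<String, String> current = new HashMap<>();
    int fullLoads = 0; // counts how often the expensive path was taken

    void commit(String tableIdent, String newLocation) {
        current.put(tableIdent, newLocation);
    }

    @Override
    public boolean isLatest(String tableIdent, String metadataLocation) {
        return metadataLocation != null
                && metadataLocation.equals(current.get(tableIdent));
    }

    @Override
    public String metadataLocation(String tableIdent) {
        return current.get(tableIdent);
    }

    @Override
    public String loadTable(String tableIdent) {
        fullLoads++;
        return current.get(tableIdent);
    }
}

// Engine-side cache (not thread-safe): reloads only when the cheap check fails.
final class EngineMetadataCache {
    private final FreshnessCatalog catalog;
    private final Map<String, String> cached = new HashMap<>();

    EngineMetadataCache(FreshnessCatalog catalog) {
        this.catalog = catalog;
    }

    String get(String tableIdent) {
        String location = cached.get(tableIdent);
        // Option a): one boolean round trip instead of a full table load.
        if (location != null && catalog.isLatest(tableIdent, location)) {
            return location;
        }
        String fresh = catalog.loadTable(tableIdent);
        cached.put(tableIdent, fresh);
        return fresh;
    }
}
```

With option b) the check inside get() would instead compare the cached location against catalog.metadataLocation(tableIdent) on the engine side; the reload decision is the same, only the place where the comparison happens differs.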