Hi Gabor,

I think this is somewhat related to the "partial metadata retrieval"
discussion we are having (as you said).
We don't have a consensus on that discussion yet, and it's a pretty
large proposal.

I prefer isLatest() because, semantically, it doesn't overlap with
filtering table metadata (which we can already do).
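To make the idea concrete, here is a rough sketch of how an engine-side cache could use option a) from the proposal below. Everything in it is hypothetical: FreshnessCheck, CachedTable, TableCache and fromLocationLookup are made-up names, the table identifier is a plain String rather than Iceberg's TableIdentifier, and none of this exists in the REST spec today.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical catalog call mirroring option a); NOT part of the Iceberg REST spec.
interface FreshnessCheck {
  boolean isLatest(String tableIdent, String metadataLocation);

  // Hypothetical adapter: derive the same yes/no check from option b),
  // i.e. from a lookup that returns the current metadata location.
  static FreshnessCheck fromLocationLookup(Function<String, String> metadataLocation) {
    return (ident, location) -> location.equals(metadataLocation.apply(ident));
  }
}

// Stand-in for a cached table; a real engine would hold something like a BaseTable.
record CachedTable(String ident, String metadataLocation) {}

class TableCache {
  private final Map<String, CachedTable> cache = new HashMap<>();
  private final FreshnessCheck catalog;
  // The expensive path: a full table load from the catalog.
  private final Function<String, CachedTable> fullLoad;
  private int fullLoads = 0;

  TableCache(FreshnessCheck catalog, Function<String, CachedTable> fullLoad) {
    this.catalog = catalog;
    this.fullLoad = fullLoad;
  }

  CachedTable get(String ident) {
    CachedTable cached = cache.get(ident);
    // Cheap check first: reuse the cached object if the catalog says it is current.
    if (cached != null && catalog.isLatest(ident, cached.metadataLocation())) {
      return cached;
    }
    // Missing or stale: reload the whole table and replace the cache entry.
    CachedTable fresh = fullLoad.apply(ident);
    fullLoads++;
    cache.put(ident, fresh);
    return fresh;
  }

  int fullLoads() {
    return fullLoads;
  }
}
```

The fromLocationLookup adapter is only there to show that a client could derive the same answer from option b); with option a) the catalog answers the question directly.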

Regards
JB

On Tue, Nov 12, 2024 at 1:12 PM Gabor Kaszab <gaborkas...@apache.org> wrote:
>
> Hey Iceberg Community,
>
> Background:
> Impala is designed to cache Iceberg table metadata (in practice, BaseTable
> objects) for faster access. Currently, Impala is tightly coupled with HMS,
> and in turn with the HiveCatalog. To keep the cached table objects
> up-to-date, Impala relies on an HMS-driven notification mechanism that
> notifies it of any change in the table metadata.
> The Impala community is actively looking for ways to decouple Impala from
> HMS, so that Impala can be used without HMS and can get Iceberg table
> metadata from other catalog implementations, currently focusing mainly on
> REST catalogs.
>
> Problem to solve:
> We identified a particular piece of functionality missing from the current
> REST spec: engines that cache table metadata currently have no way to check
> whether that metadata is up-to-date, and therefore whether they should
> reload it, without fetching the whole table object from the catalog. To
> address this, I think the REST catalog (and in fact any other catalog)
> should be able to answer a question like:
> "Hi Catalog, I have this version of this table, is it up-to-date?"
>
> Proposal:
> I've been following the discussion about partial metadata loading, which
> could also be used to answer the above question, but my impression now is
> that the conversation has stalled.
> So instead of waiting for partial metadata loading, I propose adding the
> following to the REST spec now to answer the question I raised above:
>
> a) boolean isLatest(TableIdentifier ident, String metadataLocation);
> b) String metadataLocation(TableIdentifier ident);
>
> Either of the two approaches above would help engines decide whether they
> have to invalidate/reload a particular table's metadata in the cache. I
> would personally go for option a), but I'm open to hearing other opinions.
>
> I'd like to know whether the community would support extending the REST
> spec with either of the two options.
>
> Regards,
> Gabor
