Hey Iceberg Community,

*Background:*
Impala caches Iceberg table metadata (in practice, BaseTable objects) for
faster access. Currently, Impala is tightly coupled with HMS and, in turn,
with HiveCatalog. To keep the cached table objects up-to-date, HMS drives a
notification mechanism that informs Impala about any changes in the table
metadata.
The Impala community is actively looking for ways to decouple Impala from
HMS, so that Impala can be used without HMS and can get Iceberg table
metadata from other catalog implementations, with the current focus on
REST catalogs.

*Problem to solve:*
We identified a piece of functionality missing from the current REST spec:
an engine that caches table metadata currently has no way to check whether
that metadata is up-to-date, and therefore whether it should reload the
table, without fetching the whole table object from the catalog. I think
the REST catalog (and in fact probably any other catalog) should be able to
answer a question like:
"Hi Catalog, I have this version of this table, is it up-to-date?"

*Proposal:*
I've been following the discussion about partial metadata loading
<https://lists.apache.org/thread/ll3q30410gfrr89lynojj7b2kyh1xgh9>, which
could also be used to answer the above question, but my impression is that
the conversation has stalled.
So instead of waiting for partial metadata loading, I propose adding one of
the following to the REST spec now to answer the question I raised above:

a) boolean isLatest(TableIdentifier ident, String metadataLocation);
b) String metadataLocation(TableIdentifier ident);

Either of the two approaches above could help engines decide whether they
have to invalidate/reload a particular table's metadata in the cache. I
personally would go for option a), but I'm open to hearing other opinions.
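
To make the two options concrete, here is a minimal sketch of how an
engine-side cache might use either call. Everything in it is an assumption
for illustration: FreshnessCatalog, InMemoryCatalog and EngineMetadataCache
are hypothetical names, a plain String stands in for TableIdentifier so the
snippet compiles without Iceberg on the classpath, and the in-memory catalog
only mimics what a REST endpoint could answer. It is not an existing Iceberg
or REST catalog API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shape of the two proposed calls; not part of the current REST spec. */
interface FreshnessCatalog {
    // Option a): the catalog compares the caller's metadata location with its own.
    boolean isLatest(String tableIdent, String metadataLocation);

    // Option b): the catalog returns its current metadata location; the caller compares.
    String metadataLocation(String tableIdent);
}

/** In-memory stand-in for a catalog, for illustration only. */
final class InMemoryCatalog implements FreshnessCatalog {
    private final Map<String, String> current = new ConcurrentHashMap<>();

    // Simulates a commit that moves the table to a new metadata file.
    void commit(String tableIdent, String newMetadataLocation) {
        current.put(tableIdent, newMetadataLocation);
    }

    @Override
    public boolean isLatest(String tableIdent, String metadataLocation) {
        return metadataLocation != null && metadataLocation.equals(current.get(tableIdent));
    }

    @Override
    public String metadataLocation(String tableIdent) {
        return current.get(tableIdent); // null if the table is unknown
    }
}

/** Engine-side cache that remembers which metadata location it loaded per table. */
final class EngineMetadataCache {
    private final FreshnessCatalog catalog;
    private final Map<String, String> cachedLocations = new ConcurrentHashMap<>();

    EngineMetadataCache(FreshnessCatalog catalog) {
        this.catalog = catalog;
    }

    void put(String tableIdent, String metadataLocation) {
        cachedLocations.put(tableIdent, metadataLocation);
    }

    // Option a): one round trip, the answer is a single boolean.
    boolean needsReloadViaIsLatest(String tableIdent) {
        String cached = cachedLocations.get(tableIdent);
        return cached == null || !catalog.isLatest(tableIdent, cached);
    }

    // Option b): one round trip, the comparison happens on the engine side.
    boolean needsReloadViaLocation(String tableIdent) {
        String cached = cachedLocations.get(tableIdent);
        return cached == null || !cached.equals(catalog.metadataLocation(tableIdent));
    }
}

final class FreshnessDemo {
    public static void main(String[] args) {
        InMemoryCatalog catalog = new InMemoryCatalog();
        catalog.commit("db.tbl", "s3://bucket/db/tbl/metadata/v1.metadata.json");

        EngineMetadataCache cache = new EngineMetadataCache(catalog);
        cache.put("db.tbl", "s3://bucket/db/tbl/metadata/v1.metadata.json");
        System.out.println("reload before commit: " + cache.needsReloadViaIsLatest("db.tbl"));

        // Another writer commits; the cached location is now stale.
        catalog.commit("db.tbl", "s3://bucket/db/tbl/metadata/v2.metadata.json");
        System.out.println("reload after commit:  " + cache.needsReloadViaIsLatest("db.tbl"));
    }
}
```

In both variants the engine avoids fetching the whole table object just to
learn that nothing changed. The difference is only where the comparison
happens: in the catalog for a), in the engine for b).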

I'd like to know whether the community would support extending the REST
spec with either of the two options.

Regards,
Gabor
