Hi Icebergers,

  I have a proposal to simplify the tableExists API in the Hive catalog, which 
involves a behavior change, and I’d like to hear your thoughts.

  Currently, in our catalog interface[1], the tableExists method is implemented 
as a default API by invoking the loadTable method. It returns true if the table 
can be loaded without exceptions. This behavior implies two checks:

  1. The table entry exists in the catalog.
  2. The latest metadata.json for the table is not corrupted.

  The behavior change I’m proposing keeps only the first check: whether the 
table entry exists in the catalog. This separates the concerns of table 
existence and table health (e.g., metadata not corrupted). Such a change could 
improve the performance of existence checks, especially for the REST catalog, 
where table existence is abstracted as an HTTP HEAD request [2].
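
  To make the difference concrete, here is a minimal, self-contained sketch. 
It is not the actual Iceberg code: the Catalog interface is stripped down, 
tables are identified by plain strings, and InMemoryCatalog, EntryOnlyCatalog, 
and the "null value means corrupted metadata" convention are hypothetical names 
and assumptions I made up for illustration. It also assumes the load-based 
check only treats "no such table" as "does not exist", so corrupted metadata 
surfaces as an exception.

```java
import java.util.HashMap;
import java.util.Map;

class TableExistsSketch {

  /** Thrown when the catalog has no entry for the requested table. */
  static class NoSuchTableException extends RuntimeException {
    NoSuchTableException(String name) {
      super("Table does not exist: " + name);
    }
  }

  /** Hypothetical, stripped-down catalog interface (not the real Iceberg API). */
  interface Catalog {
    String loadTable(String name);

    // Load-based check: existence is inferred from a successful load,
    // so it implicitly also validates the table metadata.
    default boolean tableExists(String name) {
      try {
        loadTable(name);
        return true;
      } catch (NoSuchTableException e) {
        return false;
      }
    }
  }

  /** Toy catalog: table name -> metadata; a null value models corrupted metadata. */
  static class InMemoryCatalog implements Catalog {
    final Map<String, String> entries = new HashMap<>();
    int loads = 0; // counts how often the (expensive) load path is taken

    @Override
    public String loadTable(String name) {
      loads++;
      if (!entries.containsKey(name)) {
        throw new NoSuchTableException(name);
      }
      String metadata = entries.get(name);
      if (metadata == null) {
        throw new IllegalStateException("Corrupted metadata: " + name);
      }
      return metadata;
    }
  }

  /** Proposed behavior: only check that the catalog entry is present. */
  static class EntryOnlyCatalog extends InMemoryCatalog {
    @Override
    public boolean tableExists(String name) {
      return entries.containsKey(name); // no metadata load, no health check
    }
  }

  public static void main(String[] args) {
    EntryOnlyCatalog catalog = new EntryOnlyCatalog();
    catalog.entries.put("db.broken", null); // entry exists, metadata is corrupted
    System.out.println("exists: " + catalog.tableExists("db.broken"));
    System.out.println("loads performed: " + catalog.loads);
  }
}
```

  With the entry-only check, a table whose metadata is corrupted still counts 
as existing, and callers that care about table health have to call loadTable 
explicitly.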

I also reviewed the current usage of the tableExists API in the Iceberg 
codebase to ensure that this optimization would not have any negative impact.

I’d love to hear everyone’s feedback on this! If there’s consensus, I can 
follow up with a similar optimization for the viewExists method in the Hive 
catalog.

[1]: https://github.com/apache/iceberg/pull/11597
[2]: https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/open-api/rest-catalog-open-api.yaml#L1129-L1133


Best regards,
Steve Zhang


