Hi Steve, This makes sense to me. The semantics of `tableExists` focus on whether a table's name exists in the catalog. For the Hive catalog, checking the HMS entry should be sufficient.
I do have a question about usage, though. Typically, I would use ` tableExists` like this: ``` if (!tableExists(table_name)) { table = createTable(table_name); } ``` What happens when a Hive table with the same name already exists in the catalog? In the current implementation, `tableExists` would return `false` because `HiveOperationsBase.validateTableIsIceberg` throws a `NoSuchTableException`. This would cause the code above to attempt to create the table, only to fail since the name already exists in the HMS. If `tableExists` is meant to check for conflicting entries in the HMS, perhaps it should return true even when a Hive table with the same name exists. I’d love to hear your thoughts on this. Best, Kevin Liu On Thu, Nov 21, 2024 at 5:22 PM Szehon Ho <szehon.apa...@gmail.com> wrote: > Hi, > > It's a good performance find and improvement. Left some comment on the > PR. > > IMO, the behavior actually more matches the API javadoc ("Check whether > table exists"), not whether it is corrupted or not, so I'm supportive of it. > > Thanks > Szehon > > On Thu, Nov 21, 2024 at 10:57 AM Steve Zhang > <hongyue_zh...@apple.com.invalid> wrote: > >> Hi Iceberger, >> >> I have a proposal to simplify the tableExists API in the Hive catalog, >> which involves a behavior change, and I’d like to hear your thoughts. >> >> Currently, in our catalog interface[1], the tableExists method is >> implemented as a default API by invoking the loadTable method. It >> returns true if the table can be loaded without exceptions. This >> behavior implies two checks: >> >> 1. The table entry exists in the catalog. >> 2. The latest metadata.json for the table is not corrupted. >> >> The behavior change I’m proposing focuses only on the first >> condition—checking if the table entry exists in the catalog. This separates >> the concerns of table existence and table health (e.g., metadata not >> corrupted). Such a change could improve the performance of existence >> checks, especially for RESTcatalog where table existence is abstracted as >> an HTTP HEAD request [2]. >> >> I also reviewed the current usage of the tableExists API in the Iceberg >> codebase to ensure that this optimization would not have any negative >> impact. >> >> I’d love to hear everyone’s feedback on this! If there’s consensus, I can >> follow up with a similar optimization for the viewExists method in the >> Hive catalog. >> >> [1]: https://github.com/apache/iceberg/pull/11597 >> >> [2]: >> https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/open-api/rest-catalog-open-api.yaml#L1129-L1133 >> >> >> >> Best regards, >> Steve Zhang >> >> >> >>