Hi,

It's a good performance find and improvement.   Left some comment on the PR.

IMO, the behavior actually more matches the API javadoc ("Check whether
table exists"), not whether it is corrupted or not, so I'm supportive of it.

Thanks
Szehon

On Thu, Nov 21, 2024 at 10:57 AM Steve Zhang
<hongyue_zh...@apple.com.invalid> wrote:

> Hi Iceberger,
>
>   I have a proposal to simplify the tableExists API in the Hive catalog,
> which involves a behavior change, and I’d like to hear your thoughts.
>
>   Currently, in our catalog interface[1], the tableExists method is
> implemented as a default API by invoking the loadTable method. It returns
> true if the table can be loaded without exceptions. This behavior implies
> two checks:
>
>    1. The table entry exists in the catalog.
>    2. The latest metadata.json for the table is not corrupted.
>
>   The behavior change I’m proposing focuses only on the first
> condition—checking if the table entry exists in the catalog. This separates
> the concerns of table existence and table health (e.g., metadata not
> corrupted). Such a change could improve the performance of existence
> checks, especially for RESTcatalog where table existence is abstracted as
> an HTTP HEAD request [2].
>
> I also reviewed the current usage of the tableExists API in the Iceberg
> codebase to ensure that this optimization would not have any negative
> impact.
>
> I’d love to hear everyone’s feedback on this! If there’s consensus, I can
> follow up with a similar optimization for the viewExists method in the
> Hive catalog.
>
> [1]: https://github.com/apache/iceberg/pull/11597
>
> [2]:
> https://github.com/apache/iceberg/blob/3badfe0c1fcf0c0adfc7aa4a10f0b50365c48cf9/open-api/rest-catalog-open-api.yaml#L1129-L1133
>
>
>
> Best regards,
> Steve Zhang
>
>
>
>

Reply via email to