Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-12-02 Thread Steve Zhang
Thank you all for your valuable input on this topic. I apologize for not being able to keep up with the discussion last week, as I was traveling during the U.S. Thanksgiving holiday. To summarize, I think we have the agreement on following points: - tableExists API shall indicate whether a table

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Péter Váry
+1 from my side too. I wanted to make sure that the community is aware of this change which will bring behavioral difference compared to other catalogs. This is why I have asked Steve to start this thread. On Thu, Nov 28, 2024, 02:10 Szehon Ho wrote: > Yea, I think that part is definitely kept

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Szehon Ho
Yea, I think that part is definitely kept. Thanks Szehon On Wed, Nov 27, 2024 at 12:02 PM rdb...@gmail.com wrote: > I'd support changing the behavior if we still have a way to match the > intent, which is to return true if the table exists in Hive and is an > Iceberg table. > > On Wed, Nov 27,

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread rdb...@gmail.com
I'd support changing the behavior if we still have a way to match the intent, which is to return true if the table exists in Hive and is an Iceberg table. On Wed, Nov 27, 2024 at 11:26 AM Szehon Ho wrote: > Hm I think the thread got a bit sidetracked by the other question. > > The initial propos

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Szehon Ho
Hm I think the thread got a bit sidetracked by the other question. The initial proposal by Steve is a performance improvement for HiveCatalog's tableExists(). Currently it loads both Hive and Iceberg table metadata, and if successful returns true. The proposal is to load from Hive only, and retu

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread rdb...@gmail.com
What kind of corruption are you referring to? I would expect corruption to result in an exception when loading the table, but that the table should still exist. The problem is likely that we determine if a table exists by attempting to load it. We could fix that by not attempting to load the table.

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-27 Thread Manu Zhang
> > The current behavior's intent is not to check whether the metadata is > valid, it is to detect whether the table is an Iceberg table. Is there a way to detect this from HiveCatalog without loading the table? On Wed, Nov 27, 2024 at 2:01 PM Péter Váry wrote: > I think we have an agreement,

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-26 Thread Péter Váry
I think we have an agreement, not to change the behavior wrt existing non-Iceberg tables, and throw an exception. Are we also in agreement with the original proposal to return true when the table exists but the metadata is somehow corrupted? Note: this is the proposed change of behavior why the th

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-26 Thread rdb...@gmail.com
I'd argue against changing this. The current behavior's intent is not to check whether the metadata is valid, it is to detect whether the table is an Iceberg table. It ignores non-Iceberg tables. Changing that behavior would be surprising, especially if we started throwing exceptions. On Fri, Nov

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-22 Thread Kevin Liu
> Should add, my personal preference is probably not to change the existing behavior for this part +1. I realized that this is not a new behavior. The `loadTable` implementation has this problem too. It would be good to have a test case specifically for this edge case and maybe call this out in th

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-22 Thread Szehon Ho
Should add, my personal preference is probably not to change the existing behavior for this part (false, if exists a Hive table with same name) at the moment, just adding another possibility for consideration. Thanks Szehon On Fri, Nov 22, 2024 at 2:00 AM Szehon Ho wrote: > Thanks Kevin and Gab

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-22 Thread Szehon Ho
Thanks Kevin and Gabor, this is an interesting discussion. I guess a third option instead of returning true/false in this case, is to change it to throw an NoSuchIcebergTableException if its a non-Iceberg table, which I think is actually what this pr does? Thanks Szehon On Fri, Nov 22, 2024 at 1

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-22 Thread Gabor Kaszab
Hey, I think what Kevin says makes sense. However, it would then confuse the opposite use case of this function. Let's assume that we change the implementation of tableExists() to not load the table internally: if (tableExists(table_name)) { table = loadTable(table_name); } Here, you find th

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-21 Thread Kevin Liu
Hi Steve, This makes sense to me. The semantics of `tableExists` focus on whether a table's name exists in the catalog. For the Hive catalog, checking the HMS entry should be sufficient. I do have a question about usage, though. Typically, I would use ` tableExists` like this: ``` if (!tableExis

Re: [Discuss] Simplify tableExists API in HiveCatalog

2024-11-21 Thread Szehon Ho
Hi, It's a good performance find and improvement. Left some comment on the PR. IMO, the behavior actually more matches the API javadoc ("Check whether table exists"), not whether it is corrupted or not, so I'm supportive of it. Thanks Szehon On Thu, Nov 21, 2024 at 10:57 AM Steve Zhang wrote

[Discuss] Simplify tableExists API in HiveCatalog

2024-11-21 Thread Steve Zhang
Hi Iceberger, I have a proposal to simplify the tableExists API in the Hive catalog, which involves a behavior change, and I’d like to hear your thoughts. Currently, in our catalog interface[1], the tableExists method is implemented as a default API by invoking the loadTable method. It retu