Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-06 Thread Walaa Eldin Moustafa
Hi Yufei, The original proposal did not seem to indicate that the metadata tables will be "materialized" (outside regular Iceberg metadata, since most of those metadata tables are actually "views" on Iceberg metadata). However, in the last response, it seems metadata could potentially be written to

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-05 Thread Yufei Gu
aloud.
> There are some usecases in authorization/authentication realm which could
> use this approach.
>
> From: sn...@snazy.de At: 07/04/24 06:10:53 UTC-4:00
> To: dev@iceberg.apache.org
> Subject: Re: [Proposal] REST Spec: Server-side Metadata Tables
>
> Hi Yufei,
>
> I

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-05 Thread Vishal Jadhav (BLOOMBERG/ NEW JERSE)
Thinking aloud. There are some use cases in the authorization/authentication realm which could use this approach. From: sn...@snazy.de At: 07/04/24 06:10:53 UTC-4:00 To: dev@iceberg.apache.org Subject: Re: [Proposal] REST Spec: Server-side Metadata Tables Hi Yufei, I think

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-04 Thread Drew
Hey Yufei, Thanks for the proposal! On the topic of the big tables, an idea that came to mind while working on the fine-grained metadata commit API was to optionally extend the catalog to support compression algorithms over REST, such as Brotli and gzip. This would mean the catalog server could

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-04 Thread Manish Malhotra
Thanks Yufei! This is interesting, and it has been on my mind as well, as this is the natural progression of the REST Catalog. Totally agreed on enabling metadata access from any platform/language, which right now typically comes from the query engines. Though, I feel, users need SQL engines to analyze and deb

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-04 Thread Pucheng Yang
Hi all, regarding the "big metadata" issue, my understanding is that even the Plan/Preplan API in the task planning use case will still have the same issue when the engine is doing a full table scan for large tables. Is my understanding correct? Also, given metadata compute could be heavy, do we co

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-04 Thread Amogh Jahagirdar
Thanks Yufei! I think it's worth thinking through if it makes sense to leverage Plan/Preplan APIs like Jack alluded to. I think this makes sense from a scale argument, since in the worst case the Plan/Preplan APIs need to be able to churn through all the metadata anyways. However, with this approa

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-04 Thread Robert Stupp
Hi Yufei, I think the proposal is very interesting! The direction this and other proposals are going is IMO the right one. Since many proposals need access to at least manifest-lists and manifest files, potentially also data/delete files, does it make sense to bundle all proposals that need

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-04 Thread Gabor Kaszab
Hey, This is a pretty interesting proposal, thanks for raising it, Yufei! About the 'big metadata' topic: I'm trying to understand the scale of the data being returned in such a scenario. Let me know if my calculations are wrong: taking the 'files' metadata table on a table with 100 cols, for me i

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Szehon Ho
Hi Piotr, Thanks for the reply. It’s a good point. I was thinking it would be convenient in REST, and could avoid the hassle of a spec change. But you are right that it probably belongs at a lower level if we support this feature generally (like an additional boolean on snapshot). Sorry to hija

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Ajantha Bhat
Thanks for the proposal. I would recommend creating a GitHub issue with the "proposal" label for easy tracking of all the ongoing proposals, as mentioned here. The first concern that comes to mind is whethe

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Piotr Findeisen
Hi Szehon, re listing 'removed' snapshots: If I understand correctly, what you're saying is the following: Iceberg table format requires users to first delete metadata information about files and only then delete the files, and sometimes users want to order these events differently. We can solve this within

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Szehon Ho
Yes, I was chatting with Yufei about this, and at first glance I agree this would be nice to have. I always thought that metadata tables are important enough to spec somewhere, and I think this is a nice place to do it. There seems to be some overlap with existing calls (i.e., you can get snapshots

Re: [Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Jack Ye
Hi Yufei, Interesting that we are thinking about similar things. I had this item as a part of the roadmap discussion items in the catalog sync meeting, and then I removed it before the meeting because I felt it's too early to discuss. My main concern for having server-side metadata tables is how

[Proposal] REST Spec: Server-side Metadata Tables

2024-07-03 Thread Yufei Gu
Hi folks, I'd like to discuss a new proposal to support server-side metadata tables. One of Iceberg's most advantageous features is the ability to inspect a table using metadata tables. For instance, we can query snapshots just like we query data rows using the following command: SELECT * FROM pr