Sounds good to me, can you start a document then, and we can all contribute there?
On Fri, Mar 22, 2024 at 10:47 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> Let us list the pros and cons as originally planned. I can help as well if
> needed. We can get started and have Jack chime in when he is back?
>
> On Fri, Mar 22, 2024 at 10:35 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>
>> Hi
>>
>> My understanding was that last time it was still unresolved, and the action
>> item was on Jack and/or Jan to make a shorter document. I think the
>> debate has now boiled down to Ryan's three options:
>>
>> 1. separate table/view
>> 2. combination of table/view tied together via commit
>> 3. new metadata type
>>
>> with probably the first and third being the main contenders. My
>> understanding was we wanted a table of pros/cons between (1) and (3),
>> presumably giving folks a chance to address the cons, before the next
>> meeting.
>>
>> Jack (main proponent of option (3)) just went on paternity leave, so I am not
>> sure if there is someone from Amazon with some context on Jack's thinking
>> who could continue that train of thought? Otherwise maybe Jan can give it
>> a shot? Else, I will be out and can't make the next Iceberg sync, but can
>> prepare one for the one after that, if needed.
>>
>> Re: the 'new' proposal, not sure if we are ready for a formal one, given the
>> deadlock between the two options, but I'm open to making a
>> proposal based on one of the options above. What do folks think?
>>
>> Thanks,
>> Szehon
>>
>> On Fri, Mar 22, 2024 at 3:15 AM Renjie Liu <liurenjie2...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Fri, Mar 22, 2024 at 16:42 Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>
>>>> Hi Renjie,
>>>>
>>>> We discussed the MV proposal, without yet reaching any conclusion.
>>>> >>>> I propose: >>>> - to use the "new" proposal process in place (creating an GH issue with >>>> proposal flag, with link to the document) >>>> - use the document and/or GH issue to add comments >>>> - finalize the document heading to a vote (to get consensus) >>>> >>>> Thoughts ? >>>> >>>> NB: I will follow up with "stale PR/proposal" PR to be sure we are >>>> moving forward ;) >>>> >>>> Regards >>>> JB >>>> >>>> On Fri, Mar 22, 2024 at 4:29 AM Renjie Liu <liurenjie2...@gmail.com> >>>> wrote: >>>> >>>>> Hi: >>>>> >>>>> Sorry I didn't make it to join the last community sync. Did we reach >>>>> any conclusion about mv spec? >>>>> >>>>> On Tue, Mar 5, 2024 at 11:28 PM himadri pal <meh...@gmail.com> wrote: >>>>> >>>>>> For me the calendar link did not work in mobile, but I was able to >>>>>> add the dev Google calendar from >>>>>> https://iceberg.apache.org/community/#iceberg-community-events by >>>>>> accessing it from laptop. >>>>>> >>>>>> Regards, >>>>>> Himadri Pal >>>>>> >>>>>> >>>>>> On Mon, Mar 4, 2024 at 4:43 PM Walaa Eldin Moustafa < >>>>>> wa.moust...@gmail.com> wrote: >>>>>> >>>>>>> Thanks Jack! I think the images are stripped from the message, but >>>>>>> they are there on the doc >>>>>>> <https://docs.google.com/spreadsheets/d/1a0tlyh8f2ft2SepE7H3bgoY2A0q5IILgzAsJMnwjTBs/edit#gid=0> >>>>>>> if >>>>>>> someone wants to check them out (I have left some comments while there). >>>>>>> >>>>>>> Also I no longer see the community sync calendar >>>>>>> https://iceberg.apache.org/community/#slack, so it is unclear when >>>>>>> the meeting is (and we do not have the link). >>>>>>> >>>>>>> Thanks, >>>>>>> Walaa. >>>>>>> >>>>>>> >>>>>>> On Mon, Mar 4, 2024 at 9:58 AM Jack Ye <yezhao...@gmail.com> wrote: >>>>>>> >>>>>>>> Thanks Jan! +1 for everyone to take a look before the discussion, >>>>>>>> and see if there are any missing options or major arguments. 
>>>>>>>> >>>>>>>> I have also added the images regarding all the options, it might be >>>>>>>> easier to parse than the big sheet. I will also put it here for people >>>>>>>> that >>>>>>>> do not have time to read through it: >>>>>>>> >>>>>>>> >>>>>>>> *Option 1: Add storage table identifier in view metadata content* >>>>>>>> >>>>>>>> [image: MV option 1.png] >>>>>>>> *Option 2: Add storage table metadata file pointer in view object* >>>>>>>> >>>>>>>> [image: MV option 2.png] >>>>>>>> *Option 3: Add storage table metadata file pointer in view metadata >>>>>>>> content* >>>>>>>> >>>>>>>> [image: MV option 3.png] >>>>>>>> >>>>>>>> *Option 4: Embed table metadata in view metadata content* >>>>>>>> >>>>>>>> [image: MV option 4.png] >>>>>>>> *Option 5: New MV spec, MV object has table and view metadata file >>>>>>>> pointers* >>>>>>>> >>>>>>>> [image: MV option 5.png] >>>>>>>> *Option 6: New MV spec, MV metadata content embeds table and view >>>>>>>> metadata* >>>>>>>> >>>>>>>> [image: MV option 6.png] >>>>>>>> *Option 7: New MV spec, completely new MV metadata content* >>>>>>>> >>>>>>>> [image: MV option 7.png] >>>>>>>> >>>>>>>> -Jack >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Mar 3, 2024 at 11:45 PM Jan Kaul >>>>>>>> <jank...@mailbox.org.invalid> wrote: >>>>>>>> >>>>>>>>> I think it's great to have a face to face discussion about this. >>>>>>>>> Additionally, I would propose to use Jacks' document >>>>>>>>> <https://docs.google.com/spreadsheets/d/1a0tlyh8f2ft2SepE7H3bgoY2A0q5IILgzAsJMnwjTBs/edit#gid=0> >>>>>>>>> as a common ground for the discussion and that everyone has a quick >>>>>>>>> look >>>>>>>>> before the next community sync. If you think the document is still >>>>>>>>> missing >>>>>>>>> some arguments, please make suggestions to add them. This way we have >>>>>>>>> to >>>>>>>>> spend less time to get everyone up to speed and have a more common >>>>>>>>> terminology. 
>>>>>>>>>
>>>>>>>>> Looking forward to the discussion, best wishes
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>> On 02.03.24 02:06, Walaa Eldin Moustafa wrote:
>>>>>>>>>
>>>>>>>>> The calendar on the site is currently broken
>>>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events.
>>>>>>>>> Might help to fix it or share the meeting link here.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 1, 2024 at 3:43 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Sounds good, let's discuss this in person!
>>>>>>>>>>
>>>>>>>>>> I am a bit worried that we have quite a few critical topics going
>>>>>>>>>> on right now on the dev list, and this will take up a lot of time to
>>>>>>>>>> discuss. If it ends up going on for too long, I propose we have a
>>>>>>>>>> dedicated meeting, and I am more than happy to organize it.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jack Ye
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 1, 2024 at 12:48 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>
>>>>>>>>>>> I think this thread has hit a point of diminishing returns and
>>>>>>>>>>> that we still don't have a common understanding of what the options
>>>>>>>>>>> under consideration actually are.
>>>>>>>>>>>
>>>>>>>>>>> Since we were already planning on discussing this at the next
>>>>>>>>>>> community sync, I suggest we pick this up there and use that time
>>>>>>>>>>> to align on what exactly we're considering. We can then start a new
>>>>>>>>>>> thread to lay out the designs under consideration in more detail and
>>>>>>>>>>> then have a discussion about trade-offs.
>>>>>>>>>>>
>>>>>>>>>>> Does that sound reasonable?
>>>>>>>>>>>
>>>>>>>>>>> Ryan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 1, 2024 at 11:09 AM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am finding it hard to interpret the options concretely.
I >>>>>>>>>>>> would also suggest breaking the expectation/outcome to milestones. >>>>>>>>>>>> Maybe it >>>>>>>>>>>> becomes easier if we agree to distinguish between an approach that >>>>>>>>>>>> is >>>>>>>>>>>> feasible in the near term and another in the long term, especially >>>>>>>>>>>> if the >>>>>>>>>>>> latter requires significant engine-side changes. >>>>>>>>>>>> >>>>>>>>>>>> Further, maybe it helps if we start with an option that fully >>>>>>>>>>>> reuses the existing spec, and see how we view it in comparison >>>>>>>>>>>> with the >>>>>>>>>>>> options discussed previously. I am sharing one below. It reuses >>>>>>>>>>>> the current >>>>>>>>>>>> spec of Iceberg views and tables by leveraging table properties to >>>>>>>>>>>> capture >>>>>>>>>>>> materialized view metadata. What is common (and not common) >>>>>>>>>>>> between this >>>>>>>>>>>> and the desired representations? >>>>>>>>>>>> >>>>>>>>>>>> The new properties are: >>>>>>>>>>>> Properties on a View: >>>>>>>>>>>> >>>>>>>>>>>> 1. >>>>>>>>>>>> >>>>>>>>>>>> *iceberg.materialized.view*: >>>>>>>>>>>> - *Type*: View property >>>>>>>>>>>> - *Purpose*: This property is used to mark whether a >>>>>>>>>>>> view is a materialized view. If set to true, the view is >>>>>>>>>>>> treated as a materialized view. This helps in >>>>>>>>>>>> differentiating between >>>>>>>>>>>> virtual and materialized views within the catalog and >>>>>>>>>>>> dictates specific >>>>>>>>>>>> handling and validation logic for materialized views. >>>>>>>>>>>> 2. >>>>>>>>>>>> >>>>>>>>>>>> *iceberg.materialized.view.storage.location*: >>>>>>>>>>>> - *Type*: View property >>>>>>>>>>>> - *Purpose*: Specifies the location of the storage table >>>>>>>>>>>> associated with the materialized view. This property is used >>>>>>>>>>>> for linking a >>>>>>>>>>>> materialized view with its corresponding storage table, >>>>>>>>>>>> enabling data >>>>>>>>>>>> management and query execution based on the stored data >>>>>>>>>>>> freshness. 
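To make the property-based proposal easier to picture, here is a rough sketch of how an engine might read the two view properties above together with the base.snapshot.[UUID] table properties from the same proposal. It is only an illustration under assumptions: the class MaterializedViewProps and its methods are hypothetical and not part of the Iceberg API, the storage location, UUID and snapshot IDs are made up, and treating a storage table with no base.snapshot.* entries as stale is a choice made here, not something the proposal states.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Iceberg API.
final class MaterializedViewProps {
    // Property names taken from the proposal in this message.
    static final String MV_FLAG = "iceberg.materialized.view";
    static final String STORAGE_LOCATION = "iceberg.materialized.view.storage.location";
    static final String BASE_SNAPSHOT_PREFIX = "base.snapshot.";

    private MaterializedViewProps() {}

    /** True when the view properties mark the view as materialized. */
    static boolean isMaterializedView(Map<String, String> viewProps) {
        return Boolean.parseBoolean(viewProps.get(MV_FLAG));
    }

    /**
     * Compares the snapshot IDs recorded on the storage table with the current
     * snapshot IDs of the base tables, keyed by base table UUID.
     * Assumption: a storage table with no base.snapshot.* entries is stale.
     */
    static boolean isFresh(Map<String, String> storageTableProps,
                           Map<String, Long> currentSnapshotIds) {
        boolean sawBaseTable = false;
        for (Map.Entry<String, String> entry : storageTableProps.entrySet()) {
            String key = entry.getKey();
            if (!key.startsWith(BASE_SNAPSHOT_PREFIX)) {
                continue; // unrelated table property
            }
            sawBaseTable = true;
            String baseTableUuid = key.substring(BASE_SNAPSHOT_PREFIX.length());
            Long current = currentSnapshotIds.get(baseTableUuid);
            if (current == null || !current.toString().equals(entry.getValue())) {
                return false; // base table unknown or has moved to a newer snapshot
            }
        }
        return sawBaseTable;
    }

    public static void main(String[] args) {
        Map<String, String> viewProps = new HashMap<>();
        viewProps.put(MV_FLAG, "true");
        viewProps.put(STORAGE_LOCATION, "s3://bucket/warehouse/db1/mv1_storage"); // made-up location

        Map<String, String> storageTableProps = new HashMap<>();
        storageTableProps.put(BASE_SNAPSHOT_PREFIX + "uuid-a", "101"); // made-up UUID and snapshot ID

        Map<String, Long> currentSnapshotIds = new HashMap<>();
        currentSnapshotIds.put("uuid-a", 101L);

        System.out.println("materialized: " + isMaterializedView(viewProps));
        System.out.println("fresh: " + isFresh(storageTableProps, currentSnapshotIds));
    }
}
```

The sketch only reads plain string maps, so it stays within the existing view and table specs, which is the point of this option; how the engine obtains the current snapshot IDs of the base tables is left open.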
>>>>>>>>>>>> >>>>>>>>>>>> Properties on a Table: >>>>>>>>>>>> >>>>>>>>>>>> 1. *base.snapshot.[UUID]*: >>>>>>>>>>>> - *Type*: Table property >>>>>>>>>>>> - *Purpose*: These properties store the snapshot IDs of >>>>>>>>>>>> the base tables at the time the materialized view's data was >>>>>>>>>>>> last updated. >>>>>>>>>>>> Each property is prefixed with base.snapshot. followed >>>>>>>>>>>> by the UUID of the base table. They are used to track >>>>>>>>>>>> whether the >>>>>>>>>>>> materialized view's data is up to date with the base tables >>>>>>>>>>>> by comparing >>>>>>>>>>>> these snapshot IDs with the current snapshot IDs of the base >>>>>>>>>>>> tables. If all >>>>>>>>>>>> the base tables' current snapshot IDs match the ones stored >>>>>>>>>>>> in these >>>>>>>>>>>> properties, the materialized view's data is considered fresh. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Walaa. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Mar 1, 2024 at 9:15 AM Jack Ye <yezhao...@gmail.com> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> > All of these approaches are aligned in one, specific way: >>>>>>>>>>>>> the storage table is an iceberg table. >>>>>>>>>>>>> >>>>>>>>>>>>> I do not think that is true. I think people are aligned that >>>>>>>>>>>>> we would like to re-use the Iceberg table metadata defined in the >>>>>>>>>>>>> Iceberg >>>>>>>>>>>>> table spec to express the data in MV, but I don't think it goes >>>>>>>>>>>>> that far to >>>>>>>>>>>>> say it must be an Iceberg table. Once you have that mindset, then >>>>>>>>>>>>> of course >>>>>>>>>>>>> option 1 (separate table and view) is the only option. >>>>>>>>>>>>> >>>>>>>>>>>>> > I don't think that is necessary and it >>>>>>>>>>>>> significantly increases the complexity. >>>>>>>>>>>>> >>>>>>>>>>>>> And can you quantify what you mean by "significantly increases >>>>>>>>>>>>> the complexity"? Seems like a lot of concerns are coming from the >>>>>>>>>>>>> tradeoff >>>>>>>>>>>>> with complexity. 
We probably all agree that using option 7 (a >>>>>>>>>>>>> completely >>>>>>>>>>>>> new metadata type) is a lot of work from scratch, that is why it >>>>>>>>>>>>> is not >>>>>>>>>>>>> favored. However, my understanding is that as long as we re-use >>>>>>>>>>>>> the view >>>>>>>>>>>>> and table metadata, then the majority of the existing logic can >>>>>>>>>>>>> be reused. >>>>>>>>>>>>> I think what we have gone through in Slack to draft the rough >>>>>>>>>>>>> Java API >>>>>>>>>>>>> shape helps here, because people can estimate the amount of >>>>>>>>>>>>> effort required >>>>>>>>>>>>> to implement it. And I don't think they are **significantly** >>>>>>>>>>>>> more complex >>>>>>>>>>>>> to implement. Could you elaborate more about the complexity that >>>>>>>>>>>>> you >>>>>>>>>>>>> imagine? >>>>>>>>>>>>> >>>>>>>>>>>>> -Jack >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Fri, Mar 1, 2024 at 8:57 AM Daniel Weeks < >>>>>>>>>>>>> daniel.c.we...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I feel I've been most vocal about pushing back against >>>>>>>>>>>>>> options 2+ (or Ryan's categories of combined table/view, or new >>>>>>>>>>>>>> metadata >>>>>>>>>>>>>> type), so I'll try to expand on my reasoning. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I understand the appeal of creating a design where we >>>>>>>>>>>>>> encapsulate the view/storage from both a structural and >>>>>>>>>>>>>> performance >>>>>>>>>>>>>> standpoint, but I don't think that is necessary and it >>>>>>>>>>>>>> significantly increases the complexity. >>>>>>>>>>>>>> >>>>>>>>>>>>>> All of these approaches are aligned in one, specific way: the >>>>>>>>>>>>>> storage table is an iceberg table. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Because of this, all the behaviors and requirements >>>>>>>>>>>>>> still apply to these tables. 
They need to be maintained >>>>>>>>>>>>>> (snapshot cleanup, >>>>>>>>>>>>>> orphan files), in cases need to be optimized (compaction, >>>>>>>>>>>>>> manifest >>>>>>>>>>>>>> rewrites), they need to be able to be inspected (this will be >>>>>>>>>>>>>> even more >>>>>>>>>>>>>> important with MV since staleness can produce different results >>>>>>>>>>>>>> and >>>>>>>>>>>>>> questions will arise about what state the storage table was in). >>>>>>>>>>>>>> There may >>>>>>>>>>>>>> be cases where the tables need to be managed directly. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Anywhere we deviate from the existing >>>>>>>>>>>>>> constructs/commit/access for tables, we will ultimately have to >>>>>>>>>>>>>> then >>>>>>>>>>>>>> unwrap to re-expose the underlying Iceberg behavior. This >>>>>>>>>>>>>> creates >>>>>>>>>>>>>> unnecessary complexity in the library/API layer, which are not >>>>>>>>>>>>>> the primary >>>>>>>>>>>>>> interface users will have with materialized views where an >>>>>>>>>>>>>> engine is almost >>>>>>>>>>>>>> entirely necessary to interact with the dataset. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As to the performance concerns around option 1, I think we're >>>>>>>>>>>>>> overstating the downsides. It really comes down to how many >>>>>>>>>>>>>> metadata loads >>>>>>>>>>>>>> are necessary and evaluating freshness would likely be the real >>>>>>>>>>>>>> bottleneck >>>>>>>>>>>>>> as it involves potentially loading many tables. All of the >>>>>>>>>>>>>> options are on >>>>>>>>>>>>>> the same order of performance for the metadata and table loads. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As to the visibility of tables and whether they're registered >>>>>>>>>>>>>> in the catalog, I think registering in the catalog is the right >>>>>>>>>>>>>> approach so >>>>>>>>>>>>>> that the tables are still addressable for maintenance/etc. 
The >>>>>>>>>>>>>> visibility >>>>>>>>>>>>>> of the storage table is a catalog implementation decision and >>>>>>>>>>>>>> shouldn't be >>>>>>>>>>>>>> a requirement of the MV spec (I can see cases for both and it >>>>>>>>>>>>>> isn't >>>>>>>>>>>>>> necessary to dictate a behavior). >>>>>>>>>>>>>> >>>>>>>>>>>>>> I'm still strongly in favor of Option 1 (separate table and >>>>>>>>>>>>>> view) for these reasons. >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Dan >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 11:07 PM Jack Ye <yezhao...@gmail.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> > Jack, it sounds like you’re the proponent of a combined >>>>>>>>>>>>>>> table and view (rather than a new metadata spec for a >>>>>>>>>>>>>>> materialized view). >>>>>>>>>>>>>>> What is the main motivation? It seems like you’re convinced of >>>>>>>>>>>>>>> that >>>>>>>>>>>>>>> approach, but I don’t understand the advantage it brings. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sorry I have to make a Google Sheet to capture all the >>>>>>>>>>>>>>> options we have discussed so far, I wanted to use the existing >>>>>>>>>>>>>>> Google Doc, >>>>>>>>>>>>>>> but it has really bad table/sheet support... >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://docs.google.com/spreadsheets/d/1a0tlyh8f2ft2SepE7H3bgoY2A0q5IILgzAsJMnwjTBs/edit#gid=0 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have listed all the options, with how they are implemented >>>>>>>>>>>>>>> and some important considerations we have discussed so far. >>>>>>>>>>>>>>> Note that: >>>>>>>>>>>>>>> 1. This sheet currently excludes the lineage information, >>>>>>>>>>>>>>> which we can discuss more later after the current topic is >>>>>>>>>>>>>>> resolved. >>>>>>>>>>>>>>> 2. I removed the considerations for REST integration since >>>>>>>>>>>>>>> from the other thread we have clarified that they should be >>>>>>>>>>>>>>> considered >>>>>>>>>>>>>>> completely separately. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *Why I come as a proponent of having a new MV object with >>>>>>>>>>>>>>> table and view metadata file pointer* >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In my sheet, there are 3 options that do not have major >>>>>>>>>>>>>>> problems: >>>>>>>>>>>>>>> Option 2: Add storage table metadata file pointer in view >>>>>>>>>>>>>>> object >>>>>>>>>>>>>>> Option 5: New MV object with table and view metadata file >>>>>>>>>>>>>>> pointer >>>>>>>>>>>>>>> Option 6: New MV spec with table and view metadata >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I originally excluded option 2 because I think it does not >>>>>>>>>>>>>>> align with the REST spec, but after the other discussion thread >>>>>>>>>>>>>>> about "Inconsistency >>>>>>>>>>>>>>> between REST spec and table/view spec", I think my original >>>>>>>>>>>>>>> concern no >>>>>>>>>>>>>>> longer holds true so now I put it back. And based on my >>>>>>>>>>>>>>> personal preference that MV is an independent object that >>>>>>>>>>>>>>> should be >>>>>>>>>>>>>>> separated from view and table, plus the fact that option 5 is >>>>>>>>>>>>>>> probably less >>>>>>>>>>>>>>> work than option 6 for implementation, that is how I come as a >>>>>>>>>>>>>>> proponent of >>>>>>>>>>>>>>> option 5 at this moment. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> *Regarding Ryan's evaluation framework * >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think we need to reconcile this sheet with Ryan's >>>>>>>>>>>>>>> evaluation framework. That framework categorization puts option >>>>>>>>>>>>>>> 2, 3, 4, 5, >>>>>>>>>>>>>>> 6 all under the same category of "A combination of a view >>>>>>>>>>>>>>> and a table" and concludes that they don't have any advantage >>>>>>>>>>>>>>> for the same >>>>>>>>>>>>>>> set of reasons. But those reasons are not really convincing to >>>>>>>>>>>>>>> me so let's >>>>>>>>>>>>>>> talk about them in more detail. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (1) You said "I don’t see a reason why a combined view and >>>>>>>>>>>>>>> table is advantageous" as "this would cause unnecessary >>>>>>>>>>>>>>> dependence between >>>>>>>>>>>>>>> the view and table in catalogs." What dependency exactly do >>>>>>>>>>>>>>> you mean here? >>>>>>>>>>>>>>> And why is that unnecessary, given there has to be some sort of >>>>>>>>>>>>>>> dependency >>>>>>>>>>>>>>> anyway unless we go with option 5 or 6? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (2) You said "I guess there’s an argument that you could >>>>>>>>>>>>>>> load both table and view metadata locations at the same time. >>>>>>>>>>>>>>> That hardly >>>>>>>>>>>>>>> seems worth the trouble". I disagree with that. Catalog >>>>>>>>>>>>>>> interaction >>>>>>>>>>>>>>> performance is critical to at least everyone working in EMR and >>>>>>>>>>>>>>> Athena, and >>>>>>>>>>>>>>> MV itself as an acceleration approach needs to be as fast as >>>>>>>>>>>>>>> possible. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have put 3 key operations in the doc that I think matters >>>>>>>>>>>>>>> for MV during interactions with engine: >>>>>>>>>>>>>>> 1. refreshes storage table >>>>>>>>>>>>>>> 2. get the storage table of the MV >>>>>>>>>>>>>>> 3. if stale, get the view SQL >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> And option 1 clearly falls short with 4 sequential steps >>>>>>>>>>>>>>> required to load a storage table. You mentioned "recent issues >>>>>>>>>>>>>>> with adding >>>>>>>>>>>>>>> views to the JDBC catalog" in this topic, could you explain a >>>>>>>>>>>>>>> bit more? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (3) You said "I also think that once we decide on structure, >>>>>>>>>>>>>>> we can make it possible for REST catalog implementations to do >>>>>>>>>>>>>>> smart >>>>>>>>>>>>>>> things, in a way that doesn’t put additional requirements on >>>>>>>>>>>>>>> the underlying >>>>>>>>>>>>>>> catalog store." 
If REST is fully compatible with Iceberg spec >>>>>>>>>>>>>>> then I have >>>>>>>>>>>>>>> no problem with this statement. However, as we discussed in the >>>>>>>>>>>>>>> other >>>>>>>>>>>>>>> thread, it is not the case. In the current state, I think the >>>>>>>>>>>>>>> sequence of >>>>>>>>>>>>>>> action should be to evolve the Iceberg table/view spec (or add >>>>>>>>>>>>>>> a MV spec) >>>>>>>>>>>>>>> first, and then think about how REST can incorporate it or do >>>>>>>>>>>>>>> smart things >>>>>>>>>>>>>>> that are not Iceberg spec compliant. Do you agree with that? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> (4) You said the table identifier pointer "is a problem we >>>>>>>>>>>>>>> need to solve generally because a materialized table needs to >>>>>>>>>>>>>>> be able to >>>>>>>>>>>>>>> track the upstream state of tables that were used". I don't >>>>>>>>>>>>>>> think that is a >>>>>>>>>>>>>>> reason to choose to use a table identifier pointer for a >>>>>>>>>>>>>>> storage table. The >>>>>>>>>>>>>>> issue is not about using a table identifier pointer. It is >>>>>>>>>>>>>>> about exposing >>>>>>>>>>>>>>> the storage table as a separate entity in the catalog, which is >>>>>>>>>>>>>>> what people >>>>>>>>>>>>>>> do not like and is already discussed in length in Jan's >>>>>>>>>>>>>>> question 3 (also >>>>>>>>>>>>>>> linked in the sheet). I agree with that statement, because >>>>>>>>>>>>>>> without a REST >>>>>>>>>>>>>>> implementation that can magically hide the storage table, this >>>>>>>>>>>>>>> model adds >>>>>>>>>>>>>>> additional burden regarding compliance and data governance for >>>>>>>>>>>>>>> any other >>>>>>>>>>>>>>> non-REST catalog implementations that are compliant to the >>>>>>>>>>>>>>> Iceberg spec. >>>>>>>>>>>>>>> Many mechanisms need to be built in a catalog to hide, protect, >>>>>>>>>>>>>>> maintain, >>>>>>>>>>>>>>> recycle the storage table, that can be avoided by using other >>>>>>>>>>>>>>> approaches. 
I >>>>>>>>>>>>>>> think we should reach a consensus about that and discuss >>>>>>>>>>>>>>> further if you do >>>>>>>>>>>>>>> not agree. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Jack Ye >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 10:53 PM Jan Kaul >>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Ryan, we actually discussed your categories in this >>>>>>>>>>>>>>>> question >>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1#heading=h.y70rtfhi9qxi>. >>>>>>>>>>>>>>>> Where your categories correspond to the following designs: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - Separate table and view => Design 1 >>>>>>>>>>>>>>>> - Combination of view and table => Design 2 >>>>>>>>>>>>>>>> - A new metadata type => Design 4 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jan >>>>>>>>>>>>>>>> On 01.03.24 00:03, Ryan Blue wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looks like it wasn’t clear what I meant for the 3 >>>>>>>>>>>>>>>> categories, so I’ll be more specific: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> - *Separate table and view*: this option is to have the >>>>>>>>>>>>>>>> objects that we have today, with extra metadata. Commit >>>>>>>>>>>>>>>> processes are >>>>>>>>>>>>>>>> separate: committing to the table doesn’t alter the view >>>>>>>>>>>>>>>> and committing to >>>>>>>>>>>>>>>> the view doesn’t change the table. However, changing the >>>>>>>>>>>>>>>> view can make it >>>>>>>>>>>>>>>> so the table is no longer useful as a materialization. >>>>>>>>>>>>>>>> - *A combination of a view and a table*: in this >>>>>>>>>>>>>>>> option, the table metadata and view metadata are the same >>>>>>>>>>>>>>>> as the first >>>>>>>>>>>>>>>> option. 
The difference is that the commit process combines >>>>>>>>>>>>>>>> them, either by >>>>>>>>>>>>>>>> embedding a table metadata location in view metadata or by >>>>>>>>>>>>>>>> tracking both in >>>>>>>>>>>>>>>> the same catalog reference. >>>>>>>>>>>>>>>> - *A new metadata type*: this option is where we define >>>>>>>>>>>>>>>> a new metadata object that has view attributes, like SQL >>>>>>>>>>>>>>>> representations, >>>>>>>>>>>>>>>> along with table attributes, like partition specs and >>>>>>>>>>>>>>>> snapshots. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hopefully this is clear because I think much of the >>>>>>>>>>>>>>>> confusion is caused by different definitions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The LoadTableResponse having optional metadata-location >>>>>>>>>>>>>>>> field implies that the object in the catalog no longer needs >>>>>>>>>>>>>>>> to hold a >>>>>>>>>>>>>>>> metadata file pointer >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The REST protocol has not removed the requirement for a >>>>>>>>>>>>>>>> metadata file, so I’m going to keep focused on the MV design >>>>>>>>>>>>>>>> options. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> When we say a MV can be a “new metadata type”, it does not >>>>>>>>>>>>>>>> mean it needs to define a completely brand new structure of >>>>>>>>>>>>>>>> the metadata >>>>>>>>>>>>>>>> content >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I’m making a distinction between separate metadata files >>>>>>>>>>>>>>>> for the table and the view and a combined metadata object, as >>>>>>>>>>>>>>>> above. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We can define an “Iceberg MV” to be an object in a catalog, >>>>>>>>>>>>>>>> which has 1 table metadata file pointer, and 1 view metadata >>>>>>>>>>>>>>>> file pointer >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is the option I am referring to as a “combination of a >>>>>>>>>>>>>>>> view and a table”. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So to review my initial email, I don’t see a reason why a >>>>>>>>>>>>>>>> combined view and table is advantageous, either implemented by >>>>>>>>>>>>>>>> having a >>>>>>>>>>>>>>>> catalog reference with two metadata locations or embedding a >>>>>>>>>>>>>>>> table metadata >>>>>>>>>>>>>>>> location in view metadata. This would cause unnecessary >>>>>>>>>>>>>>>> dependence between >>>>>>>>>>>>>>>> the view and table in catalogs. I guess there’s an argument >>>>>>>>>>>>>>>> that you could >>>>>>>>>>>>>>>> load both table and view metadata locations at the same time. >>>>>>>>>>>>>>>> That hardly >>>>>>>>>>>>>>>> seems worth the trouble given the recent issues with adding >>>>>>>>>>>>>>>> views to the >>>>>>>>>>>>>>>> JDBC catalog. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I also think that once we decide on structure, we can make >>>>>>>>>>>>>>>> it possible for REST catalog implementations to do smart >>>>>>>>>>>>>>>> things, in a way >>>>>>>>>>>>>>>> that doesn’t put additional requirements on the underlying >>>>>>>>>>>>>>>> catalog store. >>>>>>>>>>>>>>>> For instance, we could specify how to send additional objects >>>>>>>>>>>>>>>> in a >>>>>>>>>>>>>>>> LoadViewResult, in case the catalog wants to pre-fetch table >>>>>>>>>>>>>>>> metadata. I >>>>>>>>>>>>>>>> think these optimizations are a later addition, after we >>>>>>>>>>>>>>>> define the >>>>>>>>>>>>>>>> relationship between views and tables. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Jack, it sounds like you’re the proponent of a combined >>>>>>>>>>>>>>>> table and view (rather than a new metadata spec for a >>>>>>>>>>>>>>>> materialized view). >>>>>>>>>>>>>>>> What is the main motivation? It seems like you’re convinced of >>>>>>>>>>>>>>>> that >>>>>>>>>>>>>>>> approach, but I don’t understand the advantage it brings. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Ryan >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 12:26 PM Szehon Ho < >>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes I mostly agree with the assessment. To clarify a few >>>>>>>>>>>>>>>>> minor points. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> is a materialized view a view and a separate table, a >>>>>>>>>>>>>>>>>> combination of the two (i.e. commits are combined), or a new >>>>>>>>>>>>>>>>>> metadata type? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For 'new metadata type', I consider mostly Jack's initial >>>>>>>>>>>>>>>>> proposal of a new Catalog MV object that has two references >>>>>>>>>>>>>>>>> (ViewMetadata + >>>>>>>>>>>>>>>>> TableMetadata). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The arguments that I see for a combined materialized view >>>>>>>>>>>>>>>>>> object are: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Regular views are separate, rather than being >>>>>>>>>>>>>>>>>> tables with SQL and no data so it would be inconsistent >>>>>>>>>>>>>>>>>> (“Iceberg view is >>>>>>>>>>>>>>>>>> just a table with no data but with representations >>>>>>>>>>>>>>>>>> defined. But we did not >>>>>>>>>>>>>>>>>> do that.”) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Materialized views are different objects in DDL >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Tables may be a superset of functionality needed >>>>>>>>>>>>>>>>>> for materialized views >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Tables are not typically exposed to end users — but >>>>>>>>>>>>>>>>>> this isn’t required by the separate view and table option >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For completeness, there seem to be a few additional ones >>>>>>>>>>>>>>>>> (mentioned in the Slack and above messages). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> - Lack of spec change (to ViewMetadata). 
But as Jack >>>>>>>>>>>>>>>>> says it is a spec change (ie, to catalogs) >>>>>>>>>>>>>>>>> - A single call to get the View's StorageTable (versus >>>>>>>>>>>>>>>>> two calls) >>>>>>>>>>>>>>>>> - A more natural API, no opportunity for user to call >>>>>>>>>>>>>>>>> Catalog.dropTable() and renameTable() on storage table >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> *Thoughts: *I think the long discussion sessions we had >>>>>>>>>>>>>>>>> on Slack was fruitful for me, as seeing the API clarified >>>>>>>>>>>>>>>>> some things. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I was initially more in favor of MV being a new metadata >>>>>>>>>>>>>>>>> type (TableMetadata + ViewMetadata). But seeing most of the >>>>>>>>>>>>>>>>> MV operations >>>>>>>>>>>>>>>>> end up being ViewCatalog or Catalog operations, I am starting >>>>>>>>>>>>>>>>> to think >>>>>>>>>>>>>>>>> API-wise that it may not align with the new metadata type >>>>>>>>>>>>>>>>> (unless we define >>>>>>>>>>>>>>>>> MVCatalog and /MV REST endpoints, which then are boilerplate >>>>>>>>>>>>>>>>> wrappers). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Initially one question I had for option 'a view and a >>>>>>>>>>>>>>>>> separate table', was how to make this table reference >>>>>>>>>>>>>>>>> (metadata.json or >>>>>>>>>>>>>>>>> catalog reference). In the previous option, we had a >>>>>>>>>>>>>>>>> precedent of Catalog >>>>>>>>>>>>>>>>> references to Metadata, but not pointers between Metadatas. >>>>>>>>>>>>>>>>> I initially >>>>>>>>>>>>>>>>> saw the proposed Catalog's TableIdentifier pointer as >>>>>>>>>>>>>>>>> 'polluting' catalog >>>>>>>>>>>>>>>>> concerns in ViewMetadata. (I saw Catalog and ViewCatalog as >>>>>>>>>>>>>>>>> a layer above >>>>>>>>>>>>>>>>> TableMetadata and ViewMetadata). But I think Dan in the >>>>>>>>>>>>>>>>> Slack made a fair >>>>>>>>>>>>>>>>> point that ViewMetadata already is tightly bound with a >>>>>>>>>>>>>>>>> Catalog. 
>>>>>>>>>>>>>>>>> In this case, I think this approach does have its merits as well in aligning Catalog APIs with the metadata.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 5:45 AM Jan Kaul <jank...@mailbox.org.invalid> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would like to provide my perspective on the question of what a materialized view is and elaborate on Jack's recent proposal to view a materialized view as a catalog concept.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Firstly, let's look at the role of the catalog. Every entity in the catalog has a *unique identifier*, and the catalog provides methods to create, load, and update these entities. An important thing to note is that the catalog methods exhibit two different behaviors: the *create and load methods deal with the entire entity*, while the *update(commit) method only deals with partial changes* to the entities.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In the context of our current discussion, materialized view (MV) metadata is a union of view and table metadata. The fact that the update method deals only with partial changes enables us to *reuse the existing methods for updating tables and views*. For updates we don't have to define what constitutes an entire materialized view.
>>>>>>>>>>>>>>>>>> Changes to a materialized view targeting the properties related to the view metadata could use the update(commit) view method. Similarly, changes targeting the properties related to the table metadata could use the update(commit) table method. This is great news because we don't have to redefine view and table commits (requirements, updates). This is shown in the fact that Jack uses the same operation to update the storage table for Option 1 and 3:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> // REST: POST /namespaces/db1/tables/mv1?materializedView=true
>>>>>>>>>>>>>>>>>> // non-REST: update JSON files at table_metadata_location
>>>>>>>>>>>>>>>>>> storageTable.newAppend().appendFile(...).commit();
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The open question is *whether the create and load methods should treat the properties that constitute the MV metadata as two entities (View + Table) or one entity (new MV object)*. This is all part of Jack's proposal, where Option 1 proposes a new MV object, and Option 3 proposes two separate entities. The advantage of Option 1 is that it doesn't require two operations to load the metadata. On the other hand, the advantage of Option 3 is that no new operations or catalogs have to be defined.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In my opinion, defining a new representation for materialized views (Option 1) is generally the cleaner solution.
>>>>>>>>>>>>>>>>>> However, I see a path where we could first introduce Option 3 and still have the possibility to transition to Option 1 if needed. The great thing about Option 3 is that it only requires minor changes to the current spec and is mostly an implementation detail.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Therefore I would propose small additions to Jack's Option 3 that only introduce changes to the spec that are not specific to materialized views. The idea is to introduce boolean properties to be set on the creation of the view and the storage table that indicate that they belong to a materialized view. The view property "materialized" is set to "true" for an MV and "false" for a regular view. And the table property "storage_table" is set to "true" for a storage table and "false" for a regular table. The absence of these properties indicates a regular view or table.
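
A minimal Java sketch of the flag convention described above, assuming the flags live in the ordinary string property maps of the view and of the storage table. The class MvPropertyFlags and its methods are hypothetical and are not part of the Iceberg API; only the property names "materialized" and "storage_table" come from the proposal.

```java
import java.util.Map;

// Hypothetical helper, NOT part of the Iceberg API. It models the proposed
// property convention on plain string maps only.
final class MvPropertyFlags {
    // Property names taken from the proposal in this thread.
    static final String MATERIALIZED = "materialized";
    static final String STORAGE_TABLE = "storage_table";

    private MvPropertyFlags() {}

    // True only when the view properties carry "materialized" = "true".
    static boolean isMaterializedView(Map<String, String> viewProperties) {
        return isTrue(viewProperties, MATERIALIZED);
    }

    // True only when the table properties carry "storage_table" = "true".
    static boolean isStorageTable(Map<String, String> tableProperties) {
        return isTrue(tableProperties, STORAGE_TABLE);
    }

    // An absent key and "false" are treated the same way: a regular view or
    // table. Assumption: the value is compared case-insensitively, which is
    // what Boolean.parseBoolean does; a null value parses as false.
    private static boolean isTrue(Map<String, String> properties, String key) {
        return Boolean.parseBoolean(properties.get(key));
    }
}
```

With this convention, a view or table created before the properties existed needs no migration, because a missing flag reads the same as "false".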
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ViewCatalog viewCatalog = (ViewCatalog) catalog;
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> // REST: GET /namespaces/db1/views/mv1
>>>>>>>>>>>>>>>>>> // non-REST: load JSON file at metadata_location
>>>>>>>>>>>>>>>>>> View mv = viewCatalog.loadView(TableIdentifier.of("db1", "mv1"));
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> // REST: GET /namespaces/db1/tables/mv1
>>>>>>>>>>>>>>>>>> // non-REST: load JSON file at table_metadata_location if present
>>>>>>>>>>>>>>>>>> Table storageTable = mv.storageTable();
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> // REST: POST /namespaces/db1/tables/mv1
>>>>>>>>>>>>>>>>>> // non-REST: update JSON file at table_metadata_location
>>>>>>>>>>>>>>>>>> storageTable.newAppend().appendFile(...).commit();
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We could then introduce a new requirement for views and tables called "AssertProperty" which could make sure to only perform updates that are in line with materialized views. The additional requirement can be seen as a general extension which does not need to be changed if we decide to go with Option 1 in the future.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Let me know what you think.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best wishes,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>>>>> On 29.02.24 04:09, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks Ryan for the insights. I agree that reusing existing metadata definitions and minimizing spec changes are very important. This also minimizes spec drift (between materialized views and views spec, and between materialized views and tables spec), and simplifies the implementation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In an effort to take the discussion forward with concrete design options based on an end-to-end implementation, I have prototyped the implementation (and added Spark support) in this PR https://github.com/apache/iceberg/pull/9830. I hope it helps us reach convergence faster. More details about some of the design options are discussed in the description of the PR.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Feb 28, 2024 at 6:20 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I mean separate table and view metadata that is somehow combined through a commit process. For instance, keeping a pointer to a table metadata file in a view metadata file or combining commits to reference both. I don't see the value in either option.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Feb 28, 2024 at 5:05 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks Ryan for the help to trace back to the root question! Just a clarification question regarding your reply before I reply further: what exactly does the option "a combination of the two (i.e. commits are combined)" mean? How is that different from "a new metadata type"?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>>