Let us list the pros and cons as originally planned. I can help as well if
needed. We can get started and have Jack chime in when he is back?

On Fri, Mar 22, 2024 at 10:35 AM Szehon Ho <szehon.apa...@gmail.com> wrote:

> Hi
>
> My understanding was that last time it was still unresolved, and the action
> item was on Jack and/or Jan to make a shorter document.  I think the
> debate now has boiled down to Ryan's three options:
>
>    1. separate table/view
>    2. combination of table/view tied together via commit
>    3. new metadata type
>
>  with probably the first and third being the main contenders. My
> understanding was we wanted a table of pros/cons between (1) and (3),
> presumably giving folks a chance to address the cons, before the next
> meeting.
>
> Jack (main proponent of option (3)) just went on paternity leave, so I am
> not sure if there is someone from Amazon with enough context on Jack's
> thinking to continue that train of thought?  Otherwise maybe Jan can give it
> a shot?  Else, I will be out and can't make the next Iceberg sync, but I can
> prepare one for the one after that, if needed.
>
> Re: the 'new' proposal, I am not sure we are ready for a formal one, given
> the deadlock between the two options, but I'm open to making a proposal
> based on one of the options above.  What do folks think?
>
> Thanks,
> Szehon
>
> On Fri, Mar 22, 2024 at 3:15 AM Renjie Liu <liurenjie2...@gmail.com>
> wrote:
>
>> +1
>>
>> On Fri, Mar 22, 2024 at 16:42 Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi Renjie,
>>>
>>> We discussed the MV proposal, without yet reaching any conclusion.
>>>
>>> I propose:
>>> - to use the "new" proposal process in place (creating a GH issue with
>>> the proposal flag, with a link to the document)
>>> - use the document and/or GH issue to add comments
>>> - finalize the document, heading to a vote (to get consensus)
>>>
>>> Thoughts ?
>>>
>>> NB: I will follow up with "stale PR/proposal" PR to be sure we are
>>> moving forward ;)
>>>
>>> Regards
>>> JB
>>>
>>> On Fri, Mar 22, 2024 at 4:29 AM Renjie Liu <liurenjie2...@gmail.com>
>>> wrote:
>>>
>>>> Hi:
>>>>
>>>> Sorry I couldn't join the last community sync. Did we reach
>>>> any conclusion about the MV spec?
>>>>
>>>> On Tue, Mar 5, 2024 at 11:28 PM himadri pal <meh...@gmail.com> wrote:
>>>>
>>>>> For me the calendar link did not work on mobile, but I was able to add
>>>>> the dev Google calendar from
>>>>> https://iceberg.apache.org/community/#iceberg-community-events by
>>>>> accessing it from a laptop.
>>>>>
>>>>> Regards,
>>>>> Himadri Pal
>>>>>
>>>>>
>>>>> On Mon, Mar 4, 2024 at 4:43 PM Walaa Eldin Moustafa <
>>>>> wa.moust...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Jack! I think the images are stripped from the message, but
>>>>>> they are there on the doc
>>>>>> <https://docs.google.com/spreadsheets/d/1a0tlyh8f2ft2SepE7H3bgoY2A0q5IILgzAsJMnwjTBs/edit#gid=0>
>>>>>> if someone wants to check them out (I have left some comments while there).
>>>>>>
>>>>>> Also, I no longer see the community sync calendar at
>>>>>> https://iceberg.apache.org/community/#slack, so it is unclear when
>>>>>> the meeting is (and we do not have the link).
>>>>>>
>>>>>> Thanks,
>>>>>> Walaa.
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 4, 2024 at 9:58 AM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Jan! +1 for everyone to take a look before the discussion,
>>>>>>> and see if there are any missing options or major arguments.
>>>>>>>
>>>>>>> I have also added images for all the options, which might be easier
>>>>>>> to parse than the big sheet. I will also put them here for people who
>>>>>>> do not have time to read through it:
>>>>>>>
>>>>>>>
>>>>>>> *Option 1: Add storage table identifier in view metadata content*
>>>>>>>
>>>>>>> [image: MV option 1.png]
>>>>>>> *Option 2: Add storage table metadata file pointer in view object*
>>>>>>>
>>>>>>> [image: MV option 2.png]
>>>>>>> *Option 3: Add storage table metadata file pointer in view metadata
>>>>>>> content*
>>>>>>>
>>>>>>> [image: MV option 3.png]
>>>>>>>
>>>>>>> *Option 4: Embed table metadata in view metadata content*
>>>>>>>
>>>>>>> [image: MV option 4.png]
>>>>>>> *Option 5: New MV spec, MV object has table and view metadata file
>>>>>>> pointers*
>>>>>>>
>>>>>>> [image: MV option 5.png]
>>>>>>> *Option 6: New MV spec, MV metadata content embeds table and view
>>>>>>> metadata*
>>>>>>>
>>>>>>> [image: MV option 6.png]
>>>>>>> *Option 7: New MV spec, completely new MV metadata content*
>>>>>>>
>>>>>>> [image: MV option 7.png]
>>>>>>>
>>>>>>> -Jack
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 3, 2024 at 11:45 PM Jan Kaul <jank...@mailbox.org.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I think it's great to have a face to face discussion about this.
>>>>>>>> Additionally, I would propose to use Jack's document
>>>>>>>> <https://docs.google.com/spreadsheets/d/1a0tlyh8f2ft2SepE7H3bgoY2A0q5IILgzAsJMnwjTBs/edit#gid=0>
>>>>>>>> as a common ground for the discussion and that everyone has a quick
>>>>>>>> look before the next community sync. If you think the document is
>>>>>>>> still missing some arguments, please make suggestions to add them.
>>>>>>>> This way we have to spend less time getting everyone up to speed and
>>>>>>>> have a more common terminology.
>>>>>>>>
>>>>>>>> Looking forward to the discussion, best wishes
>>>>>>>>
>>>>>>>> Jan
>>>>>>>> On 02.03.24 02:06, Walaa Eldin Moustafa wrote:
>>>>>>>>
>>>>>>>> The calendar on the site is currently broken
>>>>>>>> https://iceberg.apache.org/community/#iceberg-community-events.
>>>>>>>> Might help to fix it or share the meeting link here.
>>>>>>>>
>>>>>>>> On Fri, Mar 1, 2024 at 3:43 PM Jack Ye <yezhao...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sounds good, let's discuss this in person!
>>>>>>>>>
>>>>>>>>> I am a bit worried that we have quite a few critical topics going
>>>>>>>>> on right now on devlist, and this will take up a lot of time to
>>>>>>>>> discuss. If it ends up going for too long, I propose we have a
>>>>>>>>> dedicated meeting, and I am more than happy to organize it.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jack Ye
>>>>>>>>>
>>>>>>>>> On Fri, Mar 1, 2024 at 12:48 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>>>>
>>>>>>>>>> Hey everyone,
>>>>>>>>>>
>>>>>>>>>> I think this thread has hit a point of diminishing returns and
>>>>>>>>>> that we still don't have a common understanding of what the
>>>>>>>>>> options under consideration actually are.
>>>>>>>>>>
>>>>>>>>>> Since we were already planning on discussing this at the next
>>>>>>>>>> community sync, I suggest we pick this up there and use that time
>>>>>>>>>> to align on what exactly we're considering. We can then start a new
>>>>>>>>>> thread to lay out the designs under consideration in more detail
>>>>>>>>>> and then have a discussion about trade-offs.
>>>>>>>>>>
>>>>>>>>>> Does that sound reasonable?
>>>>>>>>>>
>>>>>>>>>> Ryan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 1, 2024 at 11:09 AM Walaa Eldin Moustafa <
>>>>>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am finding it hard to interpret the options concretely. I
>>>>>>>>>>> would also suggest breaking the expectation/outcome into
>>>>>>>>>>> milestones. Maybe it becomes easier if we agree to distinguish
>>>>>>>>>>> between an approach that is feasible in the near term and another
>>>>>>>>>>> in the long term, especially if the latter requires significant
>>>>>>>>>>> engine-side changes.
>>>>>>>>>>>
>>>>>>>>>>> Further, maybe it helps if we start with an option that fully
>>>>>>>>>>> reuses the existing spec, and see how we view it in comparison
>>>>>>>>>>> with the options discussed previously. I am sharing one below. It
>>>>>>>>>>> reuses the current spec of Iceberg views and tables by leveraging
>>>>>>>>>>> table properties to capture materialized view metadata. What is
>>>>>>>>>>> common (and not common) between this and the desired
>>>>>>>>>>> representations?
>>>>>>>>>>>
>>>>>>>>>>> The new properties are:
>>>>>>>>>>>
>>>>>>>>>>> Properties on a View:
>>>>>>>>>>>
>>>>>>>>>>>    1. *iceberg.materialized.view*:
>>>>>>>>>>>       - *Type*: View property
>>>>>>>>>>>       - *Purpose*: This property is used to mark whether a view
>>>>>>>>>>>       is a materialized view. If set to true, the view is treated
>>>>>>>>>>>       as a materialized view. This helps in differentiating
>>>>>>>>>>>       between virtual and materialized views within the catalog
>>>>>>>>>>>       and dictates specific handling and validation logic for
>>>>>>>>>>>       materialized views.
>>>>>>>>>>>    2. *iceberg.materialized.view.storage.location*:
>>>>>>>>>>>       - *Type*: View property
>>>>>>>>>>>       - *Purpose*: Specifies the location of the storage table
>>>>>>>>>>>       associated with the materialized view. This property is used
>>>>>>>>>>>       for linking a materialized view with its corresponding
>>>>>>>>>>>       storage table, enabling data management and query execution
>>>>>>>>>>>       based on the stored data freshness.
>>>>>>>>>>>
>>>>>>>>>>> Properties on a Table:
>>>>>>>>>>>
>>>>>>>>>>>    1. *base.snapshot.[UUID]*:
>>>>>>>>>>>       - *Type*: Table property
>>>>>>>>>>>       - *Purpose*: These properties store the snapshot IDs of
>>>>>>>>>>>       the base tables at the time the materialized view's data was
>>>>>>>>>>>       last updated. Each property is prefixed with base.snapshot.
>>>>>>>>>>>       followed by the UUID of the base table. They are used to
>>>>>>>>>>>       track whether the materialized view's data is up to date
>>>>>>>>>>>       with the base tables by comparing these snapshot IDs with
>>>>>>>>>>>       the current snapshot IDs of the base tables. If all the base
>>>>>>>>>>>       tables' current snapshot IDs match the ones stored in these
>>>>>>>>>>>       properties, the materialized view's data is considered fresh.
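A minimal sketch of the freshness check described above, in case it helps the
pros/cons comparison. It assumes the storage table's properties are available
as a plain string map and that the current snapshot ID of each base table has
already been looked up by UUID. The names is_fresh, storage_table_properties
and current_snapshot_ids are hypothetical, and treating a storage table with no
base.snapshot.* entries as stale is an assumption, not something the proposal
states:

```python
from typing import Mapping

# Property prefix taken from the proposal: base.snapshot.[UUID]
BASE_SNAPSHOT_PREFIX = "base.snapshot."


def is_fresh(
    storage_table_properties: Mapping[str, str],
    current_snapshot_ids: Mapping[str, int],
) -> bool:
    """Return True if every recorded base-table snapshot ID is still current.

    storage_table_properties: properties of the storage table (hypothetical input).
    current_snapshot_ids: base table UUID -> current snapshot ID (hypothetical input).
    """
    # Collect the snapshot IDs recorded at the last refresh, keyed by table UUID.
    recorded = {
        key[len(BASE_SNAPSHOT_PREFIX):]: int(value)
        for key, value in storage_table_properties.items()
        if key.startswith(BASE_SNAPSHOT_PREFIX)
    }
    # Assumption: with nothing recorded, the view was never refreshed.
    if not recorded:
        return False
    # Fresh only if each base table's current snapshot matches the recorded one.
    return all(
        current_snapshot_ids.get(table_uuid) == snapshot_id
        for table_uuid, snapshot_id in recorded.items()
    )
```

Note that this needs the current snapshot ID of every base table, so the cost of
the check grows with the number of base tables regardless of which option is
chosen for linking the view to its storage table.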
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Walaa.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 1, 2024 at 9:15 AM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> > All of these approaches are aligned in one, specific way: the
>>>>>>>>>>>> storage table is an iceberg table.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not think that is true. I think people are aligned that we
>>>>>>>>>>>> would like to re-use the Iceberg table metadata defined in the
>>>>>>>>>>>> Iceberg table spec to express the data in MV, but I don't think
>>>>>>>>>>>> it goes so far as to say it must be an Iceberg table. Once you
>>>>>>>>>>>> have that mindset, then of course option 1 (separate table and
>>>>>>>>>>>> view) is the only option.
>>>>>>>>>>>>
>>>>>>>>>>>> > I don't think that is necessary and it
>>>>>>>>>>>> significantly increases the complexity.
>>>>>>>>>>>>
>>>>>>>>>>>> And can you quantify what you mean by "significantly increases
>>>>>>>>>>>> the complexity"? Seems like a lot of concerns are coming from the
>>>>>>>>>>>> tradeoff with complexity. We probably all agree that using option
>>>>>>>>>>>> 7 (a completely new metadata type) is a lot of work from scratch,
>>>>>>>>>>>> that is why it is not favored. However, my understanding is that
>>>>>>>>>>>> as long as we re-use the view and table metadata, then the
>>>>>>>>>>>> majority of the existing logic can be reused. I think what we
>>>>>>>>>>>> have gone through in Slack to draft the rough Java API shape
>>>>>>>>>>>> helps here, because people can estimate the amount of effort
>>>>>>>>>>>> required to implement it. And I don't think they are
>>>>>>>>>>>> **significantly** more complex to implement. Could you elaborate
>>>>>>>>>>>> more about the complexity that you imagine?
>>>>>>>>>>>>
>>>>>>>>>>>> -Jack
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Mar 1, 2024 at 8:57 AM Daniel Weeks <
>>>>>>>>>>>> daniel.c.we...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I feel I've been most vocal about pushing back against options
>>>>>>>>>>>>> 2+ (or Ryan's categories of combined table/view, or new metadata
>>>>>>>>>>>>> type), so I'll try to expand on my reasoning.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I understand the appeal of creating a design where we
>>>>>>>>>>>>> encapsulate the view/storage from both a structural and
>>>>>>>>>>>>> performance standpoint, but I don't think that is necessary and
>>>>>>>>>>>>> it significantly increases the complexity.
>>>>>>>>>>>>>
>>>>>>>>>>>>> All of these approaches are aligned in one, specific way: the
>>>>>>>>>>>>> storage table is an iceberg table.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Because of this, all the behaviors and requirements
>>>>>>>>>>>>> still apply to these tables.  They need to be maintained
>>>>>>>>>>>>> (snapshot cleanup, orphan files), in some cases need to be
>>>>>>>>>>>>> optimized (compaction, manifest rewrites), and they need to be
>>>>>>>>>>>>> able to be inspected (this will be even more important with MV
>>>>>>>>>>>>> since staleness can produce different results and questions will
>>>>>>>>>>>>> arise about what state the storage table was in).  There may be
>>>>>>>>>>>>> cases where the tables need to be managed directly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Anywhere we deviate from the existing constructs/commit/access
>>>>>>>>>>>>> for tables, we will ultimately have to then unwrap to re-expose
>>>>>>>>>>>>> the underlying Iceberg behavior.  This creates unnecessary
>>>>>>>>>>>>> complexity in the library/API layer, which is not the primary
>>>>>>>>>>>>> interface users will have with materialized views, since an
>>>>>>>>>>>>> engine is almost always necessary to interact with the dataset.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As to the performance concerns around option 1, I think we're
>>>>>>>>>>>>> overstating the downsides.  It really comes down to how many
>>>>>>>>>>>>> metadata loads are necessary and evaluating freshness would
>>>>>>>>>>>>> likely be the real bottleneck as it involves potentially loading
>>>>>>>>>>>>> many tables.  All of the options are on the same order of
>>>>>>>>>>>>> performance for the metadata and table loads.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As to the visibility of tables and whether they're registered
>>>>>>>>>>>>> in the catalog, I think registering in the catalog is the right
>>>>>>>>>>>>> approach so that the tables are still addressable for
>>>>>>>>>>>>> maintenance/etc.  The visibility of the storage table is a
>>>>>>>>>>>>> catalog implementation decision and shouldn't be a requirement
>>>>>>>>>>>>> of the MV spec (I can see cases for both and it isn't necessary
>>>>>>>>>>>>> to dictate a behavior).
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm still strongly in favor of Option 1 (separate table and
>>>>>>>>>>>>> view) for these reasons.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Dan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 11:07 PM Jack Ye <yezhao...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> > Jack, it sounds like you’re the proponent of a combined
>>>>>>>>>>>>>> table and view (rather than a new metadata spec for a
>>>>>>>>>>>>>> materialized view). What is the main motivation? It seems like
>>>>>>>>>>>>>> you’re convinced of that approach, but I don’t understand the
>>>>>>>>>>>>>> advantage it brings.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry, I had to make a Google Sheet to capture all the
>>>>>>>>>>>>>> options we have discussed so far. I wanted to use the existing
>>>>>>>>>>>>>> Google Doc, but it has really bad table/sheet support...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://docs.google.com/spreadsheets/d/1a0tlyh8f2ft2SepE7H3bgoY2A0q5IILgzAsJMnwjTBs/edit#gid=0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have listed all the options, with how they are implemented
>>>>>>>>>>>>>> and some important considerations we have discussed so far.
>>>>>>>>>>>>>> Note that:
>>>>>>>>>>>>>> 1. This sheet currently excludes the lineage information,
>>>>>>>>>>>>>> which we can discuss more later after the current topic is
>>>>>>>>>>>>>> resolved.
>>>>>>>>>>>>>> 2. I removed the considerations for REST integration since
>>>>>>>>>>>>>> from the other thread we have clarified that they should be
>>>>>>>>>>>>>> considered completely separately.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Why I come as a proponent of having a new MV object with
>>>>>>>>>>>>>> table and view metadata file pointer*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In my sheet, there are 3 options that do not have major
>>>>>>>>>>>>>> problems:
>>>>>>>>>>>>>> Option 2: Add storage table metadata file pointer in view
>>>>>>>>>>>>>> object
>>>>>>>>>>>>>> Option 5: New MV object with table and view metadata file
>>>>>>>>>>>>>> pointer
>>>>>>>>>>>>>> Option 6: New MV spec with table and view metadata
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I originally excluded option 2 because I think it does not
>>>>>>>>>>>>>> align with the REST spec, but after the other discussion thread
>>>>>>>>>>>>>> about "Inconsistency between REST spec and table/view spec", I
>>>>>>>>>>>>>> think my original concern no longer holds true so now I put it
>>>>>>>>>>>>>> back. And based on my personal preference that MV is an
>>>>>>>>>>>>>> independent object that should be separated from view and
>>>>>>>>>>>>>> table, plus the fact that option 5 is probably less work than
>>>>>>>>>>>>>> option 6 for implementation, that is how I come as a proponent
>>>>>>>>>>>>>> of option 5 at this moment.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Regarding Ryan's evaluation framework *
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we need to reconcile this sheet with Ryan's
>>>>>>>>>>>>>> evaluation framework. That framework categorization puts option
>>>>>>>>>>>>>> 2, 3, 4, 5, 6 all under the same category of "A combination of
>>>>>>>>>>>>>> a view and a table" and concludes that they don't have any
>>>>>>>>>>>>>> advantage for the same set of reasons. But those reasons are
>>>>>>>>>>>>>> not really convincing to me so let's talk about them in more
>>>>>>>>>>>>>> detail.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (1) You said "I don’t see a reason why a combined view and
>>>>>>>>>>>>>> table is advantageous" as "this would cause unnecessary
>>>>>>>>>>>>>> dependence between the view and table in catalogs."  What
>>>>>>>>>>>>>> dependency exactly do you mean here? And why is that
>>>>>>>>>>>>>> unnecessary, given there has to be some sort of dependency
>>>>>>>>>>>>>> anyway unless we go with option 5 or 6?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (2) You said "I guess there’s an argument that you could load
>>>>>>>>>>>>>> both table and view metadata locations at the same time. That
>>>>>>>>>>>>>> hardly seems worth the trouble". I disagree with that. Catalog
>>>>>>>>>>>>>> interaction performance is critical to at least everyone
>>>>>>>>>>>>>> working in EMR and Athena, and MV itself as an acceleration
>>>>>>>>>>>>>> approach needs to be as fast as possible.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have put 3 key operations in the doc that I think matter
>>>>>>>>>>>>>> for MV during interactions with the engine:
>>>>>>>>>>>>>> 1. refresh the storage table
>>>>>>>>>>>>>> 2. get the storage table of the MV
>>>>>>>>>>>>>> 3. if stale, get the view SQL
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And option 1 clearly falls short with 4 sequential steps
>>>>>>>>>>>>>> required to load a storage table. You mentioned "recent issues
>>>>>>>>>>>>>> with adding views to the JDBC catalog" in this topic, could you
>>>>>>>>>>>>>> explain a bit more?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (3) You said "I also think that once we decide on structure,
>>>>>>>>>>>>>> we can make it possible for REST catalog implementations to do
>>>>>>>>>>>>>> smart things, in a way that doesn’t put additional requirements
>>>>>>>>>>>>>> on the underlying catalog store." If REST is fully compatible
>>>>>>>>>>>>>> with Iceberg spec then I have no problem with this statement.
>>>>>>>>>>>>>> However, as we discussed in the other thread, it is not the
>>>>>>>>>>>>>> case. In the current state, I think the sequence of action
>>>>>>>>>>>>>> should be to evolve the Iceberg table/view spec (or add a MV
>>>>>>>>>>>>>> spec) first, and then think about how REST can incorporate it
>>>>>>>>>>>>>> or do smart things that are not Iceberg spec compliant. Do you
>>>>>>>>>>>>>> agree with that?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (4) You said the table identifier pointer "is a problem we
>>>>>>>>>>>>>> need to solve generally because a materialized table needs to
>>>>>>>>>>>>>> be able to track the upstream state of tables that were used".
>>>>>>>>>>>>>> I don't think that is a reason to choose to use a table
>>>>>>>>>>>>>> identifier pointer for a storage table. The issue is not about
>>>>>>>>>>>>>> using a table identifier pointer. It is about exposing the
>>>>>>>>>>>>>> storage table as a separate entity in the catalog, which is
>>>>>>>>>>>>>> what people do not like and is already discussed at length in
>>>>>>>>>>>>>> Jan's question 3 (also linked in the sheet). I agree with that
>>>>>>>>>>>>>> statement, because without a REST implementation that can
>>>>>>>>>>>>>> magically hide the storage table, this model adds additional
>>>>>>>>>>>>>> burden regarding compliance and data governance for any other
>>>>>>>>>>>>>> non-REST catalog implementations that are compliant with the
>>>>>>>>>>>>>> Iceberg spec. Many mechanisms need to be built in a catalog to
>>>>>>>>>>>>>> hide, protect, maintain, and recycle the storage table, which
>>>>>>>>>>>>>> can be avoided by using other approaches. I think we should
>>>>>>>>>>>>>> reach a consensus about that and discuss further if you do not
>>>>>>>>>>>>>> agree.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Jack Ye
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 10:53 PM Jan Kaul
>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Ryan, we actually discussed your categories in this
>>>>>>>>>>>>>>> question
>>>>>>>>>>>>>>> <https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?pli=1#heading=h.y70rtfhi9qxi>.
>>>>>>>>>>>>>>> Where your categories correspond to the following designs:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Separate table and view => Design 1
>>>>>>>>>>>>>>>    - Combination of view and table => Design 2
>>>>>>>>>>>>>>>    - A new metadata type => Design 4
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>> On 01.03.24 00:03, Ryan Blue wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looks like it wasn’t clear what I meant for the 3
>>>>>>>>>>>>>>> categories, so I’ll be more specific:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - *Separate table and view*: this option is to have the
>>>>>>>>>>>>>>>    objects that we have today, with extra metadata. Commit
>>>>>>>>>>>>>>>    processes are separate: committing to the table doesn’t
>>>>>>>>>>>>>>>    alter the view and committing to the view doesn’t change
>>>>>>>>>>>>>>>    the table. However, changing the view can make it so the
>>>>>>>>>>>>>>>    table is no longer useful as a materialization.
>>>>>>>>>>>>>>>    - *A combination of a view and a table*: in this option,
>>>>>>>>>>>>>>>    the table metadata and view metadata are the same as the
>>>>>>>>>>>>>>>    first option. The difference is that the commit process
>>>>>>>>>>>>>>>    combines them, either by embedding a table metadata
>>>>>>>>>>>>>>>    location in view metadata or by tracking both in the same
>>>>>>>>>>>>>>>    catalog reference.
>>>>>>>>>>>>>>>    - *A new metadata type*: this option is where we define
>>>>>>>>>>>>>>>    a new metadata object that has view attributes, like SQL
>>>>>>>>>>>>>>>    representations, along with table attributes, like
>>>>>>>>>>>>>>>    partition specs and snapshots.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hopefully this is clear because I think much of the
>>>>>>>>>>>>>>> confusion is caused by different definitions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The LoadTableResponse having optional metadata-location
>>>>>>>>>>>>>>> field implies that the object in the catalog no longer needs to 
>>>>>>>>>>>>>>> hold a
>>>>>>>>>>>>>>> metadata file pointer
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The REST protocol has not removed the requirement for a
>>>>>>>>>>>>>>> metadata file, so I’m going to keep focused on the MV design 
>>>>>>>>>>>>>>> options.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When we say a MV can be a “new metadata type”, it does not
>>>>>>>>>>>>>>> mean it needs to define a completely brand new structure of the 
>>>>>>>>>>>>>>> metadata
>>>>>>>>>>>>>>> content
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I’m making a distinction between separate metadata files for
>>>>>>>>>>>>>>> the table and the view and a combined metadata object, as above.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We can define an “Iceberg MV” to be an object in a catalog,
>>>>>>>>>>>>>>> which has 1 table metadata file pointer, and 1 view metadata 
>>>>>>>>>>>>>>> file pointer
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is the option I am referring to as a “combination of a
>>>>>>>>>>>>>>> view and a table”.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So to review my initial email, I don’t see a reason why a
>>>>>>>>>>>>>>> combined view and table is advantageous, either implemented by
>>>>>>>>>>>>>>> having a catalog reference with two metadata locations or
>>>>>>>>>>>>>>> embedding a table metadata location in view metadata. This
>>>>>>>>>>>>>>> would cause unnecessary dependence between the view and table
>>>>>>>>>>>>>>> in catalogs. I guess there’s an argument that you could load
>>>>>>>>>>>>>>> both table and view metadata locations at the same time. That
>>>>>>>>>>>>>>> hardly seems worth the trouble given the recent issues with
>>>>>>>>>>>>>>> adding views to the JDBC catalog.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also think that once we decide on structure, we can make
>>>>>>>>>>>>>>> it possible for REST catalog implementations to do smart
>>>>>>>>>>>>>>> things, in a way that doesn’t put additional requirements on
>>>>>>>>>>>>>>> the underlying catalog store. For instance, we could specify
>>>>>>>>>>>>>>> how to send additional objects in a LoadViewResult, in case
>>>>>>>>>>>>>>> the catalog wants to pre-fetch table metadata. I think these
>>>>>>>>>>>>>>> optimizations are a later addition, after we define the
>>>>>>>>>>>>>>> relationship between views and tables.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jack, it sounds like you’re the proponent of a combined
>>>>>>>>>>>>>>> table and view (rather than a new metadata spec for a
>>>>>>>>>>>>>>> materialized view). What is the main motivation? It seems like
>>>>>>>>>>>>>>> you’re convinced of that approach, but I don’t understand the
>>>>>>>>>>>>>>> advantage it brings.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ryan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 12:26 PM Szehon Ho <
>>>>>>>>>>>>>>> szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Yes I mostly agree with the assessment.  To clarify a few
>>>>>>>>>>>>>>>> minor points.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> is a materialized view a view and a separate table, a
>>>>>>>>>>>>>>>>> combination of the two (i.e. commits are combined), or a new 
>>>>>>>>>>>>>>>>> metadata type?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For 'new metadata type', I consider mostly Jack's initial
>>>>>>>>>>>>>>>> proposal of a new Catalog MV object that has two references 
>>>>>>>>>>>>>>>> (ViewMetadata +
>>>>>>>>>>>>>>>> TableMetadata).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The arguments that I see for a combined materialized view
>>>>>>>>>>>>>>>>> object are:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    - Regular views are separate, rather than being tables
>>>>>>>>>>>>>>>>>    with SQL and no data so it would be inconsistent (“Iceberg 
>>>>>>>>>>>>>>>>> view is just a
>>>>>>>>>>>>>>>>>    table with no data but with representations defined. But 
>>>>>>>>>>>>>>>>> we did not do
>>>>>>>>>>>>>>>>>    that.”)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    - Materialized views are different objects in DDL
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    - Tables may be a superset of functionality needed for
>>>>>>>>>>>>>>>>>    materialized views
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    - Tables are not typically exposed to end users — but
>>>>>>>>>>>>>>>>>    this isn’t required by the separate view and table option
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For completeness, there seem to be a few additional ones
>>>>>>>>>>>>>>>> (mentioned in the Slack and above messages).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    - Lack of spec change (to ViewMetadata).  But as Jack
>>>>>>>>>>>>>>>>    says it is a spec change (ie, to catalogs)
>>>>>>>>>>>>>>>>    - A single call to get the View's StorageTable (versus
>>>>>>>>>>>>>>>>    two calls)
>>>>>>>>>>>>>>>>    - A more natural API, no opportunity for user to call
>>>>>>>>>>>>>>>>    Catalog.dropTable() and renameTable() on storage table
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Thoughts:  *I think the long discussion sessions we had
>>>>>>>>>>>>>>>> on Slack were fruitful for me, as seeing the API clarified
>>>>>>>>>>>>>>>> some things.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I was initially more in favor of MV being a new metadata
>>>>>>>>>>>>>>>> type (TableMetadata + ViewMetadata).  But seeing most of the
>>>>>>>>>>>>>>>> MV operations end up being ViewCatalog or Catalog operations,
>>>>>>>>>>>>>>>> I am starting to think API-wise that it may not align with
>>>>>>>>>>>>>>>> the new metadata type (unless we define MVCatalog and /MV
>>>>>>>>>>>>>>>> REST endpoints, which then are boilerplate wrappers).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Initially one question I had for option 'a view and a
>>>>>>>>>>>>>>>> separate table' was how to make this table reference
>>>>>>>>>>>>>>>> (metadata.json or catalog reference).  In the previous
>>>>>>>>>>>>>>>> option, we had a precedent of Catalog references to Metadata,
>>>>>>>>>>>>>>>> but not pointers between Metadatas.  I initially saw the
>>>>>>>>>>>>>>>> proposed Catalog's TableIdentifier pointer as 'polluting'
>>>>>>>>>>>>>>>> catalog concerns in ViewMetadata.  (I saw Catalog and
>>>>>>>>>>>>>>>> ViewCatalog as a layer above TableMetadata and ViewMetadata).
>>>>>>>>>>>>>>>> But I think Dan in the Slack made a fair point that
>>>>>>>>>>>>>>>> ViewMetadata already is tightly bound with a Catalog.  In
>>>>>>>>>>>>>>>> this case, I think this approach does have its merits as well
>>>>>>>>>>>>>>>> in aligning Catalog APIs with the metadata.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Feb 29, 2024 at 5:45 AM Jan Kaul
>>>>>>>>>>>>>>>> <jank...@mailbox.org.invalid> <jank...@mailbox.org.invalid>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would like to provide my perspective on the question of
>>>>>>>>>>>>>>>>> what a materialized view is and elaborate on Jack's recent 
>>>>>>>>>>>>>>>>> proposal to view
>>>>>>>>>>>>>>>>> a materialized view as a catalog concept.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Firstly, let's look at the role of the catalog. Every
>>>>>>>>>>>>>>>>> entity in the catalog has a *unique identifier*, and the
>>>>>>>>>>>>>>>>> catalog provides methods to create, load, and update these 
>>>>>>>>>>>>>>>>> entities. An
>>>>>>>>>>>>>>>>> important thing to note is that the catalog methods exhibit 
>>>>>>>>>>>>>>>>> two different
>>>>>>>>>>>>>>>>> behaviors: the *create and load methods deal with the
>>>>>>>>>>>>>>>>> entire entity*, while the *update(commit) method only
>>>>>>>>>>>>>>>>> deals with partial changes* to the entities.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In the context of our current discussion, materialized
>>>>>>>>>>>>>>>>> view (MV) metadata is a union of view and table metadata. The 
>>>>>>>>>>>>>>>>> fact that the
>>>>>>>>>>>>>>>>> update method deals only with partial changes enables us to 
>>>>>>>>>>>>>>>>> *reuse
>>>>>>>>>>>>>>>>> the existing methods for updating tables and views*. For
>>>>>>>>>>>>>>>>> updates we don't have to define what constitutes an entire 
>>>>>>>>>>>>>>>>> materialized
>>>>>>>>>>>>>>>>> view. Changes to a materialized view targeting the properties 
>>>>>>>>>>>>>>>>> related to
>>>>>>>>>>>>>>>>> the view metadata could use the update(commit) view method. 
>>>>>>>>>>>>>>>>> Similarly,
>>>>>>>>>>>>>>>>> changes targeting the properties related to the table 
>>>>>>>>>>>>>>>>> metadata could use
>>>>>>>>>>>>>>>>> the update(commit) table method. This is great news because 
>>>>>>>>>>>>>>>>> we don't have
>>>>>>>>>>>>>>>>> to redefine view and table commits (requirements, updates).
>>>>>>>>>>>>>>>>> This is shown in the fact that Jack uses the same
>>>>>>>>>>>>>>>>> operation to update the storage table for Option 1 and 3:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> // REST: POST /namespaces/db1/tables/mv1?materializedView=true
>>>>>>>>>>>>>>>>> // non-REST: update JSON files at table_metadata_location
>>>>>>>>>>>>>>>>> storageTable.newAppend().appendFile(...).commit();
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The open question is *whether the create and load methods
>>>>>>>>>>>>>>>>> should treat the properties that constitute the MV metadata 
>>>>>>>>>>>>>>>>> as two entities
>>>>>>>>>>>>>>>>> (View + Table) or one entity (new MV object)*. This is
>>>>>>>>>>>>>>>>> all part of Jack's proposal, where Option 1 proposes a new MV 
>>>>>>>>>>>>>>>>> object, and
>>>>>>>>>>>>>>>>> Option 3 proposes two separate entities. The advantage of 
>>>>>>>>>>>>>>>>> Option 1 is that
>>>>>>>>>>>>>>>>> it doesn't require two operations to load the metadata. On 
>>>>>>>>>>>>>>>>> the other hand,
>>>>>>>>>>>>>>>>> the advantage of Option 3 is that no new operations or 
>>>>>>>>>>>>>>>>> catalogs have to be
>>>>>>>>>>>>>>>>> defined.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In my opinion, defining a new representation for
>>>>>>>>>>>>>>>>> materialized views (Option 1) is generally the cleaner 
>>>>>>>>>>>>>>>>> solution. However, I
>>>>>>>>>>>>>>>>> see a path where we could first introduce Option 3 and still 
>>>>>>>>>>>>>>>>> have the
>>>>>>>>>>>>>>>>> possibility to transition to Option 1 if needed. The great 
>>>>>>>>>>>>>>>>> thing about
>>>>>>>>>>>>>>>>> Option 3 is that it only requires minor changes to the 
>>>>>>>>>>>>>>>>> current spec and is
>>>>>>>>>>>>>>>>> mostly implementation detail.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Therefore I would propose small additions to Jack's Option
>>>>>>>>>>>>>>>>> 3 that only introduce changes to the spec that are not 
>>>>>>>>>>>>>>>>> specific to
>>>>>>>>>>>>>>>>> materialized views. The idea is to introduce boolean 
>>>>>>>>>>>>>>>>> properties to be set
>>>>>>>>>>>>>>>>> on the creation of the view and the storage table that 
>>>>>>>>>>>>>>>>> indicate that they
>>>>>>>>>>>>>>>>> belong to a materialized view. The view property 
>>>>>>>>>>>>>>>>> "materialized" is set to
>>>>>>>>>>>>>>>>> "true" for an MV and "false" for a regular view. And the table 
>>>>>>>>>>>>>>>>> property
>>>>>>>>>>>>>>>>> "storage_table" is set to "true" for a storage table and 
>>>>>>>>>>>>>>>>> "false" for a
>>>>>>>>>>>>>>>>> regular table. The absence of these properties indicates a 
>>>>>>>>>>>>>>>>> regular view or
>>>>>>>>>>>>>>>>> table.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ViewCatalog viewCatalog = (ViewCatalog) catalog;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> // REST: GET /namespaces/db1/views/mv1
>>>>>>>>>>>>>>>>> // non-REST: load JSON file at metadata_location
>>>>>>>>>>>>>>>>> View mv = viewCatalog.loadView(TableIdentifier.of("db1", "mv1"));
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> // REST: GET /namespaces/db1/tables/mv1
>>>>>>>>>>>>>>>>> // non-REST: load JSON file at table_metadata_location if present
>>>>>>>>>>>>>>>>> Table storageTable = mv.storageTable();
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> // REST: POST /namespaces/db1/tables/mv1
>>>>>>>>>>>>>>>>> // non-REST: update JSON file at table_metadata_location
>>>>>>>>>>>>>>>>> storageTable.newAppend().appendFile(...).commit();
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We could then introduce a new requirement for views and
>>>>>>>>>>>>>>>>> tables called "AssertProperty" which could make sure to only 
>>>>>>>>>>>>>>>>> perform
>>>>>>>>>>>>>>>>> updates that are in line with materialized views. The 
>>>>>>>>>>>>>>>>> additional requirement
>>>>>>>>>>>>>>>>> can be seen as a general extension which does not need to be 
>>>>>>>>>>>>>>>>> changed if we
>>>>>>>>>>>>>>>>> decide to go with Option 1 in the future.
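
A minimal sketch of how the "AssertProperty" requirement described above might behave, written as self-contained Java with no Iceberg dependency. The class and method names (AssertPropertySketch, AssertProperty, isSatisfiedBy, validate) are hypothetical and are not part of any existing Iceberg API. Only the property names "materialized" and "storage_table", and the rule that an absent property means a regular view or table, come from the proposal. Treating an absent property as "false" for every key is an assumption made for this illustration.

```java
import java.util.Map;

public class AssertPropertySketch {

    // Hypothetical commit requirement: the update may only proceed if the
    // entity's property `key` currently holds the expected value.
    static final class AssertProperty {
        private final String key;
        private final String expected;

        AssertProperty(String key, String expected) {
            this.key = key;
            this.expected = expected;
        }

        boolean isSatisfiedBy(Map<String, String> properties) {
            // Assumption: a missing property is read as "false",
            // i.e. the entity is a regular view or table.
            String actual = properties.getOrDefault(key, "false");
            return expected.equals(actual);
        }
    }

    // Rejects the commit when the requirement does not hold.
    static void validate(Map<String, String> properties, AssertProperty requirement) {
        if (!requirement.isSatisfiedBy(properties)) {
            throw new IllegalStateException(
                "Requirement failed: property '" + requirement.key
                    + "' is not '" + requirement.expected + "'");
        }
    }

    public static void main(String[] args) {
        Map<String, String> storageTableProps = Map.of("storage_table", "true");
        Map<String, String> regularTableProps = Map.of();

        AssertProperty isStorageTable = new AssertProperty("storage_table", "true");

        // A storage table passes the check, so an MV refresh could commit.
        validate(storageTableProps, isStorageTable);
        System.out.println("storage table: requirement satisfied");

        // A regular table fails the check, so the same commit is rejected.
        try {
            validate(regularTableProps, isStorageTable);
        } catch (IllegalStateException e) {
            System.out.println("regular table: " + e.getMessage());
        }
    }
}
```

The same check would apply to the view side with the key "materialized". Because the requirement only inspects a generic property, nothing in it is specific to materialized views.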
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Let me know what you think.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best wishes,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Jan
>>>>>>>>>>>>>>>>> On 29.02.24 04:09, Walaa Eldin Moustafa wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Ryan for the insights. I agree that reusing
>>>>>>>>>>>>>>>>> existing metadata definitions and minimizing spec changes are 
>>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>>> important. This also minimizes spec drift (between 
>>>>>>>>>>>>>>>>> materialized views and
>>>>>>>>>>>>>>>>> views spec, and between materialized views and tables spec), 
>>>>>>>>>>>>>>>>> and simplifies
>>>>>>>>>>>>>>>>> the implementation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In an effort to take the discussion forward with concrete
>>>>>>>>>>>>>>>>> design options based on an end-to-end implementation, I have 
>>>>>>>>>>>>>>>>> prototyped the
>>>>>>>>>>>>>>>>> implementation (and added Spark support) in this PR
>>>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/9830. I hope it
>>>>>>>>>>>>>>>>> helps us reach convergence faster. More details about some of 
>>>>>>>>>>>>>>>>> the design
>>>>>>>>>>>>>>>>> options are discussed in the description of the PR.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Walaa.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, Feb 28, 2024 at 6:20 PM Ryan Blue <b...@tabular.io>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I mean separate table and view metadata that is somehow
>>>>>>>>>>>>>>>>>> combined through a commit process. For instance, keeping a 
>>>>>>>>>>>>>>>>>> pointer to a
>>>>>>>>>>>>>>>>>> table metadata file in a view metadata file or combining 
>>>>>>>>>>>>>>>>>> commits to
>>>>>>>>>>>>>>>>>> reference both. I don't see the value in either option.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Feb 28, 2024 at 5:05 PM Jack Ye <
>>>>>>>>>>>>>>>>>> yezhao...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks Ryan for the help to trace back to the root
>>>>>>>>>>>>>>>>>>> question! Just a clarification question regarding your 
>>>>>>>>>>>>>>>>>>> reply before I reply
>>>>>>>>>>>>>>>>>>> further: what exactly does the option "a combination of the 
>>>>>>>>>>>>>>>>>>> two (i.e.
>>>>>>>>>>>>>>>>>>> commits are combined)" mean? How is that different from "a 
>>>>>>>>>>>>>>>>>>> new metadata
>>>>>>>>>>>>>>>>>>> type"?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Jack
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
