I mean separate table and view metadata that is somehow combined through a
commit process. For instance, keeping a pointer to a table metadata file in
a view metadata file or combining commits to reference both. I don't see
the value in either option.

On Wed, Feb 28, 2024 at 5:05 PM Jack Ye <yezhao...@gmail.com> wrote:

> Thanks, Ryan, for helping trace this back to the root question! Just a
> clarification question regarding your reply before I reply further: what
> exactly does the option "a combination of the two (i.e. commits are
> combined)" mean? How is that different from "a new metadata type"?
>
> -Jack
>
>
>
>
> On Wed, Feb 28, 2024 at 2:10 PM Ryan Blue <b...@tabular.io> wrote:
>
>> I’m catching up on this conversation, so hopefully I can bring a fresh
>> perspective.
>>
>> Jack already pointed out that we need to start from the basics and I
>> agree with that. Let’s remove voting at this point. Right now is the time
>> for discussing trade-offs, not lining up and taking sides. I realize that
>> wasn’t the intent with adding a vote, but that’s almost always the result.
>> It’s too easy to use it as a stand-in for consensus and move on
>> prematurely. I get the impression from the swirl in Slack that discussion
>> has moved ahead of agreement.
>>
>> We’re still at the most basic question: is a materialized view a view and
>> a separate table, a combination of the two (i.e. commits are combined), or
>> a new metadata type?
>>
>> For now, I’m ignoring whether the “separate table” is some kind of
>> “system table” (meaning hidden?) or if it is exposed in the catalog. That’s
>> a later choice (already pointed out) and, I suspect, it should be delegated
>> to catalog implementations.
>>
>> To simplify this a little, I think that we can eliminate the option to
>> combine table and view commits. I don’t think there is a reason to combine
>> the two. If separate, a table would track the view version used along with
>> freshness information for referenced tables. If the table is automatically
>> skipped when the version no longer matches the view, then no action needs
>> to happen when a view definition changes. Similarly, the table can be
>> updated independently without needing to also swap view metadata. This also
>> aligns with the idea from the original doc that there can be multiple
>> materialization tables for a view. Each should operate independently unless
>> I’m missing something.
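
A minimal sketch of the "skip the table when the view version no longer matches" idea from the paragraph above. Everything here is hypothetical and for illustration only: the `Materialization` record, its field names, and `pick_materialization` are not part of any Iceberg spec or API, and strict snapshot-ID equality is just one possible freshness policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Materialization:
    """Hypothetical record a storage table keeps about how it was produced."""
    table_name: str
    view_version_id: int  # view version the stored data was computed from
    # Referenced table name -> snapshot ID that was read (freshness info).
    source_snapshots: Dict[str, int] = field(default_factory=dict)


def pick_materialization(
    current_view_version: int,
    current_snapshots: Dict[str, int],
    candidates: List[Materialization],
) -> Optional[Materialization]:
    """Return the first usable materialization, or None to fall back to the plain view."""
    for m in candidates:
        # A table built from an older view definition is skipped, so nothing
        # has to be updated on the table when the view definition changes.
        if m.view_version_id != current_view_version:
            continue
        # Assumed policy: fresh only if every referenced table is unchanged.
        if m.source_snapshots == current_snapshots:
            return m
    return None
```

Because each candidate is checked on its own, several materialization tables can exist for one view and be refreshed independently; a reader that finds none usable simply runs the view's query.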
>>
>> I don’t think the last paragraph’s conclusion is contentious, so I’ll move
>> on, but please stop here and reply if you disagree!
>>
>> That leaves the two main options: a view and a separate table linked by
>> metadata, or combined materialized view metadata.
>>
>> As the doc notes, the separate view and table option is simpler because
>> it reuses existing metadata definitions and falls back to simple views.
>> That is a significantly smaller spec and small is very, very important when
>> it comes to specs. I think that the argument for a new definition of a
>> materialized view needs to overcome this disadvantage.
>>
>> The arguments that I see for a combined materialized view object are:
>>
>>    - Regular views are separate, rather than being tables with SQL and
>>    no data, so it would be inconsistent (“Iceberg view is just a table with no
>>    data but with representations defined. But we did not do that.”)
>>    - Materialized views are different objects in DDL
>>    - Tables may be a superset of functionality needed for materialized
>>    views
>>    - Tables are not typically exposed to end users — but this isn’t
>>    required by the separate view and table option
>>
>> Am I missing any arguments for combined metadata?
>>
>> Ryan
>> --
>> Ryan Blue
>> Tabular
>>
>

-- 
Ryan Blue
Tabular
