I mean separate table and view metadata that is somehow combined through a commit process. For instance, keeping a pointer to a table metadata file in a view metadata file or combining commits to reference both. I don't see the value in either option.
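
For comparison, here is a rough sketch of the separate view and table option from the quoted message below, where the table records the view version it was built from and is skipped when that version no longer matches. All names here (`ViewMetadata`, `MaterializationTable`, `can_serve_from`) are hypothetical and not part of the Iceberg spec or the proposal doc, and the exact-match freshness check is only an assumption for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ViewMetadata:
    # Hypothetical stand-in for a view's metadata: only the current version matters here.
    current_version_id: int
    sql: str


@dataclass(frozen=True)
class MaterializationTable:
    # Hypothetical stand-in for a storage table that tracks which view version
    # it was computed from, plus freshness info for each referenced table.
    view_version_id: int
    source_snapshot_ids: Dict[str, int] = field(default_factory=dict)


def can_serve_from(
    view: ViewMetadata,
    table: MaterializationTable,
    current_snapshots: Dict[str, int],
) -> bool:
    """Return True if the table may stand in for the view, else fall back to the view."""
    # A changed view definition needs no action: the table is simply skipped.
    if table.view_version_id != view.current_version_id:
        return False
    # Assumption: "fresh" means every referenced table is still at the recorded snapshot.
    return all(
        current_snapshots.get(name) == snapshot_id
        for name, snapshot_id in table.source_snapshot_ids.items()
    )
```

Nothing in this sketch needs a pointer from the view to the table or a combined commit. The view and the table are each updated on their own, and a mismatch just means reading the plain view.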

On Wed, Feb 28, 2024 at 5:05 PM Jack Ye <yezhao...@gmail.com> wrote:

> Thanks Ryan for the help to trace back to the root question! Just a
> clarification question regarding your reply before I reply further: what
> exactly does the option "a combination of the two (i.e. commits are
> combined)" mean? How is that different from "a new metadata type"?
>
> -Jack
>
> On Wed, Feb 28, 2024 at 2:10 PM Ryan Blue <b...@tabular.io> wrote:
>
>> I’m catching up on this conversation, so hopefully I can bring a fresh
>> perspective.
>>
>> Jack already pointed out that we need to start from the basics and I
>> agree with that. Let’s remove voting at this point. Right now is the time
>> for discussing trade-offs, not lining up and taking sides. I realize that
>> wasn’t the intent with adding a vote, but that’s almost always the result.
>> It’s too easy to use it as a stand-in for consensus and move on
>> prematurely. I get the impression from the swirl in Slack that discussion
>> has moved ahead of agreement.
>>
>> We’re still at the most basic question: is a materialized view a view and
>> a separate table, a combination of the two (i.e. commits are combined), or
>> a new metadata type?
>>
>> For now, I’m ignoring whether the “separate table” is some kind of
>> “system table” (meaning hidden?) or if it is exposed in the catalog. That’s
>> a later choice (already pointed out) and, I suspect, it should be delegated
>> to catalog implementations.
>>
>> To simplify this a little, I think that we can eliminate the option to
>> combine table and view commits. I don’t think there is a reason to combine
>> the two. If separate, a table would track the view version used along with
>> freshness information for referenced tables. If the table is automatically
>> skipped when the version no longer matches the view, then no action needs
>> to happen when a view definition changes. Similarly, the table can be
>> updated independently without needing to also swap view metadata. This also
>> aligns with the idea from the original doc that there can be multiple
>> materialization tables for a view. Each should operate independently unless
>> I’m missing something
>>
>> I don’t think the last paragraph’s conclusion is contentious so I’ll move
>> on, but please stop here and reply if you disagree!
>>
>> That leaves the main two options, a view and a separate table linked by
>> metadata, or, combined materialized view metadata.
>>
>> As the doc notes, the separate view and table option is simpler because
>> it reuses existing metadata definitions and falls back to simple views.
>> That is a significantly smaller spec and small is very, very important when
>> it comes to specs. I think that the argument for a new definition of a
>> materialized view needs to overcome this disadvantage.
>>
>> The arguments that I see for a combined materialized view object are:
>>
>> - Regular views are separate, rather than being tables with SQL and
>>   no data so it would be inconsistent (“Iceberg view is just a table with no
>>   data but with representations defined. But we did not do that.”)
>> - Materialized views are different objects in DDL
>> - Tables may be a superset of functionality needed for materialized
>>   views
>> - Tables are not typically exposed to end users — but this isn’t
>>   required by the separate view and table option
>>
>> Am I missing any arguments for combined metadata?
>>
>> Ryan
>> --
>> Ryan Blue
>> Tabular

--
Ryan Blue
Tabular