I’m catching up on this conversation, so hopefully I can bring a fresh
perspective.

Jack already pointed out that we need to start from the basics and I agree
with that. Let’s remove voting at this point. Right now is the time for
discussing trade-offs, not lining up and taking sides. I realize that
wasn’t the intent with adding a vote, but that’s almost always the result.
It’s too easy to use it as a stand-in for consensus and move on
prematurely. I get the impression from the swirl in Slack that discussion
has moved ahead of agreement.

We’re still at the most basic question: is a materialized view a view and a
separate table, a combination of the two (i.e. commits are combined), or a
new metadata type?

For now, I’m ignoring whether the “separate table” is some kind of “system
table” (meaning hidden?) or if it is exposed in the catalog. That’s a later
choice (already pointed out) and, I suspect, it should be delegated to
catalog implementations.

To simplify this a little, I think that we can eliminate the option to
combine table and view commits. I don’t think there is a reason to combine
the two. If separate, a table would track the view version used along with
freshness information for referenced tables. If the table is automatically
skipped when the version no longer matches the view, then no action needs
to happen when a view definition changes. Similarly, the table can be
updated independently without needing to also swap view metadata. This also
aligns with the idea from the original doc that there can be multiple
materialization tables for a view. Each should operate independently unless
I’m missing something.
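
A rough sketch of the separate option, to make the skip-on-mismatch idea
concrete. This is illustrative only: the class, field, and function names
are hypothetical and do not come from the doc or the Iceberg spec, and
freshness is modeled as a plain map of source table name to snapshot ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class View:
    name: str
    version_id: int  # current version of the view definition


@dataclass
class MaterializationTable:
    name: str
    # View version that was used to produce this table's data.
    view_version_id: int
    # Freshness info: snapshot ID of each referenced table at refresh time.
    source_snapshots: Dict[str, int] = field(default_factory=dict)


def usable_materializations(
    view: View, tables: List[MaterializationTable]
) -> List[MaterializationTable]:
    """Skip any table whose recorded view version no longer matches."""
    return [t for t in tables if t.view_version_id == view.version_id]


def is_fresh(
    table: MaterializationTable, current_snapshots: Dict[str, int]
) -> bool:
    """True if every referenced table is still at the recorded snapshot."""
    return all(
        current_snapshots.get(name) == snapshot_id
        for name, snapshot_id in table.source_snapshots.items()
    )


view = View("daily_totals", version_id=2)
old = MaterializationTable("mv_a", view_version_id=1,
                           source_snapshots={"orders": 10})
new = MaterializationTable("mv_b", view_version_id=2,
                           source_snapshots={"orders": 11})
candidates = usable_materializations(view, [old, new])
```

Here replacing the view definition is a view-only commit: mv_a simply stops
matching and is ignored, and mv_b can be refreshed later without touching
view metadata.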

I don’t think the conclusion that table and view commits should stay
separate is contentious, so I’ll move on, but please stop here and reply if
you disagree!

That leaves the two main options: a view and a separate table linked by
metadata, or combined materialized view metadata.

As the doc notes, the separate view and table option is simpler because it
reuses existing metadata definitions and falls back to simple views. That
is a significantly smaller spec and small is very, very important when it
comes to specs. I think that the argument for a new definition of a
materialized view needs to overcome this disadvantage.

The arguments that I see for a combined materialized view object are:

   - Regular views are separate objects, rather than being tables with SQL
   and no data, so it would be inconsistent (“Iceberg view is just a table
   with no data but with representations defined. But we did not do that.”)
   - Materialized views are different objects in DDL
   - Tables may be a superset of functionality needed for materialized views
   - Tables are not typically exposed to end users — but this isn’t
   required by the separate view and table option

Am I missing any arguments for combined metadata?

Ryan
-- 
Ryan Blue
Tabular
