Hi Walaa,
As you pointed out, design 1 in the GitHub issue
<https://github.com/apache/iceberg/issues/6420>, with a common view and a
linked storage table, seems the most promising way forward. I have
therefore put together an initial proposal for a specification.
I realize that my proposal deviates from the design Trino is using at
the moment. I want to point out that this is just a proposal and is open
for discussion. The changes I suggest are twofold:
1. Change
Register only the view in the Iceberg catalog. Currently both the common
view and the storage table are registered in the catalog. To ensure
atomic transactions for the storage table I propose the following commit
procedure:
Writers changing the storage table state create table metadata files
optimistically, assuming that the current metadata location will not
change before the writer’s commit. The commit is performed in two steps.
First, the writer optimistically creates a new view metadata file and
points the storage table pointer at the new location. Second, the new
view metadata file is committed by swapping the view’s metadata file
pointer in the metastore from the base location to the new location. The
commit is considered successful only when the second step succeeds.
2. Change
A different format for storing the refresh information. The proposed
metadata captures more information and allows easier additions in the
future.
I have summarized the changes in the proposal at the end of the email. I
would be glad if you could have a look.
Best wishes,
Jan
Design 1
Overview
MVs (materialized views) are realized as a combination of an Iceberg
common view with an underlying storage table. The definition of the
materialized view is stored in the common view; the precomputed data is
stored in an Iceberg table called the storage table. The information
required for refresh operations is stored as a property in the storage
table. All changes to either the view or the storage table state create
a new view metadata file and completely replace the old view metadata
file using an atomic swap. As with Iceberg tables and views, this atomic
swap is delegated to the metastore that tracks tables and views by name.
Metadata Location
An atomic swap of one view metadata file for another provides the basis for
making atomic changes. Readers use the version of the view that was
current when
they loaded the view metadata and are not affected by changes until they
refresh
and pick up a new metadata location.
Writers distinguish between changing the view state and changing the
storage table state.
Writers changing the view state create view metadata files
optimistically, assuming that the current metadata location will not
change before the writer’s commit. Once a writer has created an update,
it commits by swapping the view’s metadata file pointer from the base
location to the new location.
Writers changing the storage table state create table metadata files
optimistically, assuming that the current metadata location will not
change before the writer’s commit. The commit is performed in two steps.
First, the writer optimistically creates a new view metadata file and
points the storage table pointer at the new location. Second, the new
view metadata file is committed by swapping the view’s metadata file
pointer in the metastore from the base location to the new location. The
commit is successful only when the second step succeeds.
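To make the two-step procedure concrete, here is a minimal sketch of the proposed storage table commit. It assumes a metastore that exposes an atomic compare-and-swap on the view’s metadata pointer; the class and function names (Metastore, commit_storage_table_change, etc.) are hypothetical and not part of any Iceberg API.

```python
class CommitFailedException(Exception):
    """Raised when a concurrent writer moved the view pointer first."""


class Metastore:
    """Toy in-memory metastore tracking a single view's metadata pointer."""

    def __init__(self, view_metadata_location):
        self.view_metadata_location = view_metadata_location

    def swap_view_pointer(self, base_location, new_location):
        # Atomic compare-and-swap: succeeds only if no other writer
        # committed between the optimistic write and this swap.
        if self.view_metadata_location != base_location:
            raise CommitFailedException("concurrent commit detected")
        self.view_metadata_location = new_location


def commit_storage_table_change(metastore, base_view_location,
                                write_table_metadata, write_view_metadata):
    # Optimistically write the new storage table metadata file.
    table_metadata_location = write_table_metadata()
    # Step 1: write a new view metadata file whose storage table
    # pointer references the new table metadata location.
    new_view_location = write_view_metadata(table_metadata_location)
    # Step 2: commit by atomically swapping the view's metadata file
    # pointer in the metastore from the base location to the new one.
    # The commit is successful only if this swap succeeds.
    metastore.swap_view_pointer(base_view_location, new_view_location)
    return new_view_location
```

A writer that loses the race in step 2 would re-read the current view metadata and retry, as with ordinary Iceberg table commits.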
Specification (DRAFT!)
The metadata of the materialized view comprises four parts. The view
metadata and the storage table metadata constitute one part each. Since
not all information can be stored inside the view and storage table
metadata, two additional parts are introduced in the |properties| field
of the view and the storage table metadata, respectively.
Materialized view metadata stored in the common view properties
One part of the materialized view metadata is stored inside the |properties|
field of the common view. The metadata is stored in JSON format under
the key
“materialized_view_metadata”. The materialized view metadata stored in
the view
has the following schema.
v1 Field Name Description
/required/ *|storage-table-location|* Path to the metadata file of the
storage table.
/optional/ *|allow-stale-data|* Boolean that defines the query engine
behavior in case the base tables indicate the precomputed data isn’t
fresh. If set to false, a refresh operation has to be performed before
the query results are returned. If set to true, the data in the storage
table is returned without performing a refresh operation. If the field
is not set, defaults to false.
/optional/ *|refresh-strategy|* Possible values are |full| (full storage
table refresh) and |incremental| (incremental table refresh). If the
field is not set, defaults to |full|.
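For illustration, the metadata stored in the view’s |properties| could look like the following (the location and values are made up). Note that since Iceberg |properties| maps are string-to-string, this JSON document would presumably be serialized as a single string value under the key.

```json
{
  "materialized_view_metadata": {
    "storage-table-location": "s3://bucket/warehouse/db/mv_storage/metadata/v3.metadata.json",
    "allow-stale-data": false,
    "refresh-strategy": "full"
  }
}
```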
Materialized view metadata stored in the storage table properties
Another part of the materialized view metadata is stored inside the
|properties|
field of the storage table. The metadata is stored in JSON format under
the key
“materialized_view_metadata”. The materialized view metadata stored in the
storage table has the following schema.
v1 Field Name Description
/required/ *|refreshes|* A list of refresh operations.
/required/ *|current-refresh-id|* ID of the last refresh operation
that defines the current state of the data files.
*Refreshes*
Refresh information is stored as a list of |refresh operation| records. Each
|refresh operation| has the following structure:
v1 Field Name Description
/required/ *|refresh-id|* ID of the refresh operation.
/required/ *|version-id|* Version id of the materialized view when the
refresh operation was performed.
/required/ *|base-tables|* A list of |base-table| records.
/optional/ *|sequence-number|* Sequence number of the snapshot that
contains the refreshed data files.
Refreshes could be handled in different ways. In the normal case the
refresh list could consist of only one entry, which gets overwritten on
every refresh operation. If “timetravel” is enabled for the materialized
view, a new |refresh operation| record gets inserted into the list on
every refresh. Together with the |sequence-number| field, this could be
used to track the evolution of the data files over the refresh history.
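As an illustration, the metadata stored in the storage table’s |properties| could look like the following (all IDs and values are made up):

```json
{
  "materialized_view_metadata": {
    "current-refresh-id": 2,
    "refreshes": [
      {
        "refresh-id": 2,
        "version-id": 7,
        "sequence-number": 14,
        "base-tables": [
          {
            "type": "iceberg-metastore",
            "identifier": "db.events",
            "snapshot-reference": 5131018120015449000
          }
        ]
      }
    ]
  }
}
```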
*Base table*
A |base table| record can have different forms based on the common field
“type”.
The other fields don’t necessarily have to be the same.
Iceberg-Metastore
v1 Field Name Description
/required/ *|type|* type="iceberg-metastore"
/required/ *|identifier|* Identifier in the SQL expression.
/required/ *|snapshot-reference|* Snapshot id of the base table when
the refresh operation was performed.
/optional/ *|properties|* A string to string map of base table
properties. Could be used to specify a different metastore.
Iceberg-FileSystem
v1 Field Name Description
/required/ *|type|* type="iceberg-filesystem"
/required/ *|identifier|* Identifier in the SQL expression.
/required/ *|location|* Path to the directory of the base table.
/required/ *|snapshot-reference|* Snapshot id of the base table when
the refresh operation was performed.
/optional/ *|properties|* A string to string map of base table
properties. Could be used for a different storage system.
DeltaLake-FileSystem (optional)
v1 Field Name Description
/required/ *|type|* type="deltalake-filesystem"
/required/ *|identifier|* Identifier in the SQL expression.
/required/ *|location|* Path to the directory of the base table.
/required/ *|snapshot-reference|* Delta table version of the base
table when the refresh operation was performed.
/optional/ *|properties|* A string to string map of base table
properties. Could be used for a different storage system.
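To show how the three forms differ, here is one example record of each type inside a |base-tables| list (identifiers, paths, and snapshot references are all made up):

```json
[
  {
    "type": "iceberg-metastore",
    "identifier": "db.orders",
    "snapshot-reference": 8627312930871839000
  },
  {
    "type": "iceberg-filesystem",
    "identifier": "db.customers",
    "location": "s3://bucket/warehouse/db/customers",
    "snapshot-reference": 1849348892873498000
  },
  {
    "type": "deltalake-filesystem",
    "identifier": "db.shipments",
    "location": "s3://bucket/delta/shipments",
    "snapshot-reference": 42
  }
]
```

The common |type| field lets a reader dispatch on the record form, while |snapshot-reference| carries an Iceberg snapshot ID for the first two types and a Delta table version for the third.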