Thank you for your comments. I should have provided a user story to make
the use case clearer.
While the write-audit-publish (WAP) pattern is probably the most common
use of the branching feature of Iceberg tables, branching can also be
used in other ways. The following user story showcases branching for
views and Iceberg tables without the WAP pattern.
User story:
There is an existing data pipeline that ingests data from an operational
system into an Iceberg table called "table_staging". The table
"table_staging" is used in the query definition of a view called
"view_cleaned". A data engineer is given the task of dropping a column
from the table "table_staging". They would like to perform the task in a
development environment where they can test whether dependent entities
are affected by the change to the table. Therefore, they create a branch
of the table "table_staging" where they drop the column, and a branch of
the view "view_cleaned" where they adjust the query definition so that
everything works as expected.
I'm not sure if this kind of usage was envisioned when table branching
was added to Iceberg tables, but it is possible.
Thanks,
Jan
On 14.11.23 09:04, Walaa Eldin Moustafa wrote:
Also, view metadata versions and (underlying) table snapshots/versions
are orthogonal concepts. For example, theoretically, one could
time-travel in views along two dimensions: view metadata version and
underlying data version. Hence, I do not think that data versioning in
tables corresponds exactly to view metadata versioning. Instead of
mapping/porting the feature from tables to views, we can approach this
by discussing the use case we are trying to unlock with this proposal.
Maybe there is a better way to support the use case.
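To make the two dimensions concrete, here is a minimal toy sketch in
plain Python. It does not use any Iceberg API; the names
TABLE_SNAPSHOTS, VIEW_VERSIONS, and read_view, as well as the ids and
rows, are hypothetical and exist only for illustration. The point is
that a view read can be pinned independently to a view metadata version
and to an underlying table snapshot.

```python
# Toy model only: this is not how Iceberg stores or resolves metadata.

# Hypothetical table history: snapshot id -> rows visible at that snapshot.
TABLE_SNAPSHOTS = {
    100: [{"id": 1, "name": "a"}],
    101: [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
}

# Hypothetical view history: view version id -> columns the view selects.
VIEW_VERSIONS = {
    1: ("id", "name"),
    2: ("id",),
}


def read_view(view_version_id, snapshot_id):
    """Evaluate one view version against one table snapshot."""
    columns = VIEW_VERSIONS[view_version_id]
    rows = TABLE_SNAPSHOTS[snapshot_id]
    # Project each row onto the columns chosen by this view version.
    return [{col: row[col] for col in columns} for row in rows]


# The two axes vary independently: an old view definition can be
# evaluated over new data, and a new definition over old data.
old_view_new_data = read_view(view_version_id=1, snapshot_id=101)
new_view_old_data = read_view(view_version_id=2, snapshot_id=100)
```

Any pairing of a view version with a table snapshot is a valid read in
this model, which is why the two histories do not map onto each other
one to one.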
Thanks,
Walaa.
On Mon, Nov 13, 2023 at 11:47 PM Ajantha Bhat <ajanthab...@gmail.com>
wrote:
Hi Jan,
In my view, branches are primarily intended for isolating tests
and later merging them back (commonly referred to as the WAP
scenario).
Tags, conversely, serve the purpose of marking significant
snapshots for reproducibility or auditing.
Views essentially act as a shorthand for queries. Creating or
replacing a view is a metadata operation with no data involvement.
So, branching and tagging support may be overkill.
However, when dealing with materialized views, it becomes crucial
to support branching and tagging, given that these operations
involve data manipulation.
Thanks,
Ajantha