Let me paraphrase the use case to make sure I'm getting it right: The idea
is to be able to remove expired data and delete the data files associated
with it, but without losing the history of other changes to the table.
Because new data and old data are modified in the same linear history,
physical
>
> @Szehon, I am wondering if we can create materialized views for metadata
> tables to support infinite history on metadata tables (like snapshots or
> partitions). Obviously, materialized views can't be used for time travel or
> rollback. They are only meant for maintaining long/infinite histori
> The main use case I had was table historical analysis (last update time
> for each partition, how many snapshots did this table ever have, for
> example),
Partition level stats can probably help with questions like "last update
time for each partition".
Yeah, for the original use case in this thread, I agree it's delete (soft) +
expire (physical, permanent).
I guess I should have phrased my thought better; I was replying to Ryan's
question above:
> We don't often have people ask to keep snapshots that can't be read
and had thought it'd be nice to
I think "soft-mode" is really just doing the delete. You can then recover
the snapshot if you happen to have accidentally TTL'd a partition.
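
To make the soft/physical distinction concrete, here is a minimal sketch in Python. It is a toy model, not the Iceberg API: `ToyTable` and all of its methods are hypothetical names made up for illustration, and the file paths are invented. It only assumes what the thread describes: a delete commits a new snapshot while older snapshots stay readable, and expiring snapshots is the step that physically removes files no retained snapshot references.

```python
class ToyTable:
    """Toy model of snapshot metadata. Hypothetical; this is not the Iceberg API."""

    def __init__(self):
        self.snapshots = {}    # snapshot id -> frozenset of data file paths
        self.storage = set()   # data files that physically exist
        self.current = None    # id of the current snapshot
        self._next_id = 1

    def commit(self, files):
        """Write an immutable snapshot that points at `files`."""
        sid = self._next_id
        self._next_id += 1
        self.snapshots[sid] = frozenset(files)
        self.storage |= self.snapshots[sid]
        self.current = sid
        return sid

    def delete_partition(self, dt):
        """Soft delete: commit a new snapshot without the partition's files."""
        prefix = f"dt={dt}/"
        kept = {f for f in self.snapshots[self.current] if not f.startswith(prefix)}
        return self.commit(kept)

    def rollback(self, sid):
        """Make an older, still-retained snapshot current again."""
        if sid not in self.snapshots:
            raise KeyError(f"snapshot {sid} has been expired")
        self.current = sid

    def expire(self, retain=()):
        """Physical delete: drop snapshots, then remove unreferenced files."""
        keep = set(retain) | {self.current}
        self.snapshots = {s: f for s, f in self.snapshots.items() if s in keep}
        live = set().union(*self.snapshots.values())
        removed = self.storage - live
        self.storage &= live
        return removed


table = ToyTable()
s1 = table.commit({"dt=2023-05-01/a.parquet", "dt=2023-05-02/b.parquet"})
s2 = table.delete_partition("2023-05-01")
files_after_soft_delete = set(table.storage)  # both files still exist on storage
table.rollback(s1)                            # recover from an accidental delete
table.rollback(s2)                            # redo the delete
removed = table.expire()                      # retains only the current snapshot
```

In this sketch, `rollback(s1)` works only until `expire()` runs; afterwards the old snapshot and the files only it referenced are gone, which is the "permanent" half of the workflow.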
On Fri, Jun 2, 2023 at 8:51 AM Szehon Ho wrote:
I think this violates Iceberg’s assumption of immutable snapshots. That
would require modifying the old snapshot to no longer point to those gc’ed
data files; otherwise, I'm not sure how you could time-travel to read from
that snapshot if some of its files have been deleted.
That being said, I also had this thoug
Ryan,
One use case is that a user might need to time travel to a certain snapshot.
However, that snapshot will have been expired by a snapshot expiration
operation that retains only the latest snapshot, and this operation's only
intent is to remove the GC'd partition. It seems like overkill to me.
I h
Pucheng,
What is the use case around keeping the snapshot longer? We don't often
have people ask to keep snapshots that can't be read, so it sounds like you
might have something specific in mind?
Ryan
On Wed, May 31, 2023 at 8:19 PM Pucheng Yang wrote:
Hi community,
In my organization, a big portion of the datasets are partitioned by date, and
normally we keep only the latest X date partitions for a given dataset.
One issue that always bothers me is that if I want to delete a partition
that should be GC'd, I will run the SQL query "delete from tbl where dt = ..