Let me paraphrase the use case to make sure I'm getting it right: The idea
is to be able to remove expired data and delete the data files associated
with it, but without losing the history of other changes to the table.
Because new data and old data are modified in the same linear history,
physically removing old data (via snapshot expiration) prevents you from
keeping history for the new data.
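
To make that concrete, here is a toy Python model of a linear history. It
is only a sketch and not the Iceberg API: the snapshot IDs, file paths, and
the `snapshots_blocking_removal` helper are made up for illustration. Each
snapshot is reduced to the set of data files it references.

```python
# Toy model, not the Iceberg API: a snapshot is just an ID plus the set of
# data files it references. All names and paths are hypothetical.
history = [
    ("s1", {"dt=01/a.parquet", "dt=02/b.parquet"}),
    # s2 appends new data for dt=03
    ("s2", {"dt=01/a.parquet", "dt=02/b.parquet", "dt=03/c.parquet"}),
    # s3 is the result of: delete from tbl where dt = 01
    ("s3", {"dt=02/b.parquet", "dt=03/c.parquet"}),
]


def snapshots_blocking_removal(history, data_file):
    """Snapshots that must be expired before data_file can be physically deleted."""
    return [snapshot_id for snapshot_id, files in history if data_file in files]


print(snapshots_blocking_removal(history, "dt=01/a.parquet"))  # ['s1', 's2']
```

In this model, physically removing the dt=01 file means expiring s2 as well,
even though s2 only recorded an append of new data.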

There are a few ways I can think of to work around this. I think what most
people do is delete data a few days before it has to be gone, so that it
doesn't need to be physically removed immediately. That's the default
behavior, which I think isn't what you want in this case.

Another option is to just delete the expired data files immediately. You'd
still have metadata references to them, but those won't cause issues as
long as no one tries to read the files. Of course, that runs into issues
with full table operations, like `select * limit 10` where you could
accidentally try to access a deleted file that's still referenced.
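
A rough sketch of that failure mode, again as a toy Python model rather than
real Iceberg calls (the `plan_scan` and `missing_files` helpers and the
`min_dt` filter are assumptions made for illustration): a scan that filters
out the expired partition never touches the deleted file, while an
unfiltered scan does.

```python
# Toy model, not the Iceberg API: table metadata maps each referenced data
# file to its partition value. All names and paths are hypothetical.
referenced = {
    "dt=01/a.parquet": "01",  # expired partition; file already deleted
    "dt=02/b.parquet": "02",
    "dt=03/c.parquet": "03",
}
storage = {"dt=02/b.parquet", "dt=03/c.parquet"}  # files that still exist


def plan_scan(referenced, min_dt=None):
    """Files a scan would open; min_dt is a hypothetical partition filter."""
    return sorted(f for f, dt in referenced.items() if min_dt is None or dt >= min_dt)


def missing_files(referenced, storage, min_dt=None):
    """Files the scan would try to open that are no longer in storage."""
    return [f for f in plan_scan(referenced, min_dt) if f not in storage]


print(missing_files(referenced, storage, min_dt="02"))  # []
print(missing_files(referenced, storage))  # ['dt=01/a.parquet']
```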

Last, I think you could solve this with branching, while also keeping
overhead down. The idea is to create a branch for each version you actually
want to keep. That's probably like a daily branch so you don't keep every
version of the table. Then you can apply deletes to all of the historical
branches and keep just the latest snapshot for each branch. That allows you
to select the table states you want to keep and still delete within that
set of states. Deleting data would be a bit more difficult, but you would
probably be able to reuse the same metadata changes for all the deletes.
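
As a minimal sketch of the branching idea, here is another toy Python model.
It is not the Iceberg API: the branch names, file paths, and the
`delete_partition` and `unreferenced` helpers are hypothetical. Each branch
keeps only its head snapshot, the same delete is committed to every branch,
and a file becomes safe to remove once no branch head references it.

```python
# Toy model, not the Iceberg API: each branch retains only its head snapshot,
# represented as a frozenset of data file paths. All names are hypothetical.
branches = {
    "day-01": frozenset({"dt=01/a.parquet"}),
    "day-02": frozenset({"dt=01/a.parquet", "dt=02/b.parquet"}),
    "main": frozenset({"dt=01/a.parquet", "dt=02/b.parquet", "dt=03/c.parquet"}),
}


def delete_partition(branches, prefix):
    """Commit the same delete to every branch, keeping only the new head."""
    return {
        name: frozenset(f for f in files if not f.startswith(prefix))
        for name, files in branches.items()
    }


def unreferenced(before, after):
    """Files referenced before the deletes but by no branch head afterwards."""
    old = set().union(*before.values())
    live = set().union(*after.values())
    return sorted(old - live)


after = delete_partition(branches, "dt=01/")
print(sorted(after))  # ['day-01', 'day-02', 'main']
print(unreferenced(branches, after))  # ['dt=01/a.parquet']
```

All three table states are still there after the delete, and only the
expired partition's file is left unreferenced.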

It sounds like the last option is probably the one that makes the most
sense for you. Customizing history is a great use for tagging and branching.

Ryan

On Sat, Jun 3, 2023 at 5:03 AM Szehon Ho <szehon.apa...@gmail.com> wrote:

>> @Szehon, I am wondering if we can create materialized views for metadata
>> tables to support infinite history on metadata tables (like snapshots or
>> partitions). Obviously, materialized views can't be used for time travel or
>> rollback. They are only meant for maintaining long/infinite histories.
>
>
> Yea, that's a good idea. There are definitely options like building a tool
> outside Iceberg (dumping the metadata tables from time to time to a
> materialized view), or building a history-preserving catalog layer that
> saves old snapshot metadata, rather than building it into the Iceberg spec
> itself to keep expired metadata files.
>
> Thanks
> Szehon
>
> On Sat, Jun 3, 2023 at 10:06 AM Steven Wu <stevenz...@gmail.com> wrote:
>
>> > the main use case I had was table historical analysis (last update time
>> for each partition, how many snapshots did this table ever have, for
>> example),
>>
>> Partition level stats can probably help with questions like "last update
>> time for each partition".
>>
>> @Szehon, I am wondering if we can create materialized views for metadata
>> tables to support infinite history on metadata tables (like snapshots or
>> partitions). Obviously, materialized views can't be used for time travel or
>> rollback. They are only meant for maintaining long/infinite histories.
>>
>> > One use case is the user might need to time travel to a certain
>> snapshot. However, such a snapshot is expired due to the snapshot
>> expiration that only retains the latest snapshot operation, and this
>> operation's only intent is to remove the gc partition. It seems a little
>> overkill to me.
>>
>> @Pucheng, usually people keep Iceberg snapshot history (for time travel
>> or rollback) for a few days (like 7). A very long history can burden the
>> metadata system. Tagging can extend the history with selected snapshots.
>>
>> It seems that you are saying that purging actions of old partitions are
>> creating new snapshots, which are taking up some space in the snapshot
>> history. But if snapshot expiration is time based (like 7 days), this
>> shouldn't be a problem, right?
>>
>> On Fri, Jun 2, 2023 at 6:17 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>
>>> Yea, for the original use case in this thread, agree it's delete (soft)
>>> + expire (physical, permanent).
>>>
>>> I guess I should have phrased my thought better, I was replying to
>>> Ryan's question above
>>>
>>>>  We don't often have people ask to keep snapshots that can't be read
>>>
>>>
>>> and had thought it'd be nice to have an ExpireSnapshot mode where we
>>> keep older metadata for longer periods of time beyond physical expiration.
>>>
>>> But the main use case I had was table historical analysis (last update
>>> time for each partition, how many snapshots did this table ever have, for
>>> example). It's more of a nice-to-have, and I'm definitely not sure it is a
>>> very compelling use case.  Another option, I guess, is that a custom catalog
>>> can keep this historical information around.
>>>
>>> Thanks
>>> Szehon
>>>
>>> On Fri, Jun 2, 2023 at 10:28 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> I think "soft-mode" is really just doing the delete. You can then
>>>> recover the snapshot if you happen to have accidentally TTL'd a partition.
>>>>
>>>> On Fri, Jun 2, 2023 at 8:51 AM Szehon Ho <szehon.apa...@gmail.com>
>>>> wrote:
>>>>
>>>>> I think this violates Iceberg’s assumption of immutable snapshots.
>>>>> That would require modifying the old snapshot to no longer point to those
>>>>> gc’ed data files, else not sure how you can time-travel to read from that
>>>>> snapshot, if some of its files are deleted?
>>>>>
>>>>> That being said, I also had this thought at some point, to keep
>>>>> snapshot info around longer.  I expect most organizations operate in a mode
>>>>> where they expire snapshots after a few days, and reasonably expect any
>>>>> time-travel or snapshot-related operation (like CDC) to happen within this
>>>>> timeframe.   And of course, use tags to keep the snapshot from expiration.
>>>>>
>>>>> But there are some use-cases where keeping more snapshot metadata for
>>>>> a period longer than when it could be read could be interesting.  For
>>>>> example, if I want to know info about the snapshot that added each data
>>>>> file, we have probably lost most of that snapshot metadata, as the files
>>>>> were added long ago.  One example is the frequent ask to find each
>>>>> partition's last modified time (in an earlier email thread).
>>>>>
>>>>> I haven't thought it completely through, but it crossed my mind that a
>>>>> ‘Soft’-mode of ExpireSnapshot may be useful, where we can delete data files
>>>>> but just mark snapshot’s metadata files as expired without physically
>>>>> deleting them, and so retain the ability to answer these questions.  It
>>>>> could be done by adding ‘expired-snapshots’ list to metadata.json.  That
>>>>> being said, it's a singular use case and I'm not sure if anyone else has
>>>>> interest or other use cases?  It would add a bit of complexity.
>>>>>
>>>>> Thanks
>>>>> Szehon
>>>>>
>>>>> On Fri, Jun 2, 2023 at 7:12 AM Pucheng Yang
>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>
>>>>>> Ryan,
>>>>>>
>>>>>> One use case is the user might need to time travel to a certain
>>>>>> snapshot. However, such a snapshot is expired due to the snapshot
>>>>>> expiration that only retains the latest snapshot operation, and this
>>>>>> operation's only intent is to remove the gc partition. It seems a little
>>>>>> overkill to me.
>>>>>>
>>>>>> I hope my explanation makes sense to you.
>>>>>>
>>>>>> On Thu, Jun 1, 2023 at 3:39 PM Ryan Blue <b...@tabular.io> wrote:
>>>>>>
>>>>>>> Pucheng,
>>>>>>>
>>>>>>> What is the use case around keeping the snapshot longer? We don't
>>>>>>> often have people ask to keep snapshots that can't be read, so it sounds
>>>>>>> like you might have something specific in mind?
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>> On Wed, May 31, 2023 at 8:19 PM Pucheng Yang
>>>>>>> <py...@pinterest.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hi community,
>>>>>>>>
>>>>>>>> In my organization, a big portion of the datasets are partitioned
>>>>>>>> by date. Normally we keep the latest X date partitions for a given
>>>>>>>> dataset.
>>>>>>>>
>>>>>>>> One issue that always bothers me is that if I want to delete a
>>>>>>>> partition that should be GC'ed, I run the SQL query "delete from tbl
>>>>>>>> where dt = ..." and then expire snapshots, keeping only the latest
>>>>>>>> snapshot, to make sure that partition's data is physically removed.
>>>>>>>> However, the downside of this approach is that the table snapshot
>>>>>>>> history is completely lost.
>>>>>>>>
>>>>>>>> I wonder if anyone else in the community has the same pain point?
>>>>>>>> How do you solve this? I would love to understand if there is a
>>>>>>>> solution to this; otherwise we can brainstorm a way to solve it.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>> Pucheng
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Tabular
>>>>>>>
>>>>>>

-- 
Ryan Blue
Tabular
