> > This is not really consumable by joining the regular table query with
> > the catalog event log. I would also imagine the catalog event log is
> > capped at a shorter retention (maybe a few months) compared to data
> > retention (which could be a few years).

This part of the discussion reminds me of my proposal a while back to keep
expired snapshot metadata:
https://lists.apache.org/thread/l9m0mp44byx9kzzzmolxnrdqlzbympb8

I think it would have been nice to have this formally in Iceberg (a list of
expired snapshots), but the discussion concluded that it is probably easier
for the catalog to do. Maybe we could formalize it in the catalog spec as
an optional list of expired snapshots?

Is there also a concern about storage from adding so many metadata fields
to each row? If we can calculate the timestamp from existing fields like
last_updated_sequence_number joined with catalog information, that seems
better for the spec.

Thanks,
Szehon
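
For concreteness, the join described above might look like the following
Spark SQL. This is a rough sketch only: the table name db.events is made
up, and it assumes the snapshot-to-sequence-number mapping is queryable,
e.g. that the snapshots metadata table exposes a sequence_number column
alongside committed_at; if a given release does not expose it, the mapping
would have to come from the catalog or the table metadata instead.

    -- Rough sketch: recover an approximate per-row timestamp while the
    -- relevant snapshots are still retained. The db.events table and the
    -- sequence_number column on the snapshots metadata table are assumed.
    SELECT e.*,
           s.committed_at AS approx_last_updated_at
    FROM db.events e
    JOIN db.events.snapshots s
      ON e._last_updated_sequence_number = s.sequence_number;

Once those snapshots expire, the join silently drops the affected rows,
which is exactly the retention gap debated in this thread.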

On Thu, Dec 11, 2025 at 10:02 AM Micah Kornfield <[email protected]> wrote:

>> An explicit timestamp column adds more burden to application developers.
>> While some databases require an explicit column in the schema, those
>> databases provide triggers to auto-set the column value. For Iceberg,
>> the snapshot timestamp is the closest to the trigger timestamp.
>
> I wonder if we should look at generalizing the audit column in Iceberg
> and letting this be configured at the table level. Other common audit
> fields that some people might want without keeping snapshot history:
>
> 1. Insertion time
> 2. Created by
> 3. Updated by
>
> On Tue, Dec 9, 2025 at 2:23 PM Steven Wu <[email protected]> wrote:
>
>> Ryan, thanks a lot for the feedback!
>>
>> Regarding the concern about reliable timestamps, we are not proposing to
>> use timestamps for ordering. With NTP on modern computers, they are
>> generally reliable enough for the intended use cases. Also, some
>> environments may have a stronger clock service, like the Spanner
>> TrueTime service
>> <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency>.
>>
>> > joining to timestamps from the snapshots metadata table.
>>
>> As you also mentioned, that depends on the snapshot history, which is
>> often retained for only a few days for performance reasons.
>>
>> > embedding a timestamp in DML (like `current_timestamp`) rather than
>> > relying on an implicit one from table metadata.
>>
>> An explicit timestamp column adds more burden to application developers.
>> While some databases require an explicit column in the schema, those
>> databases provide triggers to auto-set the column value. For Iceberg,
>> the snapshot timestamp is the closest to the trigger timestamp.
>>
>> Also, a timestamp set during computation (like streaming ingestion or a
>> relatively long batch computation) doesn't capture the time the
>> rows/files are added to the Iceberg table in a batch fashion.
>>
>> > And for those use cases, you could also keep a longer history of
>> > snapshot timestamps, like storing a catalog's event log for long-term
>> > access to timestamp info
>>
>> This is not really consumable by joining the regular table query with
>> the catalog event log. I would also imagine the catalog event log is
>> capped at a shorter retention (maybe a few months) compared to data
>> retention (which could be a few years).
>>
>> On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue <[email protected]> wrote:
>>
>>> I don't think it is a good idea to expose timestamps at the row level.
>>> Timestamps in metadata that would be carried down to the row level
>>> already confuse people who expect them to be useful or reliable rather
>>> than just for debugging. I think extending this to the row level would
>>> only make the problem worse.
>>>
>>> You can already get this information by projecting the last updated
>>> sequence number, which is reliable, and joining to timestamps from the
>>> snapshots metadata table. Of course, the drawback there is losing the
>>> timestamp information when snapshots expire, but since it isn't
>>> reliable anyway, I'd be fine with that.
>>>
>>> Some of the use cases, like auditing and compliance, are probably
>>> better served by embedding a timestamp in DML (like
>>> `current_timestamp`) rather than relying on an implicit one from table
>>> metadata. And for those use cases, you could also keep a longer history
>>> of snapshot timestamps, like storing a catalog's event log for
>>> long-term access to timestamp info. I think that would be better than
>>> storing it at the row level.
>>>
>>> On Mon, Dec 8, 2025 at 3:46 PM Steven Wu <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> For the V4 spec, I have a small proposal [1] to expose a row timestamp
>>>> concept that can help with many use cases like temporal queries,
>>>> latency tracking, TTL, auditing, and compliance.
>>>>
>>>> This *_last_updated_timestamp_ms* metadata column behaves very
>>>> similarly to *_last_updated_sequence_number* for row lineage:
>>>>
>>>> - Initially, it inherits its value from the snapshot timestamp.
>>>> - During a rewrite (like compaction), its values are persisted in the
>>>>   data files.
>>>>
>>>> Would love to hear what you think.
>>>>
>>>> Thanks,
>>>> Steven
>>>>
>>>> [1]
>>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing
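
For a sense of how the proposal would be used, a TTL-style cleanup against
the proposed column might look like the hypothetical Spark SQL below. The
_last_updated_timestamp_ms column comes from the proposal doc [1] and is
not part of any released Iceberg spec, and the table name db.events is
again made up.

    -- Hypothetical: expire rows whose last update is older than 90 days,
    -- based on the proposed _last_updated_timestamp_ms metadata column.
    -- 90L forces bigint arithmetic so the millisecond math can't overflow.
    DELETE FROM db.events
    WHERE _last_updated_timestamp_ms <
          unix_millis(current_timestamp()) - 90L * 24 * 60 * 60 * 1000;

Ryan's alternative, an explicit audit column populated with
current_timestamp in DML, would support the same query without new spec
surface, at the cost of every writer having to set the column.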
