>
> An explicit timestamp column puts more burden on application developers.
> While some databases require an explicit column in the schema, those
> databases provide triggers to set the column value automatically. For
> Iceberg, the snapshot timestamp is the closest analog to such a
> trigger-set timestamp.


I wonder if we should look at generalizing the audit column concept in
Iceberg and letting it be configured at the table level (a rough sketch of
what that configuration could look like follows the list below). Other
common audit fields that some people might want without keeping snapshot
history:

1.  Insertion time
2.  Created by
3.  Updated by
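
Purely as a sketch of what that table-level configuration could look like
(the property names below are hypothetical and not part of any spec), using
Spark SQL from Python:

```python
from pyspark.sql import SparkSession

# Assumes Spark is already configured with an Iceberg catalog named "demo".
spark = SparkSession.builder.getOrCreate()

# Hypothetical table properties; the names are illustrative only.
spark.sql("""
    ALTER TABLE demo.db.events SET TBLPROPERTIES (
        'audit-columns.insertion-time' = 'true',
        'audit-columns.created-by'     = 'true',
        'audit-columns.updated-by'     = 'true'
    )
""")
```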




On Tue, Dec 9, 2025 at 2:23 PM Steven Wu <[email protected]> wrote:

> Ryan, thanks a lot for the feedback!
>
> Regarding the concern about reliable timestamps: we are not proposing to
> use timestamps for ordering. With NTP on modern machines, timestamps are
> generally reliable enough for the intended use cases. Also, some
> environments may have a stronger clock service, like the Spanner TrueTime
> service
> <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency>.
>
> >  joining to timestamps from the snapshots metadata table.
>
> As you also mentioned, that depends on the snapshot history, which is often
> retained for only a few days for performance reasons.
>
> > embedding a timestamp in DML (like `current_timestamp`) rather than
> relying on an implicit one from table metadata.
>
> An explicit timestamp column puts more burden on application developers.
> While some databases require an explicit column in the schema, those
> databases provide triggers to set the column value automatically. For
> Iceberg, the snapshot timestamp is the closest analog to such a
> trigger-set timestamp.
>
> Also, a timestamp set during computation (like streaming ingestion or a
> relatively long batch job) doesn't capture the time when the rows/files are
> actually added to the Iceberg table, which happens in batches at commit
> time.
>
> > And for those use cases, you could also keep a longer history of
> snapshot timestamps, like storing a catalog's event log for long-term
> access to timestamp info
>
> That is not really consumable: you cannot easily join a regular table query
> against the catalog event log. I would also imagine a catalog event log is
> capped at a shorter retention (maybe a few months) compared to data
> retention (which could be a few years).
>
>
>
> On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue <[email protected]> wrote:
>
>> I don't think it is a good idea to expose timestamps at the row level.
>> Timestamps in metadata that would be carried down to the row level already
>> confuse people who expect them to be useful or reliable rather than just
>> for debugging. I think extending this to the row level would only make the
>> problem worse.
>>
>> You can already get this information by projecting the last updated
>> sequence number, which is reliable, and joining to timestamps from the
>> snapshots metadata table. Of course, the drawback there is losing the
>> timestamp information when snapshots expire, but since it isn't reliable
>> anyway I'd be fine with that.
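>>
>> A rough sketch of that join in Spark, purely illustrative: it assumes the
>> snapshots metadata table exposes each snapshot's sequence number (the
>> exact columns vary by version, so this may need to come from table
>> metadata instead) and uses an example table demo.db.events:
>>
>> ```python
>> from pyspark.sql import SparkSession
>>
>> # Assumes Spark is already configured with an Iceberg catalog named "demo".
>> spark = SparkSession.builder.getOrCreate()
>>
>> # Join each row's last-updated sequence number to the commit timestamp of
>> # the snapshot that last wrote it.
>> spark.sql("""
>>     SELECT t.*, s.committed_at AS last_updated_at
>>     FROM demo.db.events AS t
>>     JOIN demo.db.events.snapshots AS s
>>       ON t._last_updated_sequence_number = s.sequence_number
>> """).show()
>> ```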
>>
>> Some of the use cases, like auditing and compliance, are probably better
>> served by embedding a timestamp in DML (like `current_timestamp`) rather
>> than relying on an implicit one from table metadata. And for those use
>> cases, you could also keep a longer history of snapshot timestamps, like
>> storing a catalog's event log for long-term access to timestamp info. I
>> think that would be better than storing it at the row level.
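>>
>> As an illustration of that DML approach (a sketch only; the table, the
>> updated_at column, and the staged "updates" view are made up), the
>> timestamp is set explicitly at write time:
>>
>> ```python
>> from pyspark.sql import SparkSession
>>
>> # Assumes Spark is configured with an Iceberg catalog named "demo", the
>> # Iceberg SQL extensions enabled, and demo.db.events having an explicit
>> # updated_at timestamp column.
>> spark = SparkSession.builder.getOrCreate()
>>
>> # Stage the incoming changes as a temporary view.
>> spark.createDataFrame([(1, "a")], ["id", "value"]) \
>>     .createOrReplaceTempView("updates")
>>
>> # Set the audit timestamp explicitly in the DML instead of relying on
>> # table metadata.
>> spark.sql("""
>>     MERGE INTO demo.db.events AS t
>>     USING updates AS u
>>     ON t.id = u.id
>>     WHEN MATCHED THEN
>>       UPDATE SET t.value = u.value, t.updated_at = current_timestamp()
>>     WHEN NOT MATCHED THEN
>>       INSERT (id, value, updated_at) VALUES (u.id, u.value, current_timestamp())
>> """)
>> ```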
>>
>> On Mon, Dec 8, 2025 at 3:46 PM Steven Wu <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> For the V4 spec, I have a small proposal [1] to expose a row timestamp
>>> concept that can help with many use cases like temporal queries, latency
>>> tracking, TTL, auditing, and compliance.
>>>
>>> This *_last_updated_timestamp_ms* metadata column behaves very similarly
>>> to the *_last_updated_sequence_number* column for row lineage (a rough
>>> query sketch follows the bullets below).
>>>
>>>    - Initially, it inherits from the snapshot timestamp.
>>>    - During rewrite (like compaction), its values are persisted in the
>>>    data files.
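>>>
>>> Here is the rough query sketch mentioned above; the column name is the
>>> one proposed in the doc and does not exist in any release yet, and the
>>> table name is illustrative:
>>>
>>> ```python
>>> from pyspark.sql import SparkSession
>>>
>>> # Assumes Spark is already configured with an Iceberg catalog named "demo".
>>> spark = SparkSession.builder.getOrCreate()
>>>
>>> # Rows touched in the last 24 hours, using the proposed metadata column.
>>> spark.sql("""
>>>     SELECT *
>>>     FROM demo.db.events
>>>     WHERE _last_updated_timestamp_ms >
>>>           unix_millis(current_timestamp()) - 86400000
>>> """).show()
>>> ```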
>>>
>>> Would love to hear what you think.
>>>
>>> Thanks,
>>> Steven
>>>
>>> [1]
>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing
>>>
>>>
>>>
