Re: [DISCUSS] row timestamp proposal

Micah Kornfield Tue, 12 May 2026 13:22:54 -0700

>
> #2 is more involved and should probably be done after the v4 metadata tree
> (spec <https://github.com/apache/iceberg/pull/16025> and impl
> <https://github.com/orgs/apache/projects/605/views/1>) is mostly
> complete, as we want to plumb inheritance through only for the v4 tables.



One thing that I think has changed since the initial proposal is column-level
updates. Have we considered the interaction between these two features?

Thanks,
Micah

On Mon, May 11, 2026 at 6:30 PM Steven Wu <[email protected]> wrote:

> Circling back on this topic now that we have consensus on the direction.
> It essentially has two parts:
>
>    1. monotonic snapshot timestamp for v4 tables
>    2. row timestamp inherited from snapshot timestamp for v4 tables
>
>
> #1 is a small, isolated change, so I created the following PRs:
> * spec: https://github.com/apache/iceberg/pull/16294
> * impl: https://github.com/apache/iceberg/pull/16293
>
> #2 is more involved and should probably be done after the v4 metadata tree
> (spec <https://github.com/apache/iceberg/pull/16025> and impl
> <https://github.com/orgs/apache/projects/605/views/1>) is mostly
> complete, as we want to plumb inheritance through only for the v4 tables.
>
>
>
> On Mon, Jan 26, 2026 at 10:05 AM Russell Spitzer <
> [email protected]> wrote:
>
>> Sounds good to me
>>
>> On Mon, Jan 26, 2026 at 11:59 AM Anton Okolnychyi <[email protected]>
>> wrote:
>>
>>> Cool, sounds like a plan then? Thanks for answering all the questions,
>>> Steven!
>>>
>>> On Thu, Jan 22, 2026 at 6:29 PM Steven Wu <[email protected]> wrote:
>>>
>>>> For row timestamp inheritance to work, I would need to implement the
>>>> plumbing. So I would imagine existing rows would have null values because
>>>> the inheritance plumbing was not there yet. This would be consistent with
>>>> upgrade behavior for the V3 row lineage:
>>>> https://iceberg.apache.org/spec/#row-lineage-for-upgraded-tables.
>>>>
>>>> On Thu, Jan 22, 2026 at 4:09 PM Anton Okolnychyi <[email protected]>
>>>> wrote:
>>>>
>>>>> Also, do we have a concrete plan for how to handle tables that would
>>>>> be upgraded to V4? What timestamp will we assign to existing rows?
>>>>>
>>>>> On Wed, Jan 21, 2026 at 3:59 PM Anton Okolnychyi <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Setting aside temporal queries that need strict snapshot boundaries
>>>>>> (which row timestamps can't fully solve in the presence of mutations),
>>>>>> you mentioned other use cases where row timestamps may be helpful, like
>>>>>> TTL and auditing. We could debate whether CURRENT_TIMESTAMP() is enough
>>>>>> for them, but I don't really see the point, given that we already have
>>>>>> row lineage in V3 and the storage overhead of one more field isn't likely
>>>>>> to be noticeable. One of the problems with CURRENT_TIMESTAMP() is that it
>>>>>> requires action by the user. Having a reliable row timestamp populated
>>>>>> automatically is likely to be better, so +1.
>>>>>>
>>>>>> On Fri, Jan 16, 2026 at 2:30 PM Steven Wu <[email protected]> wrote:
>>>>>>
>>>>>>> Joining with snapshot history also has significant complexity. It
>>>>>>> requires retaining the entire snapshot history, probably with trimmed
>>>>>>> snapshot metadata. There are concerns about the size of the snapshot
>>>>>>> history for tables with frequent commits (like streaming ingestion). Do
>>>>>>> we maintain the unbounded trimmed snapshot history in the same table
>>>>>>> metadata, which could affect the metadata.json size? Or do we store it
>>>>>>> separately somewhere (like in the catalog), which would require the
>>>>>>> complexity of multi-entity transactions in the catalog?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 16, 2026 at 12:07 PM Russell Spitzer <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> I've gone back and forth on the inherited columns. The thing that
>>>>>>>> keeps coming back to me is that I don't like that the only way to
>>>>>>>> determine the timestamp associated with a row update/creation is to
>>>>>>>> do a join back against table metadata. While that's doable, it feels
>>>>>>>> user-unfriendly.
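A minimal Python sketch of the join being discussed, using hypothetical in-memory stand-ins rather than any real Iceberg API: each row carries a `_last_updated_sequence_number` (the V3 row-lineage field), and a plain dict from sequence number to commit timestamp plays the role of whatever snapshot history is retained. It shows the failure mode raised above: once a snapshot expires, the rows it last touched can no longer be given a timestamp.

```python
from typing import Dict, List, Optional


def row_timestamps(
    rows: List[dict], snapshot_history: Dict[int, int]
) -> List[Optional[int]]:
    """Join each row's last-updated sequence number to retained snapshot history.

    `snapshot_history` (sequence number -> commit timestamp in millis) is a
    hypothetical stand-in for retained snapshot metadata. Rows whose snapshot
    has expired get None.
    """
    return [
        snapshot_history.get(row["_last_updated_sequence_number"])
        for row in rows
    ]


rows = [
    {"id": 1, "_last_updated_sequence_number": 50},
    {"id": 2, "_last_updated_sequence_number": 51},
    {"id": 3, "_last_updated_sequence_number": 52},
]

# Full history: every row resolves to a commit timestamp.
history = {50: 1_000, 51: 2_000, 52: 3_000}
print(row_timestamps(rows, history))  # [1000, 2000, 3000]

# After expiring snapshot 50, the first row's timestamp is unrecoverable.
del history[50]
print(row_timestamps(rows, history))  # [None, 2000, 3000]
```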
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 16, 2026 at 11:54 AM Steven Wu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Anton, you are right that row-level deletes will be a problem for
>>>>>>>>> some of the mentioned use cases (like incremental processing). I have
>>>>>>>>> clarified that some use cases apply only to "tables with inserts and
>>>>>>>>> updates only".
>>>>>>>>>
>>>>>>>>> Right now, we are only tracking the modification/commit time (not the
>>>>>>>>> insertion time) for updates.
>>>>>>>>>
>>>>>>>>> On Thu, Jan 15, 2026 at 6:33 PM Anton Okolnychyi <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> I think there is clear consensus that making snapshot timestamps
>>>>>>>>>> strictly increasing is a positive thing. I am also +1.
>>>>>>>>>>
>>>>>>>>>> - How will row timestamps allow us to reliably implement incremental
>>>>>>>>>>   consumption independent of snapshot retention, given that rows can
>>>>>>>>>>   be added AND removed in a particular time frame? How can we capture
>>>>>>>>>>   all changes by just looking at the latest snapshot?
>>>>>>>>>> - Some use cases in the doc need the insertion time and some need the
>>>>>>>>>>   last modification time. Do we plan to support both?
>>>>>>>>>> - What do we expect the behavior to be in UPDATE and MERGE
>>>>>>>>>>   operations?
>>>>>>>>>>
>>>>>>>>>> To be clear: I am not opposed to this change; I just want to make sure
>>>>>>>>>> I understand all the use cases we aim to address and what would be
>>>>>>>>>> required in engines.
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 15, 2026 at 5:01 PM Maninder Parmar <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 for improving how commit timestamps are assigned so that they are
>>>>>>>>>>> monotonic, since this requirement has come up in multiple discussions,
>>>>>>>>>>> such as notifications, multi-table transactions, time travel accuracy,
>>>>>>>>>>> and row timestamps. It would be good to have a single consistent way to
>>>>>>>>>>> represent and assign timestamps that could be leveraged across multiple
>>>>>>>>>>> features.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 15, 2026 at 4:05 PM Ryan Blue <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yeah, to add my perspective on that discussion, I think my primary
>>>>>>>>>>>> concern is that people expect timestamps to be monotonic, and if they
>>>>>>>>>>>> aren't, then a `_last_update_timestamp` field just makes the problem
>>>>>>>>>>>> worse. But it is _nice_ to have row-level timestamps. So I would be
>>>>>>>>>>>> okay if we revisit how we assign commit timestamps and improve it so
>>>>>>>>>>>> that you get monotonic behavior.
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Jan 15, 2026 at 2:23 PM Steven Wu <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> We had an offline discussion with Ryan. I revised the proposal as
>>>>>>>>>>>>> follows.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. V4 would require writers to generate *monotonic* snapshot
>>>>>>>>>>>>>    timestamps. The proposal doc has a section that describes a
>>>>>>>>>>>>>    recommended implementation using Lamport timestamps.
>>>>>>>>>>>>> 2. Expose a *last_update_timestamp* metadata column that inherits
>>>>>>>>>>>>>    from the snapshot timestamp.
>>>>>>>>>>>>> This is a relatively low-friction change that can fix the time
>>>>>>>>>>>>> travel problem and enable use cases like latency tracking, 
>>>>>>>>>>>>> temporal query,
>>>>>>>>>>>>> TTL, auditing.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is no accuracy requirement on the timestamp values. In
>>>>>>>>>>>>> practice, modern servers with NTP have pretty reliable wall 
>>>>>>>>>>>>> clocks. E.g.,
>>>>>>>>>>>>> Java library implemented this validation
>>>>>>>>>>>>> <https://github.com/apache/iceberg/blob/035e0fb39d2a949f6343552ade0a7d6c2967e0db/core/src/main/java/org/apache/iceberg/TableMetadata.java#L369-L377>
>>>>>>>>>>>>>  that
>>>>>>>>>>>>> protects against backward clock drift up to one minute for 
>>>>>>>>>>>>> snapshot
>>>>>>>>>>>>> timestamps. Don't think we have heard many complaints of commit 
>>>>>>>>>>>>> failure due
>>>>>>>>>>>>> to that clock drift validation.
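For contrast, the tolerance check described above can be paraphrased roughly as follows. This is a Python sketch, not the linked Java code; the function name and signature are hypothetical, and it only models the behavior stated here: reject a snapshot timestamp that is more than one minute behind the previous one.

```python
ONE_MINUTE_MS = 60_000


def check_snapshot_timestamp(previous_ms: int, new_ms: int,
                             tolerance_ms: int = ONE_MINUTE_MS) -> None:
    """Hypothetical sketch: accept small backward clock drift, reject large drift.

    Raises ValueError when the new snapshot timestamp is more than
    `tolerance_ms` earlier than the previous one.
    """
    if new_ms - previous_ms < -tolerance_ms:
        raise ValueError(
            f"snapshot timestamp {new_ms} is more than {tolerance_ms} ms "
            f"before previous timestamp {previous_ms}"
        )


check_snapshot_timestamp(previous_ms=100_000, new_ms=70_000)  # 30 s backward: accepted
try:
    check_snapshot_timestamp(previous_ms=100_000, new_ms=30_000)  # 70 s backward
except ValueError as err:
    print("rejected:", err)
```

The first call is accepted even though it moves time backward by 30 seconds, so a check like this bounds drift but does not make timestamps monotonic.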
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would appreciate feedback on the revised proposal.
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?tab=t.0
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 8:40 PM Anton Okolnychyi <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Steven, I was referring to the fact that CURRENT_TIMESTAMP() is
>>>>>>>>>>>>>> usually evaluated quite early in engines, so we could theoretically
>>>>>>>>>>>>>> have another expression closer to the commit time. You are right,
>>>>>>>>>>>>>> though, that it won't be the actual commit time, given that we have
>>>>>>>>>>>>>> to write it into the files. Also, I don't think generating a
>>>>>>>>>>>>>> timestamp for a row as it is being written is going to be
>>>>>>>>>>>>>> beneficial. To sum up, expression-based defaults would allow us to
>>>>>>>>>>>>>> capture the time the transaction or write starts, but not the
>>>>>>>>>>>>>> actual commit time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Russell, if the goal is to know what happened to the table in a
>>>>>>>>>>>>>> given time frame, isn't the changelog scan the way to go? It would
>>>>>>>>>>>>>> assign commit ordinals based on lineage and include row-level
>>>>>>>>>>>>>> diffs. How would you be able to determine changes with row
>>>>>>>>>>>>>> timestamps by just looking at the latest snapshot?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It does seem promising to make snapshot timestamps strictly
>>>>>>>>>>>>>> increasing to avoid ambiguity during time travel.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 4:33 PM Ryan Blue <[email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> > Whether or not "t" is an atomic clock time is not as
>>>>>>>>>>>>>>> > important as the query between time bounds making sense.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not sure I get it, then. If we want monotonically increasing
>>>>>>>>>>>>>>> times, but they don't have to be real times, then how do you know
>>>>>>>>>>>>>>> what notion of "time" you care about for these filters? Or, to put
>>>>>>>>>>>>>>> it another way, how do you know that your "before" and "after"
>>>>>>>>>>>>>>> times are reasonable? If the boundaries of these time queries can
>>>>>>>>>>>>>>> move around a bit, by how much?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems to me that row IDs can play an important role here
>>>>>>>>>>>>>>> because you have the order guarantee that we seem to want for this
>>>>>>>>>>>>>>> use case: if snapshot A was committed before snapshot B, then the
>>>>>>>>>>>>>>> rows from A have row IDs that are always less than the row IDs of
>>>>>>>>>>>>>>> B. The problem is that we don't know where those row IDs start and
>>>>>>>>>>>>>>> end once A and B are no longer tracked. Using a "timestamp" seems
>>>>>>>>>>>>>>> to work, but I still worry that without reliable timestamps that
>>>>>>>>>>>>>>> correspond with some guarantee to real timestamps, we are creating
>>>>>>>>>>>>>>> a feature that seems reliable but isn't.
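The row-ID argument can be illustrated with a small Python sketch. It assumes each retained snapshot records the first row ID it assigned and how many rows it added (loosely modeled on V3 row-lineage snapshot metadata; the class and field names here are hypothetical). While those records exist, a binary search maps a row ID back to the snapshot that added it; once they expire, the mapping is lost.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SnapshotRowRange:
    """Hypothetical record of the row IDs a snapshot assigned."""
    snapshot: str
    first_row_id: int
    added_rows: int


def snapshot_for_row_id(ranges: List[SnapshotRowRange], row_id: int) -> Optional[str]:
    """Find the snapshot whose row-ID range contains `row_id`.

    `ranges` must be sorted by first_row_id; because row IDs are assigned in
    commit order, this is also commit order.
    """
    starts = [r.first_row_id for r in ranges]
    i = bisect.bisect_right(starts, row_id) - 1
    if i < 0:
        return None
    r = ranges[i]
    return r.snapshot if row_id < r.first_row_id + r.added_rows else None


ranges = [
    SnapshotRowRange("A", first_row_id=0, added_rows=100),
    SnapshotRowRange("B", first_row_id=100, added_rows=50),
]
print(snapshot_for_row_id(ranges, 42))   # A
print(snapshot_for_row_id(ranges, 120))  # B
# After A's metadata expires, its rows can no longer be attributed.
print(snapshot_for_row_id(ranges[1:], 42))  # None
```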
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm somewhat open to the idea of introducing a snapshot timestamp
>>>>>>>>>>>>>>> that the catalog guarantees is monotonically increasing. But if we
>>>>>>>>>>>>>>> did that, wouldn't we still need to know the association between
>>>>>>>>>>>>>>> these timestamps and snapshots after the snapshot metadata
>>>>>>>>>>>>>>> expires? My mental model is that this would be used to look for
>>>>>>>>>>>>>>> data that arrived, say, 3 weeks ago on Dec 24th. Since the
>>>>>>>>>>>>>>> snapshot metadata is no longer around, we could use the row
>>>>>>>>>>>>>>> timestamp to find those rows. But how do we know that the snapshot
>>>>>>>>>>>>>>> timestamps correspond to the actual timestamp range of Dec 24th?
>>>>>>>>>>>>>>> Is it just "close enough" as long as we don't have out-of-order
>>>>>>>>>>>>>>> timestamps? This is what I mean by needing to keep track of the
>>>>>>>>>>>>>>> association between timestamps and snapshots after the metadata
>>>>>>>>>>>>>>> expires. It seems like you either need to keep track of what the
>>>>>>>>>>>>>>> catalog's clock was for events you care about, or you don't really
>>>>>>>>>>>>>>> care about exact timestamps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 2:22 PM Russell Spitzer <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The key goal here is the ability to answer the question "what
>>>>>>>>>>>>>>>> happened to the table in some time window (before <= t < after)?"
>>>>>>>>>>>>>>>> Whether or not "t" is an atomic clock time is not as important as
>>>>>>>>>>>>>>>> the query between time bounds making sense. Downstream
>>>>>>>>>>>>>>>> applications (from what I know) are mostly sensitive to getting
>>>>>>>>>>>>>>>> discrete and well-defined answers to this question, like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 <= t < 2 should be exclusive of
>>>>>>>>>>>>>>>> 2 <= t < 3 should be exclusive of
>>>>>>>>>>>>>>>> 3 <= t < 4
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> And the union of these should be the same as the query asking for
>>>>>>>>>>>>>>>> 1 <= t < 4.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Currently this is not possible because we have no guarantee of
>>>>>>>>>>>>>>>> ordering in our timestamps:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Snapshots:         A  -> B  -> C
>>>>>>>>>>>>>>>> Sequence numbers:  50 -> 51 -> 52
>>>>>>>>>>>>>>>> Timestamps:        3  -> 1  -> 2
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This makes time travel always a little wrong to start with.
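The example above can be checked mechanically. In the Python sketch below (helper names are hypothetical), time travel is modeled as "the latest snapshot, by sequence number, whose timestamp is <= t", assuming a linear history. With timestamps 3, 1, 2, a query as of t = 2 resolves to C, whose state already includes A's changes even though A is stamped 3.

```python
from typing import List, Optional, Tuple

Snapshot = Tuple[str, int, int]  # (name, sequence number, timestamp)

# Commit order from the example: A, B, C with timestamps 3, 1, 2.
SNAPSHOTS: List[Snapshot] = [("A", 50, 3), ("B", 51, 1), ("C", 52, 2)]


def is_monotonic(snapshots: List[Snapshot]) -> bool:
    """True if timestamps strictly increase along sequence-number order."""
    ordered = sorted(snapshots, key=lambda s: s[1])
    return all(a[2] < b[2] for a, b in zip(ordered, ordered[1:]))


def snapshot_as_of(snapshots: List[Snapshot], t: int) -> Optional[str]:
    """Model of time travel: latest snapshot (by sequence number) with timestamp <= t."""
    eligible = [s for s in snapshots if s[2] <= t]
    return max(eligible, key=lambda s: s[1])[0] if eligible else None


def earlier_commits_stamped_after(snapshots: List[Snapshot], name: str, t: int) -> List[str]:
    """Snapshots at or before `name` in a linear history whose timestamp is later than t."""
    seq = next(s[1] for s in snapshots if s[0] == name)
    return [s[0] for s in snapshots if s[1] <= seq and s[2] > t]


print(is_monotonic(SNAPSHOTS))                           # False
print(snapshot_as_of(SNAPSHOTS, 2))                      # C
print(earlier_commits_stamped_after(SNAPSHOTS, "C", 2))  # ['A']
```

With monotonic timestamps, `earlier_commits_stamped_after(snapshots, snapshot_as_of(snapshots, t), t)` would always be empty, which is the property that makes adjacent windows partition cleanly.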
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The Java implementation only allows one minute of negative time on
>>>>>>>>>>>>>>>> commit, so we actually kind of do have this as a "light
>>>>>>>>>>>>>>>> monotonicity" requirement, but as noted above there is no spec
>>>>>>>>>>>>>>>> requirement for this. While we do have sequence numbers and row
>>>>>>>>>>>>>>>> IDs, we still don't have a stable, engine-independent way of
>>>>>>>>>>>>>>>> associating them with a consistent time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ideally we just want one consistent way of answering the question
>>>>>>>>>>>>>>>> "what did the table look like at time t", which I think we get by
>>>>>>>>>>>>>>>> adding a new timestamp field, set by the catalog close to commit
>>>>>>>>>>>>>>>> time, that always goes up.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not sure we can really do this with an engine expression, since
>>>>>>>>>>>>>>>> engines won't know when the data is actually committed when writing
>>>>>>>>>>>>>>>> files.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 3:35 PM Anton Okolnychyi <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This seems like a lot of new complexity in the format. I would like
>>>>>>>>>>>>>>>>> us to explore whether we can build the considered use cases on top
>>>>>>>>>>>>>>>>> of expression-based defaults instead.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We already plan to support CURRENT_TIMESTAMP() and similar
>>>>>>>>>>>>>>>>> functions that are part of the SQL standard definition for default
>>>>>>>>>>>>>>>>> values. This would provide us a way to know the relative row order.
>>>>>>>>>>>>>>>>> True, this will usually represent the start of the operation. We
>>>>>>>>>>>>>>>>> may define COMMIT_TIMESTAMP() or a similar expression for the
>>>>>>>>>>>>>>>>> actual commit time, if there are use cases that need that. Plus, we
>>>>>>>>>>>>>>>>> may explore an approach similar to MySQL's, which allows users to
>>>>>>>>>>>>>>>>> reset the default value on update.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> - Anton
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Jan 13, 2026 at 11:04 AM Russell Spitzer <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think this is the right step forward. Our current "timestamp"
>>>>>>>>>>>>>>>>>> definition is too ambiguous to be useful, so establishing a
>>>>>>>>>>>>>>>>>> well-defined and monotonic timestamp could be really great. I also
>>>>>>>>>>>>>>>>>> like the ability for rows to know this value without having to rely
>>>>>>>>>>>>>>>>>> on snapshot information, which can be expired.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 12, 2026 at 11:03 AM Steven Wu <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have revised the row timestamp proposal with the following
>>>>>>>>>>>>>>>>>>> changes.
>>>>>>>>>>>>>>>>>>> * a new commit_timestamp field in snapshot metadata with nanosecond
>>>>>>>>>>>>>>>>>>>   precision
>>>>>>>>>>>>>>>>>>> * this optional field is only set by the REST catalog server
>>>>>>>>>>>>>>>>>>> * it needs to be monotonic (e.g., implemented using a Lamport
>>>>>>>>>>>>>>>>>>>   timestamp)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?tab=t.0#heading=h.efdngoizchuh
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 2:36 PM Steven Wu <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the clarification, Ryan.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For long-running streaming jobs that commit periodically, it is
>>>>>>>>>>>>>>>>>>>> difficult to establish a constant value of current_timestamp
>>>>>>>>>>>>>>>>>>>> across all writer tasks for each commit cycle. I guess streaming
>>>>>>>>>>>>>>>>>>>> writers may just need to write the wall clock time when appending
>>>>>>>>>>>>>>>>>>>> a row to a data file as the default value of current_timestamp.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 1:44 PM Ryan Blue <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I don't think that every row would have a different value. That
>>>>>>>>>>>>>>>>>>>>> would be up to the engine, but I would expect engines to insert
>>>>>>>>>>>>>>>>>>>>> `CURRENT_TIMESTAMP` into the plan and then replace it with a
>>>>>>>>>>>>>>>>>>>>> constant, resulting in a consistent value for all rows.
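The difference between folding `CURRENT_TIMESTAMP` at planning time and evaluating it per row can be sketched in Python. Everything here is hypothetical and engine-agnostic (no real engine API is used); a simulated clock that advances on every read stands in for wall-clock time.

```python
import itertools
from typing import Callable, Dict, List


def write_rows_folded(records: List[str], now: Callable[[], int]) -> List[Dict]:
    """Planning-time folding: evaluate the default once, reuse it for every row."""
    ts = now()
    return [{"value": r, "created_at": ts} for r in records]


def write_rows_per_row(records: List[str], now: Callable[[], int]) -> List[Dict]:
    """Per-row evaluation: read the clock as each record is appended."""
    return [{"value": r, "created_at": now()} for r in records]


def distinct_timestamps(rows: List[Dict]) -> int:
    return len({row["created_at"] for row in rows})


records = [f"r{i}" for i in range(5)]
ticking_clock = itertools.count(1_000)  # simulated clock advancing 1 ms per read
print(distinct_timestamps(write_rows_folded(records, lambda: next(ticking_clock))))   # 1
print(distinct_timestamps(write_rows_per_row(records, lambda: next(ticking_clock))))  # 5
```

Folding yields one value per statement, while per-row evaluation yields roughly one value per row, which is what drives the per-row storage-overhead concern Steven raises.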
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> You're right that this would not necessarily be the commit time.
>>>>>>>>>>>>>>>>>>>>> But neither is the commit timestamp from Iceberg's snapshot. I'm
>>>>>>>>>>>>>>>>>>>>> not sure how we are going to define "good enough" for this
>>>>>>>>>>>>>>>>>>>>> purpose. I think at least `CURRENT_TIMESTAMP` has reliable and
>>>>>>>>>>>>>>>>>>>>> known behavior when you look at how it is handled in engines. And
>>>>>>>>>>>>>>>>>>>>> if you want the Iceberg timestamp, then use a periodic query of
>>>>>>>>>>>>>>>>>>>>> the snapshots table to keep track of them in a table you can join
>>>>>>>>>>>>>>>>>>>>> to. I don't think this rises to the need for a table feature
>>>>>>>>>>>>>>>>>>>>> unless we can guarantee that it is correct.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 1:19 PM Steven Wu <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> > Postgres `current_timestamp` captures the transaction start time
>>>>>>>>>>>>>>>>>>>>>> > [1, 2]. Should we extend the same semantics to Iceberg: all rows
>>>>>>>>>>>>>>>>>>>>>> > added in the same snapshot should have the same timestamp value?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Let me clarify my last comment.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Since Postgres current_timestamp captures the transaction start
>>>>>>>>>>>>>>>>>>>>>> time, all rows added in the same insert transaction would have the
>>>>>>>>>>>>>>>>>>>>>> same value (the transaction timestamp) with the column definition
>>>>>>>>>>>>>>>>>>>>>> above.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> If we extend similar semantics to Iceberg, should all rows added in
>>>>>>>>>>>>>>>>>>>>>> the same Iceberg transaction/snapshot have the same timestamp?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Ryan, regarding your comment about using the current_timestamp
>>>>>>>>>>>>>>>>>>>>>> expression as a column default value: you were thinking that the
>>>>>>>>>>>>>>>>>>>>>> engine would set the column value to the wall clock time when
>>>>>>>>>>>>>>>>>>>>>> appending a row to a data file, right? Then almost every row would
>>>>>>>>>>>>>>>>>>>>>> have a different timestamp value.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Dec 12, 2025 at 10:26 AM Steven Wu <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The `current_timestamp` expression may not always carry the right
>>>>>>>>>>>>>>>>>>>>>>> semantics for the use cases. E.g., latency tracking is interested
>>>>>>>>>>>>>>>>>>>>>>> in when records are added/committed to the table, not when a record
>>>>>>>>>>>>>>>>>>>>>>> was appended to an uncommitted data file in the processing engine.
>>>>>>>>>>>>>>>>>>>>>>> Record creation and the Iceberg commit can be minutes or even hours
>>>>>>>>>>>>>>>>>>>>>>> apart.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> A row timestamp inherited from the snapshot timestamp has no
>>>>>>>>>>>>>>>>>>>>>>> overhead with the initial commit and very minimal storage overhead
>>>>>>>>>>>>>>>>>>>>>>> during file rewrite. A per-row current_timestamp would have
>>>>>>>>>>>>>>>>>>>>>>> distinct values for every row and more storage overhead.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> OLTP databases deal with small row-level transactions. Postgres
>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp` captures the transaction start time [1, 2].
>>>>>>>>>>>>>>>>>>>>>>> Should we extend the same semantics to Iceberg: all rows added in
>>>>>>>>>>>>>>>>>>>>>>> the same snapshot should have the same timestamp value?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> [1] https://www.postgresql.org/docs/current/functions-datetime.html
>>>>>>>>>>>>>>>>>>>>>>> [2] https://neon.com/postgresql/postgresql-date-functions/postgresql-current_timestamp
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 11, 2025 at 4:07 PM Micah Kornfield <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Micah, are 1 and 2 the same? 3 is covered by this proposal.
>>>>>>>>>>>>>>>>>>>>>>>>> To support the created_by timestamp, we would need to implement
>>>>>>>>>>>>>>>>>>>>>>>>> the following row lineage behavior:
>>>>>>>>>>>>>>>>>>>>>>>>> * Initially, it inherits from the snapshot timestamp.
>>>>>>>>>>>>>>>>>>>>>>>>> * During rewrite (like compaction), it should be persisted into
>>>>>>>>>>>>>>>>>>>>>>>>>   data files.
>>>>>>>>>>>>>>>>>>>>>>>>> * During update, it needs to be carried over from the previous
>>>>>>>>>>>>>>>>>>>>>>>>>   row. This is similar to the row_id carry-over for row updates.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the shorthand. These are not the same:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> 1. Insertion time - time the row was inserted.
>>>>>>>>>>>>>>>>>>>>>>>> 2. Created by - the system that created the record.
>>>>>>>>>>>>>>>>>>>>>>>> 3. Updated by - the system that last updated the record.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Depending on the exact use case, these might or might not have
>>>>>>>>>>>>>>>>>>>>>>>> utility. I'm just wondering if there will be more examples like
>>>>>>>>>>>>>>>>>>>>>>>> this in the future.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> The created_by column would likely incur significantly higher
>>>>>>>>>>>>>>>>>>>>>>>>> storage overhead compared to the updated_by column. As rows are
>>>>>>>>>>>>>>>>>>>>>>>>> updated over time, the cardinality of this column in data files
>>>>>>>>>>>>>>>>>>>>>>>>> can be high. Hence, the created_by column may not compress well.
>>>>>>>>>>>>>>>>>>>>>>>>> The row_id column has a similar problem. One side effect of
>>>>>>>>>>>>>>>>>>>>>>>>> enabling row lineage by default for V3 tables is the storage
>>>>>>>>>>>>>>>>>>>>>>>>> overhead of the row_id column after compaction, especially for
>>>>>>>>>>>>>>>>>>>>>>>>> narrow tables with few columns.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I agree. I think this analysis also shows that some consumers of
>>>>>>>>>>>>>>>>>>>>>>>> Iceberg might not necessarily want all these columns, so we might
>>>>>>>>>>>>>>>>>>>>>>>> want to make them configurable rather than mandating them for all
>>>>>>>>>>>>>>>>>>>>>>>> tables. Ryan's thought on default values seems like it would solve
>>>>>>>>>>>>>>>>>>>>>>>> the issues I was raising.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Micah
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Dec 11, 2025 at 3:47 PM Ryan Blue <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> > An explicit timestamp column adds more burden to application
>>>>>>>>>>>>>>>>>>>>>>>>> > developers. While some databases require an explicit column in
>>>>>>>>>>>>>>>>>>>>>>>>> > the schema, those databases provide triggers to auto-set the
>>>>>>>>>>>>>>>>>>>>>>>>> > column value. For Iceberg, the snapshot timestamp is the
>>>>>>>>>>>>>>>>>>>>>>>>> > closest to the trigger timestamp.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Since the use cases don't require an exact timestamp, this seems
>>>>>>>>>>>>>>>>>>>>>>>>> like the best solution to get what people want (an insertion
>>>>>>>>>>>>>>>>>>>>>>>>> timestamp) that has clear and well-defined behavior. Since
>>>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp` is defined by the SQL spec, it makes sense to
>>>>>>>>>>>>>>>>>>>>>>>>> me that we could use it and have reasonable behavior.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I've talked with Anton about this before and maybe he'll jump in
>>>>>>>>>>>>>>>>>>>>>>>>> on this thread. I think that we may need to extend default values
>>>>>>>>>>>>>>>>>>>>>>>>> to include default value expressions, like `current_timestamp`,
>>>>>>>>>>>>>>>>>>>>>>>>> that are allowed by the SQL spec. That would solve the problem as
>>>>>>>>>>>>>>>>>>>>>>>>> well as some others (like `current_date` or `current_user`) and
>>>>>>>>>>>>>>>>>>>>>>>>> would not create a potentially misleading (and heavyweight)
>>>>>>>>>>>>>>>>>>>>>>>>> timestamp feature in the format.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> > Also, some environments may have a stronger clock service, like
>>>>>>>>>>>>>>>>>>>>>>>>> > the Spanner TrueTime service.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Even in cases like this, commit retries can reorder commits and
>>>>>>>>>>>>>>>>>>>>>>>>> make timestamps out of order. I don't think that we should be
>>>>>>>>>>>>>>>>>>>>>>>>> making guarantees or even exposing metadata that people might
>>>>>>>>>>>>>>>>>>>>>>>>> mistake as having those guarantees.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 9, 2025 at 2:22 PM Steven Wu <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Ryan, thanks a lot for the feedback!
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding the concern for reliable timestamps, we
>>>>>>>>>>>>>>>>>>>>>>>>>> are not proposing using timestamps for ordering. 
>>>>>>>>>>>>>>>>>>>>>>>>>> With NTP in modern
>>>>>>>>>>>>>>>>>>>>>>>>>> computers, they are generally reliable enough for 
>>>>>>>>>>>>>>>>>>>>>>>>>> the intended use cases.
>>>>>>>>>>>>>>>>>>>>>>>>>> Also some environments may have stronger clock 
>>>>>>>>>>>>>>>>>>>>>>>>>> service, like Spanner
>>>>>>>>>>>>>>>>>>>>>>>>>> TrueTime service
>>>>>>>>>>>>>>>>>>>>>>>>>> <https://docs.cloud.google.com/spanner/docs/true-time-external-consistency>
>>>>>>>>>>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> >  joining to timestamps from the snapshots
>>>>>>>>>>>>>>>>>>>>>>>>>> metadata table.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> As you also mentioned, it depends on the snapshot
>>>>>>>>>>>>>>>>>>>>>>>>>> history, which is often retained for a few days due 
>>>>>>>>>>>>>>>>>>>>>>>>>> to performance reasons.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> > embedding a timestamp in DML (like
>>>>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp`) rather than relying on an 
>>>>>>>>>>>>>>>>>>>>>>>>>> implicit one from table
>>>>>>>>>>>>>>>>>>>>>>>>>> metadata.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> An explicit timestamp column adds more burden to
>>>>>>>>>>>>>>>>>>>>>>>>>> application developers. While some databases require 
>>>>>>>>>>>>>>>>>>>>>>>>>> an explicit column in
>>>>>>>>>>>>>>>>>>>>>>>>>> the schema, those databases provide triggers to auto 
>>>>>>>>>>>>>>>>>>>>>>>>>> set the column value.
>>>>>>>>>>>>>>>>>>>>>>>>>> For Iceberg, the snapshot timestamp is the closest 
>>>>>>>>>>>>>>>>>>>>>>>>>> to the trigger timestamp.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Also, a timestamp set during computation (like streaming
>>>>>>>>>>>>>>>>>>>>>>>>>> ingestion or a relatively long batch computation) doesn't
>>>>>>>>>>>>>>>>>>>>>>>>>> capture the time when the rows/files are added to the Iceberg
>>>>>>>>>>>>>>>>>>>>>>>>>> table, which happens in a batch.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> > And for those use cases, you could also keep a longer
>>>>>>>>>>>>>>>>>>>>>>>>>> > history of snapshot timestamps, like storing a catalog's
>>>>>>>>>>>>>>>>>>>>>>>>>> > event log for long-term access to timestamp info
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> This is not really consumable, since it requires joining
>>>>>>>>>>>>>>>>>>>>>>>>>> the regular table query with the catalog event log. I would
>>>>>>>>>>>>>>>>>>>>>>>>>> also expect the catalog event log to have a shorter retention
>>>>>>>>>>>>>>>>>>>>>>>>>> (maybe a few months) than the data (which could be kept for a
>>>>>>>>>>>>>>>>>>>>>>>>>> few years).
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Dec 9, 2025 at 1:32 PM Ryan Blue <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I don't think it is a good idea to expose timestamps at the
>>>>>>>>>>>>>>>>>>>>>>>>>>> row level. Timestamps in metadata that would be carried down
>>>>>>>>>>>>>>>>>>>>>>>>>>> to the row level already confuse people that expect them to
>>>>>>>>>>>>>>>>>>>>>>>>>>> be useful or reliable, rather than for debugging. I think
>>>>>>>>>>>>>>>>>>>>>>>>>>> extending this to the row level would only make the problem
>>>>>>>>>>>>>>>>>>>>>>>>>>> worse.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> You can already get this information by projecting the last
>>>>>>>>>>>>>>>>>>>>>>>>>>> updated sequence number, which is reliable, and joining to
>>>>>>>>>>>>>>>>>>>>>>>>>>> timestamps from the snapshots metadata table. Of course, the
>>>>>>>>>>>>>>>>>>>>>>>>>>> drawback there is losing the timestamp information when
>>>>>>>>>>>>>>>>>>>>>>>>>>> snapshots expire, but since it isn't reliable anyway I'd be
>>>>>>>>>>>>>>>>>>>>>>>>>>> fine with that.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Some of the use cases, like auditing and compliance, are
>>>>>>>>>>>>>>>>>>>>>>>>>>> probably better served by embedding a timestamp in DML (like
>>>>>>>>>>>>>>>>>>>>>>>>>>> `current_timestamp`) rather than relying on an implicit one
>>>>>>>>>>>>>>>>>>>>>>>>>>> from table metadata. And for those use cases, you could also
>>>>>>>>>>>>>>>>>>>>>>>>>>> keep a longer history of snapshot timestamps, like storing a
>>>>>>>>>>>>>>>>>>>>>>>>>>> catalog's event log for long-term access to timestamp info. I
>>>>>>>>>>>>>>>>>>>>>>>>>>> think that would be better than storing it at the row level.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
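A minimal sketch of the alternative Ryan describes (project each row's last updated sequence number and join it to snapshot timestamps), including the drawback he mentions once snapshots expire. It is plain Python over in-memory data: the `snapshots` mapping (sequence number to commit time in ms) and the function names are hypothetical stand-ins for the snapshots metadata table, not an Iceberg or engine API.

```python
from typing import Optional

# Hypothetical, simplified snapshot history: sequence number -> commit time (ms).
# A stand-in for retained snapshot metadata; not a real catalog or Iceberg API.
SnapshotLog = dict[int, int]


def row_commit_time(last_updated_seq: int, snapshots: SnapshotLog) -> Optional[int]:
    """Join a row's last updated sequence number to the retained snapshot
    history. Returns None once the snapshot that wrote the row has expired."""
    return snapshots.get(last_updated_seq)


def join_rows_to_snapshots(rows: list[dict], snapshots: SnapshotLog) -> list[dict]:
    # Left join: every row is kept; its timestamp is None if the snapshot expired.
    return [
        {**row, "commit_time_ms": row_commit_time(row["_last_updated_sequence_number"], snapshots)}
        for row in rows
    ]


rows = [
    {"id": 1, "_last_updated_sequence_number": 3},
    {"id": 2, "_last_updated_sequence_number": 7},
]
# Snapshot 3 has been expired; only snapshot 7 is still retained.
snapshots: SnapshotLog = {7: 1765230000000}
print(join_rows_to_snapshots(rows, snapshots))
```

Row 1 comes back with no timestamp because its snapshot was expired, which is the retention gap the row timestamp proposal is trying to close.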
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Dec 8, 2025 at 3:46 PM Steven Wu <
>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the V4 spec, I have a small proposal [1] to expose the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> row timestamp concept, which can help with many use cases like
>>>>>>>>>>>>>>>>>>>>>>>>>>>> temporal queries, latency tracking, TTL, auditing and
>>>>>>>>>>>>>>>>>>>>>>>>>>>> compliance.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> This *_last_updated_timestamp_ms* metadata column behaves
>>>>>>>>>>>>>>>>>>>>>>>>>>>> very similarly to the *_last_updated_sequence_number* column
>>>>>>>>>>>>>>>>>>>>>>>>>>>> for row lineage.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    - Initially, it inherits from the snapshot
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    timestamp.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    - During a rewrite (like compaction), its values are
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    persisted in the data files.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would love to hear what you think.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Steven
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1cXr_RwEO6o66S8vR7k3NM8-bJ9tH2rkh4vSdMXNC8J8/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
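A minimal sketch of the two rules in the proposal above, mirroring how `_last_updated_sequence_number` works for row lineage: a null persisted value means the row inherits the timestamp of the snapshot that added its file, and a rewrite such as compaction writes the resolved values out so rows keep their original timestamps. `DataFile`, `read_row_timestamps`, and `compact` are hypothetical names for illustration only, not Iceberg's classes or the proposal's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataFile:
    # Hypothetical model, not Iceberg's DataFile: commit timestamp (ms) of the
    # snapshot that added this file.
    added_snapshot_timestamp_ms: int
    # Persisted per-row _last_updated_timestamp_ms values; None (null) means
    # "not persisted, inherit from the adding snapshot".
    row_timestamps: list[Optional[int]]


def read_row_timestamps(f: DataFile) -> list[int]:
    """Resolve each row's timestamp: use the persisted value if present,
    otherwise inherit the adding snapshot's timestamp."""
    return [ts if ts is not None else f.added_snapshot_timestamp_ms for ts in f.row_timestamps]


def compact(files: list[DataFile], new_snapshot_timestamp_ms: int) -> DataFile:
    """Rewrite several files into one. The resolved timestamps are written
    out explicitly, so rows do not inherit the compaction snapshot's time."""
    persisted: list[Optional[int]] = []
    for f in files:
        persisted.extend(read_row_timestamps(f))
    return DataFile(new_snapshot_timestamp_ms, persisted)


a = DataFile(1_000, [None, None])  # appended at t=1000, nothing persisted
b = DataFile(2_000, [None])        # appended at t=2000
merged = compact([a, b], new_snapshot_timestamp_ms=5_000)
print(read_row_timestamps(merged))  # [1000, 1000, 2000]
```

Persisting on rewrite is what keeps compaction from resetting every row's timestamp to the compaction commit time, the same reason row lineage materializes sequence numbers during rewrites.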
