Hi Iceberg community! (I initially opened this message as its own thread in error, sorry about that.) I’m curious where this proposal landed. I work at Materialize <http://materialize.com/> and we are keenly interested in seeing this proposal come to fruition, and possibly also in helping to implement it.
I see there was a call in May, but I’m not sure what the conclusion was. As spec v4 draws closer, I am curious which of the two proposals the community favors here.

Best,
Dov

On Tue, May 27, 2025 at 01:09:05AM -0700, Maninderjit Singh wrote:
> Forgot to attach a link to the updated proposal <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#heading=h.ypbwvr181qn4>.
>
> On Tue, May 27, 2025 at 1:06 AM Maninderjit Singh <[email protected]> wrote:
>
> > Hi community,
> >
> > I have updated the proposal with both options (overwriting the existing timestamp-ms vs introducing a new sequence/timestamp field), as we have initial consensus on using a catalog-authored sequence/timestamp. Jagdeep, please review to ensure that the options are correctly captured. I have also added arguments for why we can't assume the timestamp is "informational": it is used in critical paths, and incorrect values can take the table offline.
> >
> > Also, I'm moving the meeting to Thursday to better accommodate conflicts. I will also record the meeting in case anyone who is interested misses the discussion.
> >
> > Sync for Iceberg multi-table transactions
> > Thursday, May 29 · 9:00 – 10:00am
> > Time zone: America/Los_Angeles
> > Google Meet joining info
> > Video call link: https://meet.google.com/ffc-ttjs-vti
> >
> > Thanks,
> > Maninder
> >
> > On Mon, May 26, 2025 at 12:47 AM Péter Váry <[email protected]> wrote:
> >
> >> I'm interested but can't be there, so please record the meeting.
> >> Thanks,
> >> Peter
> >>
> >> Maninderjit Singh <[email protected]> wrote (on Sat, May 24, 2025, 2:30):
> >>
> >>> Hi dev community,
> >>> I was wondering if we could join a call next week to discuss multi-table transactions so we can make progress. I have shared a meeting invite that anyone who's interested in the discussion can join. Please let me know if this works.
> >>>
> >>> Thanks,
> >>> Maninder
> >>>
> >>> Sync for Iceberg multi-table transactions
> >>> Friday, May 30 · 9:00 – 10:00am
> >>> Time zone: America/Los_Angeles
> >>> Google Meet joining info
> >>> Video call link: https://meet.google.com/ffc-ttjs-vti
> >>>
> >>> On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh <[email protected]> wrote:
> >>>
> >>>> Hi dev community,
> >>>> Following up on the thread here to continue the discussion and get feedback, since we couldn't get to it in the sync. I think we have made some progress that I want to capture, while highlighting the items where we still need to build consensus along with their pros and cons. I would appreciate help adding clarity and making sure the arguments are captured correctly.
> >>>>
> >>>> *Things we agree on*
> >>>> 1. Don't maintain server-side state for tracking transactions.
> >>>> 2. Need a global (catalog-wide) ordering of snapshots via some (hybrid/logical) clock/CSN.
> >>>> 3. Optionally expose the catalog's clock/CSN information without changing how tables load.
> >>>> 4. Load a consistent snapshot across multiple tables, and support repeatable reads, based on the reference clock/CSN (see the sketch below).
> >>>>
> >>>> *Things we disagree on*
> >>>> 1. Reuse the existing timestamp field vs introduce a new CSN field.
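To make item 4 above concrete: under either proposal the client-side mechanics are the same, given a catalog-provided reference point, pick for each table the latest snapshot at or before that point and scan at it. A minimal sketch using the Iceberg Java API, assuming only a Catalog handle and a caller-supplied referenceMillis value (how that value is produced is exactly what the two proposals differ on):

    import java.util.List;
    import org.apache.iceberg.Snapshot;
    import org.apache.iceberg.Table;
    import org.apache.iceberg.TableScan;
    import org.apache.iceberg.catalog.Catalog;
    import org.apache.iceberg.catalog.TableIdentifier;

    class ConsistentRead {
      // Latest snapshot whose commit point is <= the reference clock/CSN, or null if none exists.
      static Snapshot snapshotAsOf(Table table, long referenceMillis) {
        Snapshot result = null;
        for (Snapshot s : table.snapshots()) {
          if (s.timestampMillis() <= referenceMillis
              && (result == null || s.timestampMillis() > result.timestampMillis())) {
            result = s;
          }
        }
        return result;
      }

      // Plan scans for several tables against the same reference point, which is
      // what gives a mutually consistent view across tables.
      static void planConsistentScans(Catalog catalog, List<TableIdentifier> ids, long referenceMillis) {
        for (TableIdentifier id : ids) {
          Table table = catalog.loadTable(id);
          Snapshot pinned = snapshotAsOf(table, referenceMillis);
          if (pinned == null) {
            continue; // table had no committed snapshot at the reference point
          }
          TableScan scan = table.newScan().useSnapshot(pinned.snapshotId());
          // ... hand the scan to the engine, call planFiles(), etc.
        }
      }
    }

Because every table is resolved against the same reference point, re-running the selection later returns the same snapshots, which is what makes the reads repeatable.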
> >>>> *Reusing the timestamp field approach*
> >>>> - Pros:
> >>>> 1. Backwards compatible: no change to the table metadata spec, so it could be used by existing v2 tables.
> >>>> 2. Fixes existing time travel and ordering issues.
> >>>> 3. Simplifies and clarifies the spec (no new id for snapshots).
> >>>> 4. A common notion of timestamp that could be used to evaluate causal relationships in other proposals, like events or commit reports.
> >>>> - Cons:
> >>>> 1. Unique timestamp generation in milliseconds. Potential mitigations: https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg
> >>>> 2. Concerns about client-side timestamps being overridden.
> >>>>
> >>>> *Adding a new CSN field*
> >>>> - Pros:
> >>>> 1. Flexibility to use logical or hybrid clocks. (Not sure how clients can generate a hybrid clock timestamp here without suffering from clock skew; it would be good to clarify this.)
> >>>> 2. No client-side overriding concerns.
> >>>> - Cons:
> >>>> 1. Not backwards compatible: it requires a new field in table metadata, so it needs to wait for v4.
> >>>> 2. Does not fix time travel and snapshot-log ordering issues.
> >>>> 3. Adds another id for snapshots that clients need to generate and reason about.
> >>>> 4. Could not be extended for causal reasoning in other proposals.
> >>>>
> >>>> Thanks,
> >>>> Maninder
> >>>>
> >>>> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh <[email protected]> wrote:
> >>>>
> >>>>> Appreciate the feedback on the "catalog-authored timestamp" document <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>!
> >>>>>
> >>>>> Ryan, I don't think we can get consistent time travel queries in Iceberg without fixing the timestamp field, since it's what the spec <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel> prescribes for time travel. Hence I took the liberty of re-using it for the catalog timestamp, which ensures that the snapshot-log is correctly ordered for time travel. Additionally, the timestamp field needs to be fixed to avoid breaking commits to the table due to accidental large clock skews under the current spec; the scenario is described in detail here <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>. The other benefit of reusing the timestamp field is spec simplicity and clarity on timestamp generation responsibilities, without the need to manage yet another identifier (in addition to sequence number, snapshot id and timestamp) for snapshots.
> >>>>>
> >>>>> Jagdeep, your concerns about overriding the timestamp field are valid, but the reason I'm not too worried is that a client can't assume a commit is successful until it is acknowledged by the catalog, which returns the CommitTableResponse <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997> with the new metadata (which has catalog-authored timestamps in the proposal). I'm happy to work with you to put something common together and get the best out of the proposals.
> >>>>>
> >>>>> Thanks,
> >>>>> Maninder
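One way a catalog could address the "unique timestamp generation in milliseconds" con above, and the skew scenario Maninder describes, is to make the commit timestamp it authors strictly monotonic regardless of the wall clock. This is only a sketch of an assumed catalog-side implementation, not something either document specifies:

    import java.util.concurrent.atomic.AtomicLong;

    class CatalogCommitClock {
      private final AtomicLong lastIssuedMillis = new AtomicLong(0);

      // Strictly increasing timestamp-ms: follows the wall clock when it can, but never
      // repeats a value and never moves backwards when the wall clock skews or stalls.
      long nextCommitTimestampMillis() {
        return lastIssuedMillis.updateAndGet(prev -> Math.max(prev + 1, System.currentTimeMillis()));
      }
    }

The catalog would then stamp this value into the snapshot's timestamp-ms (or into a new field, under the CSN option) in the metadata it returns via CommitTableResponse.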
> >>>>>
> >>>>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <[email protected]> wrote:
> >>>>>
> >>>>>> Thank you Ryan, Maninder and the rest of the community for the feedback and ideas!
> >>>>>> Drew and I will take another pass, remove the catalog coordination requirement for the LoadTable API, and bring the proposal closer to "catalog-authored timestamp" in the sense that clients can use the CSN to find the right snapshot, while still leaving it up to the Catalog what it wants to use for the CSN (a hybrid clock timestamp or another monotonically increasing number).
> >>>>>>
> >>>>>> If more folks have feedback, please leave it in the doc or on the email list so we can address it in the document update as well.
> >>>>>>
> >>>>>> Maninder, one reason we proposed a new field for the CommitSequenceNumber instead of using an existing field is backwards compatibility. Catalogs can start optionally exposing the new field, interested clients can use it, and existing clients keep working as is. Existing and new clients can also keep working as is against the same tables in the same Catalog. My one worry is that having the Catalog override the timestamp field for commits may break some existing clients: today, Iceberg engines/clients do not expect the timestamp field in the metadata/snapshot-log to be overwritten by the Catalog.
> >>>>>>
> >>>>>> How do you feel about taking the best from each proposal? That is, monotonically increasing commit sequence numbers (some catalogs can use timestamps, some can use a logical clock; we don't have to enforce it and can leave it up to the Catalog), but keeping the client-side logic for resolving the right snapshot using sequence numbers instead of adding that functionality to the Catalog. Let me know!
> >>>>>>
> >>>>>> Thank you!
> >>>>>> -Jagdeep
> >>>>>>
> >>>>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <[email protected]> wrote:
> >>>>>>
> >>>>>>> Thanks for the proposals! There are things that I think are good about both of them. I think that the catalog-authored timestamps proposal misunderstands the purpose of the timestamp field, but it does get right that a monotonically increasing "time" field (really a sequence number) across tables enables the coordination needed for snapshot-isolated reads. I like that the sequence number proposal leaves the meaning of the field to the catalog for coordination, but it still proposes catalog coordination by loading tables "at" some sequence number. Ideally, we would be able to (optionally) expose this extra catalog information to clients and not need to change how loading works.
> >>>>>>>
> >>>>>>> Ryan
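One way to read Ryan's suggestion: the catalog attaches its clock/CSN to the snapshots it writes, for example as a snapshot summary property, and clients that understand it use it for ordering, while everyone else keeps loading tables exactly as today. A sketch of the client side, with "catalog.csn" as a purely hypothetical property name that neither proposal actually defines:

    import org.apache.iceberg.Snapshot;

    class SnapshotOrdering {
      // Hypothetical summary key; neither proposal has settled on a name.
      static final String CSN_PROP = "catalog.csn";

      // Ordering key for a snapshot: the catalog CSN when the catalog provided one,
      // otherwise the existing timestamp-ms, so older snapshots and catalogs still work.
      static long orderingKey(Snapshot snapshot) {
        String csn = snapshot.summary() == null ? null : snapshot.summary().get(CSN_PROP);
        if (csn != null) {
          return Long.parseLong(csn);
        }
        return snapshot.timestampMillis();
      }
    }

Snapshots and catalogs that never set the property still sort by timestamp-ms, so nothing about table loading has to change.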
> >>>>>>>
> >>>>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>>
> >>>>>>>> To avoid passing copies of a file around for comments, I put the doc for commit sequence numbers into Google so we can comment on a central copy: https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true
> >>>>>>>>
> >>>>>>>> Ryan
> >>>>>>>>
> >>>>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks for the updated proposal Drew!
> >>>>>>>>> My preference for using the catalog-authored timestamp is to minimize changes to the REST spec so we can have good backwards compatibility. I have quickly put together a draft proposal on how this should work. Looking forward to feedback and discussion.
> >>>>>>>>>
> >>>>>>>>> Draft Proposal: Catalog‑Authored Timestamps for Apache Iceberg REST Catalog <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Maninder
> >>>>>>>>>
> >>>>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi everyone,
> >>>>>>>>>>
> >>>>>>>>>> Thank you for the feedback on the MTT proposal and during the community sync. Based on it, Jagdeep and I have iterated on the document and added a second option to use *Catalog CommitSequenceNumbers*. Looking forward to more feedback on the proposal, on where to add more details, and on approaches/changes to consider. We appreciate everyone's time on this!
> >>>>>>>>>>
> >>>>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*, which allow clients/engines to read a consistent view of multiple tables without needing to register a transaction context with the catalog, which also removes the need for transaction bookkeeping on the catalog side. For aborting transactions early, clients can use LoadTable with and without a CSN to figure out whether there is already a conflicting write on any of the tables being modified. We also removed the section where transactions were staging commits on the Catalog, and changed the proposal to align with Eduard's PR on staging changes locally before commit (https://github.com/apache/iceberg/pull/6948).
> >>>>>>>>>>
> >>>>>>>>>> Jagdeep also clarified, with an example in a previous email, how a workload may require multi-table snapshot isolation even when the tables are being updated without the multi-table commit API, though most MTT transactions will commit using the multi-table commit API.
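The early-abort check described above boils down to: for each table the transaction intends to modify, compare its current snapshot with the snapshot pinned at the transaction's read point, and abort if they differ. A rough client-side sketch, assuming the transaction recorded a reference CSN/timestamp when it started and reusing the snapshotAsOf helper from the earlier sketch:

    import org.apache.iceberg.Snapshot;
    import org.apache.iceberg.Table;

    class EarlyAbortCheck {
      // True if another writer committed to this table after the transaction's read point,
      // i.e. the transaction would conflict and can abort before doing more work.
      static boolean hasConflictingWrite(Table table, long transactionReferencePoint) {
        table.refresh(); // "LoadTable without CSN": see the table as it is right now
        Snapshot current = table.currentSnapshot();
        Snapshot pinned = ConsistentRead.snapshotAsOf(table, transactionReferencePoint); // "LoadTable with CSN"
        if (current == null) {
          return false; // still no commits at all
        }
        return pinned == null || current.snapshotId() != pinned.snapshotId();
      }
    }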
> >>>>>>>>>>
> >>>>>>>>>> Maninder, for the approach of a "common notion of time between clients and catalog": I spent some time thinking about it, but I cannot find a feasible way to do this. Yes, the catalogs can use a high-precision clock, but clients cannot use the Catalog timestamp from API calls to set their local clock, due to the network latency of request/response. For example, different requests to the same Catalog servers can return different timestamps based on network latency. And what if a client works with more than one Catalog? If you want to do a rough write-up or share a reference implementation that uses such an approach, I will be happy to brainstorm it more. Let us know!
> >>>>>>>>>>
> >>>>>>>>>> Here is the link to the updated proposal: <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
> >>>>>>>>>>
> >>>>>>>>>> Thanks Again!
> >>>>>>>>>> - Drew
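As background on the "hybrid clock" option that comes up in both the agreed-upon points and the CSN pros, here is a minimal sketch of the usual hybrid-logical-clock update rule, run on the catalog side only so that clients never need synchronized clocks. This is general background, not part of either document:

    class HybridLogicalClock {
      private long lastPhysicalMillis = 0;
      private long logicalCounter = 0;

      // Returns a 64-bit value: the upper bits carry wall-clock millis, the lower 16 bits
      // a logical counter that breaks ties when the wall clock stalls or runs backwards.
      synchronized long next() {
        long now = System.currentTimeMillis();
        if (now > lastPhysicalMillis) {
          lastPhysicalMillis = now;
          logicalCounter = 0;
        } else {
          logicalCounter++; // same or earlier millisecond: advance the logical part instead
        }
        return (lastPhysicalMillis << 16) | (logicalCounter & 0xFFFF);
      }
    }

The result stays roughly aligned with wall-clock time (useful for time travel) while remaining strictly monotonic, which is the property both proposals ultimately need.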
