I forgot to attach a link to the updated proposal:
<https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#heading=h.ypbwvr181qn4>

On Tue, May 27, 2025 at 1:06 AM Maninderjit Singh <
parmar.maninder...@gmail.com> wrote:

> Hi community,
>
> I have updated the proposal with both options (overwriting the existing
> timestamp-ms vs. introducing a new sequence/timestamp field), since we have
> initial consensus on using a catalog-authored sequence/timestamp. Jagdeep,
> please review to ensure that the options are correctly captured. I have
> also added arguments for why we can't treat the timestamp as merely
> "informational": it is used in critical paths, and incorrect values can
> take the table offline.
>
> Also, I'm moving the meeting to Thursday to better accommodate conflicts.
> I will also record the meeting in case anyone who is interested in the
> discussion misses it.
>
> Sync for iceberg multi-table transactions
> Thursday, May 29 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/ffc-ttjs-vti
>
> Thanks,
> Maninder
>
>
>
> On Mon, May 26, 2025 at 12:47 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> I'm interested but can't be there, so please record the meeting.
>> Thanks,
>> Peter
>>
>> Maninderjit Singh <parmar.maninder...@gmail.com> wrote (on Sat,
>> May 24, 2025, 2:30):
>>
>>> Hi dev community,
>>> I was wondering if we could join a call next week to discuss the
>>> multi-table transactions so we can make progress. I have shared a meeting
>>> invite that anyone interested in the discussion can join. Please let
>>> me know if this works.
>>>
>>> Thanks,
>>> Maninder
>>>
>>> Sync for iceberg multi-table transactions
>>> Friday, May 30 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/ffc-ttjs-vti
>>>
>>>
>>> On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh <
>>> parmar.maninder...@gmail.com> wrote:
>>>
>>>> Hi dev community,
>>>> Following up on this thread to continue the discussion and get
>>>> feedback, since we couldn't get to it in the sync. I think we have made
>>>> some progress in the discussion that I want to capture, while highlighting
>>>> the items where we still need to build consensus, along with their pros and
>>>> cons. I would appreciate help adding clarity and making sure the arguments
>>>> are captured correctly.
>>>>
>>>> *Things we agree on*
>>>>
>>>>    1. Don't maintain server side state for tracking the transactions.
>>>>    2. Need global (catalog-wide) ordering of snapshots via some
>>>>    (hybrid/logical) clock/CSN
>>>>    3. Optionally expose the catalog's clock/CSN information without
>>>>    changing how tables load
>>>>    4. Loading consistent snapshot across multiple tables and
>>>>    repeatable reads based on the reference clock/CSN
>>>>
>>>>
>>>> *Things we disagree on*
>>>>
>>>>    1. Reuse existing timestamp field vs introduce a new field CSN
>>>>
>>>>
>>>> *Reusing timestamp field approach*
>>>>
>>>>    - Pros:
>>>>
>>>>
>>>>    1. Backwards compatibility, no change to table metadata spec so
>>>>    could be used by existing v2 tables.
>>>>    2. Fixes existing time travel and ordering issues
>>>>    3. Simplifies and clarifies the spec (no new id for snapshots)
>>>>    4. Common notion of timestamp that could be used to evaluate causal
>>>>    relationships in other proposals like events or commit reports.
>>>>
>>>>
>>>>    - Cons:
>>>>
>>>>
>>>>    1. Unique timestamp generation in milliseconds. Potential
>>>>    mitigations:
>>>>    https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg
>>>>    2. Concerns about client side timestamp being overridden.
>>>>
>>>> *Adding new CSN field*
>>>>
>>>>    - Pros:
>>>>
>>>>
>>>>    1. Flexibility to use logical or hybrid clocks. It is unclear how
>>>>    clients can generate a hybrid clock timestamp here without suffering
>>>>    from clock skew (it would be good to clarify this).
>>>>    2. No client side overriding concerns.
>>>>
>>>>
>>>>    - Cons:
>>>>
>>>>
>>>>    1. Not backwards compatible, requires new field in table metadata
>>>>    so need to wait for v4
>>>>    2. Does not fix time travel and snapshot-log ordering issues
>>>>    3. Adds another id for snapshots that clients need to generate and
>>>>    reason about.
>>>>    4. Cannot be extended for use in other proposals for causal
>>>>    reasoning.
>>>>
>>>>
>>>> Thanks,
>>>> Maninder
>>>>
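On the "unique timestamp generation in milliseconds" concern listed above,
one common way for a single catalog process to hand out unique, increasing
millisecond timestamps is to take the larger of the wall clock and the last
issued value plus one. The sketch below only illustrates that idea. The class
and method names are hypothetical, it is not taken from the proposal document,
and a multi-node catalog would additionally need to persist or coordinate the
last issued value.

```python
import threading
import time


class MonotonicMillisClock:
    """Hands out strictly increasing millisecond timestamps (hypothetical helper)."""

    def __init__(self, now_ms=None):
        # now_ms is injectable so the behaviour can be exercised without sleeping.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._last = 0
        self._lock = threading.Lock()

    def next_timestamp_ms(self):
        with self._lock:
            # Use the wall clock when it has advanced; otherwise bump the last
            # value by 1 ms so no two commits share a timestamp and issued
            # timestamps never go backwards, even if the wall clock does.
            self._last = max(self._now_ms(), self._last + 1)
            return self._last
```

Under a burst of commits within the same millisecond, the issued timestamps
run slightly ahead of the wall clock and fall back in line once the burst ends.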
>>>> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh <
>>>> parmar.maninder...@gmail.com> wrote:
>>>>
>>>>> Appreciate the feedback on the "catalog-authored timestamp" document
>>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>
>>>>> !
>>>>>
>>>>> Ryan, I don't think we can get consistent time travel queries in
>>>>> iceberg without fixing the timestamp field since it's what the spec
>>>>> <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel>
>>>>> prescribes for time travel. Hence I took the liberty to re-use it for the
>>>>> catalog timestamp which ensures that snapshot-log is correctly ordered for
>>>>> time travel. Additionally, the timestamp field needs to be fixed to avoid
>>>>> breaking commits to the table due to accidental large clock skews under
>>>>> the current spec; the scenario is described in detail here
>>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>
>>>>> .
>>>>> The other benefit of reusing the timestamp field is spec simplicity
>>>>> and clarity on timestamp generation responsibilities, without the
>>>>> need to manage yet another identifier (in addition to sequence number,
>>>>> snapshot id and timestamp) for snapshots.
>>>>>
>>>>> Jagdeep, your concerns about overriding the timestamp field are valid,
>>>>> but I'm not too worried about it because a client can't assume
>>>>> a commit is successful until its request is acknowledged by the
>>>>> catalog, which returns the CommitTableResponse
>>>>> <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997>
>>>>> with new metadata (that has catalog-authored timestamps in the
>>>>> proposal). I'm happy to work with you to put something common together
>>>>> and get the best out of the proposals.
>>>>>
>>>>> Thanks,
>>>>> Maninder
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you Ryan, Maninder and the rest of the community for feedback
>>>>>> and ideas!
>>>>>> Drew and I will take another pass and remove the catalog
>>>>>> coordination requirement for the LoadTable API, and bring the proposal
>>>>>> closer to "catalog-authored timestamp" in the sense that clients can use
>>>>>> the CSN to find the right snapshot, while still leaving it up to the
>>>>>> Catalog what it wants to use for the CSN (a hybrid clock timestamp or
>>>>>> another monotonically increasing number).
>>>>>>
>>>>>> If more folks have feedback, please leave it in the doc or email
>>>>>> list, so we can address it as well in the document update.
>>>>>>
>>>>>> Maninder, one reason we proposed a new field for CommitSequenceNumber
>>>>>> instead of using an existing field is backwards compatibility.
>>>>>> Catalogs can start optionally exposing the new field, and interested
>>>>>> clients can use it, while existing clients keep working as is. Existing
>>>>>> and new clients can also keep working against the same tables in the
>>>>>> same Catalog. My one worry is that having the Catalog override the
>>>>>> timestamp field for commits may break some existing clients. Today, no
>>>>>> Iceberg engine/client expects the timestamp field in
>>>>>> metadata/snapshot-log to be overwritten by the Catalog.
>>>>>>
>>>>>> How do you feel about taking the best from each proposal? That is,
>>>>>> monotonically increasing commit sequence numbers (some catalogs can use
>>>>>> timestamps, some can use a logical clock, but we don't have to enforce
>>>>>> it; leave it up to the Catalog), while keeping the client-side logic for
>>>>>> resolving the right snapshot using sequence numbers instead of adding
>>>>>> that functionality to the Catalog. Let me know!
>>>>>>
>>>>>> Thank you!
>>>>>> -Jagdeep
>>>>>>
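The client-side resolution suggested above could look roughly like the
following sketch: given a reference CSN, pick for each table the newest
snapshot whose CSN does not exceed it. The `SnapshotEntry` type, its `csn`
field, and both function names are assumptions made for illustration; neither
proposal defines them in this form.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SnapshotEntry:
    snapshot_id: int
    csn: int  # hypothetical catalog-assigned commit sequence number


def snapshot_as_of(log: List[SnapshotEntry], reference_csn: int) -> Optional[SnapshotEntry]:
    """Return the newest entry whose CSN is <= reference_csn, or None."""
    candidates = [entry for entry in log if entry.csn <= reference_csn]
    return max(candidates, key=lambda entry: entry.csn, default=None)


def consistent_view(
    logs: Dict[str, List[SnapshotEntry]], reference_csn: int
) -> Dict[str, Optional[int]]:
    """Map each table name to the snapshot id visible at reference_csn.

    A table with no snapshot at or before reference_csn maps to None.
    """
    view = {}
    for table, log in logs.items():
        entry = snapshot_as_of(log, reference_csn)
        view[table] = entry.snapshot_id if entry is not None else None
    return view
```

Because every table is resolved against the same reference CSN, repeating the
call with that CSN yields the same set of snapshots even if newer commits have
landed in the meantime, as long as the older snapshots are still retained.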
>>>>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the proposals! There are things that I think are good
>>>>>>> about both of them. I think that the catalog-authored timestamps
>>>>>>> proposal misunderstands the purpose of the timestamp field, but does get
>>>>>>> right that a monotonically increasing "time" field (really a sequence
>>>>>>> number) across tables enables the coordination needed for snapshot
>>>>>>> isolated reads. I like that the sequence number proposal leaves the
>>>>>>> meaning of the field to the catalog for coordination, but it still
>>>>>>> proposes catalog coordination by loading tables "at" some sequence
>>>>>>> number. Ideally, we would be able to (optionally) expose this extra
>>>>>>> catalog information to clients and not need to change how loading works.
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> To avoid passing copies of a file around for comments, I put the
>>>>>>>> doc for commit sequence numbers into Google so we can comment on a
>>>>>>>> central copy:
>>>>>>>> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true
>>>>>>>>
>>>>>>>> Ryan
>>>>>>>>
>>>>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh <
>>>>>>>> parmar.maninder...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the updated proposal Drew!
>>>>>>>>> My preference for using the catalog authored timestamp is to
>>>>>>>>> minimize changes to the REST spec so we can have good backwards
>>>>>>>>> compatibility. I have quickly put together a draft proposal on how
>>>>>>>>> this should work. Looking forward to feedback and discussion.
>>>>>>>>>
>>>>>>>>>  Draft Proposal: Catalog‑Authored Timestamps for Apache Iceberg
>>>>>>>>> REST Catalog
>>>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Maninder
>>>>>>>>>
>>>>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <img...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> Thank you for the feedback on the MTT proposal and during the community
>>>>>>>>>> sync. Based on it, Jagdeep and I have iterated on the document and
>>>>>>>>>> added a second option to use *Catalog CommitSequenceNumbers*. We look
>>>>>>>>>> forward to getting more feedback on the proposal, including where to
>>>>>>>>>> add more details or what approaches/changes to consider. We appreciate
>>>>>>>>>> everyone's time on this!
>>>>>>>>>>
>>>>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*,
>>>>>>>>>> which allow clients/engines to read a consistent view of multiple
>>>>>>>>>> tables without needing to register a transaction context with the
>>>>>>>>>> catalog, thus removing the need for transaction bookkeeping on the
>>>>>>>>>> catalog side. To abort transactions early, clients can use LoadTable
>>>>>>>>>> with and without a CSN to figure out whether there is already a
>>>>>>>>>> conflicting write on any of the tables being modified. We also removed
>>>>>>>>>> the section where transactions were staging commits on the Catalog, and
>>>>>>>>>> changed the proposal to align with Eduard's PR around staging changes
>>>>>>>>>> locally before commit (https://github.com/apache/iceberg/pull/6948).
>>>>>>>>>>
>>>>>>>>>> Jagdeep also clarified, with an example in a previous email, where a
>>>>>>>>>> workload may require multi-table snapshot isolation even if the tables
>>>>>>>>>> are being updated without the Multi-Table commit API, though most MTT
>>>>>>>>>> transactions will commit using the multi-table commit API.
>>>>>>>>>>
>>>>>>>>>> Maninder, regarding the approach of a "common notion of time between
>>>>>>>>>> clients and catalog": I spent some time thinking about it, but cannot
>>>>>>>>>> find a feasible way to do this. Yes, the catalogs can use a high
>>>>>>>>>> precision clock, but clients cannot use the Catalog timestamp from API
>>>>>>>>>> calls to set their local clock because of network latency on the
>>>>>>>>>> request/response. For example, different requests to the same Catalog
>>>>>>>>>> servers can return different timestamps based on network latency. Also,
>>>>>>>>>> what if a client works with more than one Catalog? If you want to do a
>>>>>>>>>> rough write-up or share a reference implementation that uses such an
>>>>>>>>>> approach, I will be happy to brainstorm it more. Let us know!
>>>>>>>>>>
>>>>>>>>>> Here is the link to the updated proposal:
>>>>>>>>>> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
>>>>>>>>>>
>>>>>>>>>> Thanks again!
>>>>>>>>>> - Drew
>>>>>>>>>>
>>>>>>>>>
