Re: Discuss proposal - IRC APIs for Multi-Statement Multi-Table Transactions

Péter Váry Mon, 26 May 2025 01:36:16 -0700

I'm interested but can't attend, so please record the meeting.
Thanks,
Peter


Maninderjit Singh <parmar.maninder...@gmail.com> wrote (on Sat, May 24,
2025, 2:30):

> Hi dev community,
> I was wondering if we could get on a call next week to discuss
> multi-table transactions so we can make progress. I have shared a meeting
> invite that anyone interested in the discussion can join. Please let
> me know if this works.
>
> Thanks,
> Maninder
>
> Sync for iceberg multi-table transactions
> Friday, May 30 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/ffc-ttjs-vti
>
>
> On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh <
> parmar.maninder...@gmail.com> wrote:
>
>> Hi dev community,
>> Following up on the thread here to continue the discussion and get
>> feedback since we couldn't get to it in the sync. I think we have made some
>> progress in the discussion that I want to capture, while highlighting the
>> items where we need to build consensus, along with pros and cons. I would
>> need help to add clarity and to make sure the arguments are captured
>> correctly.
>>
>> *Things we agree on*
>>
>>    1. Don't maintain server side state for tracking the transactions.
>>    2. Need global (catalog-wide) ordering of snapshots via some
>>    (hybrid/logical) clock/CSN
>>    3. Optionally expose the catalog's clock/CSN information without
>>    changing how tables load
>>    4. Loading a consistent snapshot across multiple tables and repeatable
>>    reads based on the reference clock/CSN
>>
>>
>> *Things we disagree on*
>>
>>    1. Reuse existing timestamp field vs introduce a new field CSN
>>
>>
>> *Reusing timestamp field approach*
>>
>>    - Pros:
>>
>>
>>    1. Backwards compatibility, no change to table metadata spec so could
>>    be used by existing v2 tables.
>>    2. Fixes existing time travel and ordering issues
>>    3. Simplifies and clarifies the spec (no new id for snapshots)
>>    4. Common notion of timestamp that could be used to evaluate causal
>>    relationships in other proposals like events or commit reports.
>>
>>
>>    - Cons:
>>
>>
>>    1. Unique timestamp generation in milliseconds. Potential
>>    mitigations:
>>    
>> https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg
>>    2. Concerns about client side timestamp being overridden.
>>
>> *Adding new CSN field*
>>
>>    - Pros:
>>
>>
>>    1. Flexibility to use logical or hybrid clocks. It is unclear how clients
>>    can generate a hybrid clock timestamp here without suffering from clock
>>    skew (it would be good to clarify this).
>>    2. No client side overriding concerns.
>>
>>
>>    - Cons:
>>
>>
>>    1. Not backwards compatible; requires a new field in table metadata, so
>>    it would need to wait for v4
>>    2. Does not fix time travel and snapshot-log ordering issues
>>    3. Adds another id for snapshots that clients need to generate and
>>    reason about.
>>    4. Cannot be extended for use in other proposals for causal
>>    reasoning.
>>
>>
>> Thanks,
>> Maninder
>>
>> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh <
>> parmar.maninder...@gmail.com> wrote:
>>
>>> Appreciate the feedback on the "catalog-authored timestamp" document
>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>
>>> !
>>>
>>> Ryan, I don't think we can get consistent time travel queries in Iceberg
>>> without fixing the timestamp field, since it's what the spec
>>> <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel>
>>> prescribes for time travel. Hence I took the liberty of reusing it for the
>>> catalog timestamp, which ensures that the snapshot-log is correctly ordered
>>> for time travel. Additionally, the timestamp field needs to be fixed to avoid
>>> breaking commits to the table due to accidental large skews under the current
>>> spec; the scenario is described in detail here
>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>
>>> .
>>> The other benefit of reusing the timestamp field is spec simplicity and
>>> clarity on timestamp generation responsibilities, without the need
>>> to manage yet another identifier (in addition to sequence number, snapshot
>>> id and timestamp) for snapshots.
>>>
>>> Jagdeep, your concerns about overriding the timestamp field are valid,
>>> but the reason I'm not too worried about it is that a client can't assume
>>> a commit is successful until its request is acknowledged by the
>>> catalog, which returns the CommitTableResponse
>>> <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997>
>>>  with
>>> new metadata (that has catalog-authored timestamps in the proposal). I'm
>>> happy to work with you to put something common together and get the best
>>> out of the proposals.
>>>
>>> Thanks,
>>> Maninder
>>>
>>>
>>>
>>>
>>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>>> wrote:
>>>
>>>> Thank you Ryan, Maninder and the rest of the community for the feedback and
>>>> ideas!
>>>> Drew and I will take another pass and remove the catalog coordination
>>>> requirement for the LoadTable API, and bring the proposal closer to
>>>> "catalog-authored timestamp" in the sense that clients can use the CSN to
>>>> find the right snapshot, but still leave it up to the Catalog what it wants
>>>> to use for the CSN (a hybrid clock timestamp or another monotonically
>>>> increasing number).
>>>>
>>>> If more folks have feedback, please leave it in the doc or email list,
>>>> so we can address it as well in the document update.
>>>>
>>>> Maninder, one reason we proposed a new field for CommitSequenceNumber
>>>> instead of using an existing field is backwards compatibility. Catalogs
>>>> can optionally start exposing the new field, and interested clients can use
>>>> it, while existing clients keep working as is. Existing and new
>>>> clients can also keep working against the same tables in the
>>>> same Catalog. My one worry is that having the Catalog override the timestamp
>>>> field for commits may break some existing clients. Today, no Iceberg
>>>> engines/clients expect the timestamp field in metadata/snapshot-log
>>>> to be overwritten by the Catalog.
>>>>
>>>> How do you feel about taking the best from each proposal? That is, use
>>>> monotonically increasing commit sequence numbers (some catalogs can use
>>>> timestamps, some can use a logical clock, but we don't have to enforce it;
>>>> leave it up to the Catalog), but keep the client-side logic for resolving
>>>> the right snapshot using sequence numbers instead of adding that
>>>> functionality to the Catalog. Let me know!
>>>>
>>>> Thank you!
>>>> -Jagdeep
>>>>
>>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>
>>>>> Thanks for the proposals! There are things that I think are good about
>>>>> both of them. I think that the catalog-authored timestamps proposal
>>>>> misunderstands the purpose of the timestamp field, but does get right that
>>>>> a monotonically increasing "time" field (really a sequence number) across
>>>>> tables enables the coordination needed for snapshot isolated reads. I like
>>>>> that the sequence number proposal leaves the meaning of the field to the
>>>>> catalog for coordination, but it still proposes catalog coordination by
>>>>> loading tables "at" some sequence number. Ideally, we would be able to
>>>>> (optionally) expose this extra catalog information to clients and not need
>>>>> to change how loading works.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> To avoid passing copies of a file around for comments, I put the doc
>>>>>> for commit sequence numbers into Google so we can comment on a central
>>>>>> copy:
>>>>>> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh <
>>>>>> parmar.maninder...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the updated proposal Drew!
>>>>>>> My preference for using the catalog authored timestamp is to
>>>>>>> minimize changes to the REST spec so we can have good backwards
>>>>>>> compatibility. I have quickly put together a draft proposal on how this
>>>>>>> should work. Looking forward to feedback and discussion.
>>>>>>>
>>>>>>>  Draft Proposal: Catalog‑Authored Timestamps for Apache Iceberg
>>>>>>> REST Catalog
>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maninder
>>>>>>>
>>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <img...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Thank you for the feedback on the MTT proposal and during the community
>>>>>>>> sync. Based on it, Jagdeep and I have iterated on the document and
>>>>>>>> added a second option to use *Catalog CommitSequenceNumbers*. We look
>>>>>>>> forward to more feedback on the proposal, including where to add more
>>>>>>>> details and which approaches or changes to consider. We appreciate
>>>>>>>> everyone's time on this!
>>>>>>>>
>>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*, which
>>>>>>>> allow clients/engines to read a consistent view of multiple tables
>>>>>>>> without registering a transaction context with the catalog, which in
>>>>>>>> turn removes the need for transaction bookkeeping on the catalog side.
>>>>>>>> To abort transactions early, clients can use LoadTable with and without
>>>>>>>> a CSN to figure out whether there is already a conflicting write on any
>>>>>>>> of the tables being modified. We also removed the section where
>>>>>>>> transactions were staging commits on the Catalog, and changed the
>>>>>>>> proposal to align with Eduard's PR around staging changes locally
>>>>>>>> before commit (
>>>>>>>> https://github.com/apache/iceberg/pull/6948).
>>>>>>>>
>>>>>>>> Jagdeep also clarified, with an example in a previous email, where a
>>>>>>>> workload may require multi-table snapshot isolation even if the tables
>>>>>>>> are being updated without the Multi-Table commit API, though most MTT
>>>>>>>> transactions will commit using the multi-table commit API.
>>>>>>>>
>>>>>>>> Maninder, regarding the approach of a "common notion of time between
>>>>>>>> clients and catalog": I spent some time thinking about it, but cannot
>>>>>>>> find a feasible way to do this. Yes, the catalogs can use a
>>>>>>>> high-precision clock, but clients cannot use the Catalog timestamp from
>>>>>>>> API calls to set their local clock because of network latency in the
>>>>>>>> request/response. For example, different requests to the same Catalog
>>>>>>>> servers can return different timestamps based on network latency. Also,
>>>>>>>> what if a client works with more than one Catalog? If you want to do a
>>>>>>>> rough write-up or share a reference implementation that uses such an
>>>>>>>> approach, I will be happy to brainstorm it more. Let us know!
>>>>>>>>
>>>>>>>> Here is the link to the updated proposal:
>>>>>>>> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
>>>>>>>> Thanks Again!
>>>>>>>> - Drew
>>>>>>>>
>>>>>>>
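
For readers following the thread, the client-side resolution discussed above (pick, per table, the latest snapshot at or below a reference clock/CSN, and detect newer conflicting commits) might be sketched roughly as follows. This is only an illustration under assumptions: the `Snapshot` class, its `csn` field, and the function names are hypothetical and are not part of the Iceberg spec, the REST catalog API, or either proposal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record; `csn` stands in for whatever
    catalog-wide, monotonically increasing value the catalog exposes."""
    snapshot_id: int
    csn: int


def snapshot_as_of(snapshots: List[Snapshot], reference_csn: int) -> Optional[Snapshot]:
    """Return the latest snapshot whose CSN is <= reference_csn, or None."""
    candidates = [s for s in snapshots if s.csn <= reference_csn]
    return max(candidates, key=lambda s: s.csn) if candidates else None


def consistent_view(
    tables: Dict[str, List[Snapshot]], reference_csn: int
) -> Dict[str, Optional[Snapshot]]:
    """Resolve every table against the same reference CSN so that repeated
    reads with the same reference see the same snapshots."""
    return {name: snapshot_as_of(snaps, reference_csn) for name, snaps in tables.items()}


def has_conflict(snapshots: List[Snapshot], reference_csn: int) -> bool:
    """True if the table has a commit newer than the reference CSN, which a
    writer could treat as a reason to abort early."""
    return any(s.csn > reference_csn for s in snapshots)
```

With a reference CSN of 20, a table with snapshots at CSN 10 and 25 would resolve to the CSN 10 snapshot and would also report a conflict, because the CSN 25 commit landed after the reference point.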
