Forgot to attach a link to the updated proposal <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#heading=h.ypbwvr181qn4>.
On Tue, May 27, 2025 at 1:06 AM Maninderjit Singh <parmar.maninder...@gmail.com> wrote:

> Hi community,
>
> I have updated the proposal with both options (overwriting the existing
> timestamp-ms vs. introducing a new sequence/timestamp field), as we have
> initial consensus on using a catalog-authored sequence/timestamp. Jagdeep,
> please review to ensure that the options are correctly captured. I have
> also added arguments on why we can't assume the timestamp to be
> "informational": it's being used in critical paths, and incorrect values
> can take the table offline.
>
> Also, I'm moving the meeting to Thursday to better accommodate conflicts.
> I will also record the meeting in case anyone who is interested in the
> discussion misses it.
>
> Sync for iceberg multi-table transactions
> Thursday, May 29 · 9:00 – 10:00am
> Time zone: America/Los_Angeles
> Google Meet joining info
> Video call link: https://meet.google.com/ffc-ttjs-vti
>
> Thanks,
> Maninder
>
> On Mon, May 26, 2025 at 12:47 AM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> I'm interested but can't be there, so please record the meeting.
>> Thanks,
>> Peter
>>
>> Maninderjit Singh <parmar.maninder...@gmail.com> wrote (on Sat, May 24,
>> 2025, 2:30):
>>
>>> Hi dev community,
>>> I was wondering if we could join a call next week to discuss the
>>> multi-table transactions so we can make progress. I have shared a
>>> meeting invite that anyone interested in the discussion can join.
>>> Please let me know if this works.
>>>
>>> Thanks,
>>> Maninder
>>>
>>> Sync for iceberg multi-table transactions
>>> Friday, May 30 · 9:00 – 10:00am
>>> Time zone: America/Los_Angeles
>>> Google Meet joining info
>>> Video call link: https://meet.google.com/ffc-ttjs-vti
>>>
>>> On Wed, May 21, 2025 at 10:25 AM Maninderjit Singh
>>> <parmar.maninder...@gmail.com> wrote:
>>>
>>>> Hi dev community,
>>>> Following up on the thread here to continue the discussion and get
>>>> feedback, since we couldn't get to it in the sync. I think we have made
>>>> some progress in the discussion that I want to capture, while
>>>> highlighting the items where we need to build consensus, along with
>>>> pros and cons. I would need help to add clarity and to make sure the
>>>> arguments are captured correctly.
>>>>
>>>> *Things we agree on*
>>>>
>>>> 1. Don't maintain server-side state for tracking the transactions.
>>>> 2. Need global (catalog-wide) ordering of snapshots via some
>>>>    (hybrid/logical) clock/CSN.
>>>> 3. Optionally expose the catalog's clock/CSN information without
>>>>    changing how tables load.
>>>> 4. Loading a consistent snapshot across multiple tables and
>>>>    repeatable reads based on the reference clock/CSN.
>>>>
>>>> *Things we disagree on*
>>>>
>>>> 1. Reuse the existing timestamp field vs. introduce a new CSN field.
>>>>
>>>> *Reusing timestamp field approach*
>>>>
>>>> - Pros:
>>>>
>>>> 1. Backwards compatibility: no change to the table metadata spec, so
>>>>    it could be used by existing v2 tables.
>>>> 2. Fixes existing time travel and ordering issues.
>>>> 3. Simplifies and clarifies the spec (no new id for snapshots).
>>>> 4. Common notion of timestamp that could be used to evaluate causal
>>>>    relationships in other proposals like events or commit reports.
>>>>
>>>> - Cons:
>>>>
>>>> 1. Unique timestamp generation in milliseconds. Potential mitigations:
>>>>    https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&disco=AAABjwaxXeg
>>>> 2. Concerns about the client-side timestamp being overridden.
>>>>
>>>> *Adding new CSN field*
>>>>
>>>> - Pros:
>>>>
>>>> 1. Flexibility to use logical or hybrid clocks. It's not clear how
>>>>    clients can generate a hybrid clock timestamp here without
>>>>    suffering from clock skew (would be good to clarify this).
>>>> 2. No client-side overriding concerns.
>>>>
>>>> - Cons:
>>>>
>>>> 1. Not backwards compatible: requires a new field in table metadata,
>>>>    so we need to wait for v4.
>>>> 2. Does not fix time travel and snapshot-log ordering issues.
>>>> 3. Adds another id for snapshots that clients need to generate and
>>>>    reason about.
>>>> 4. Could not be extended for use in other proposals for causal
>>>>    reasoning.
>>>>
>>>> Thanks,
>>>> Maninder
>>>>
>>>> On Tue, May 20, 2025 at 8:16 PM Maninderjit Singh
>>>> <parmar.maninder...@gmail.com> wrote:
>>>>
>>>>> Appreciate the feedback on the "catalog-authored timestamp" document
>>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0>!
>>>>>
>>>>> Ryan, I don't think we can get consistent time travel queries in
>>>>> iceberg without fixing the timestamp field, since it's what the spec
>>>>> <https://iceberg.apache.org/spec/#point-in-time-reads-time-travel>
>>>>> prescribes for time travel. Hence I took the liberty to re-use it for
>>>>> the catalog timestamp, which ensures that the snapshot-log is
>>>>> correctly ordered for time travel. Additionally, the timestamp field
>>>>> needs to be fixed to avoid breaking commits to the table due to
>>>>> accidental large skews under the current spec; the scenario is
>>>>> described in detail here
>>>>> <https://docs.google.com/document/d/1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE/edit?pli=1&tab=t.0#bookmark=id.6avx66vzo168>.
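
To make the catalog-authored timestamp idea concrete, here is a minimal sketch of how a catalog could assign strictly increasing millisecond timestamps at commit time, which keeps the snapshot-log ordered even when wall-clock time stalls or moves backwards. `CatalogClock`, `commit`, and the dictionary layout are hypothetical names chosen for illustration; they are not part of the Iceberg spec or the REST catalog API, and the proposal document may handle uniqueness differently.

```python
import time
from typing import Callable, Dict, List, Optional


class CatalogClock:
    """Hypothetical catalog-side clock issuing strictly increasing ms timestamps."""

    def __init__(self, now_ms: Optional[Callable[[], int]] = None) -> None:
        # Assumption: the time source is injectable so the behavior can be tested.
        self._now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self._last_ms = 0

    def next_timestamp_ms(self) -> int:
        # Use wall-clock time when it moves forward; otherwise bump the last
        # issued value by 1 ms so that no two commits share a timestamp.
        self._last_ms = max(self._now_ms(), self._last_ms + 1)
        return self._last_ms


def commit(snapshot_log: List[Dict[str, int]], clock: CatalogClock,
           snapshot_id: int, client_ts_ms: int) -> Dict[str, int]:
    """Append a snapshot-log entry stamped by the catalog, not by the client."""
    # client_ts_ms is deliberately ignored: in this sketch the catalog's
    # clock is the only source of truth for ordering.
    entry = {"snapshot-id": snapshot_id, "timestamp-ms": clock.next_timestamp_ms()}
    snapshot_log.append(entry)
    return entry
```

The sketch is single-threaded; a real catalog would need to make `next_timestamp_ms` atomic across concurrent commits and across catalog nodes, which is the "unique timestamp generation in milliseconds" concern listed above.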
>>>>> The other benefit of reusing the timestamp field is spec simplicity
>>>>> and clarity on timestamp generation responsibilities, without the
>>>>> need to manage yet another identifier (in addition to sequence number,
>>>>> snapshot id and timestamp) for snapshots.
>>>>>
>>>>> Jagdeep, your concerns about overriding the timestamp field are valid,
>>>>> but the reason I'm not too worried about it is that a client can't
>>>>> assume a commit is successful until its request is acknowledged by the
>>>>> catalog, which returns the CommitTableResponse
>>>>> <https://github.com/apache/iceberg/blob/c2478968e65368c61799d8ca4b89506a61ca3e7c/open-api/rest-catalog-open-api.yaml#L3997>
>>>>> with new metadata (that has catalog-authored timestamps in the
>>>>> proposal). I'm happy to work with you to put something common together
>>>>> and get the best out of the proposals.
>>>>>
>>>>> Thanks,
>>>>> Maninder
>>>>>
>>>>> On Tue, May 20, 2025 at 5:48 PM Jagdeep Sidhu <sidhujagde...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thank you Ryan, Maninder and the rest of the community for feedback
>>>>>> and ideas!
>>>>>> Drew and I will take another pass and remove the catalog
>>>>>> coordination requirement for the LoadTable API, and bring the proposal
>>>>>> closer to "catalog-authored timestamp" in the sense that clients can
>>>>>> use the CSN to find the right snapshot, but still leave it up to the
>>>>>> Catalog what it wants to use for the CSN (hybrid clock timestamp or
>>>>>> another monotonically increasing number).
>>>>>>
>>>>>> If more folks have feedback, please leave it in the doc or on the
>>>>>> email list, so we can address it as well in the document update.
>>>>>>
>>>>>> Maninder, one reason we proposed a new field for CommitSequenceNumber
>>>>>> instead of using an existing field is for backwards compatibility.
>>>>>> Catalogs can start optionally exposing the new field, and interested
>>>>>> clients can use it, while existing clients keep working as is.
>>>>>> Existing and new clients can also keep working as is against the same
>>>>>> tables in the same Catalog. My one worry is that having the Catalog
>>>>>> override the timestamp field for commits may break some existing
>>>>>> clients. Today, no Iceberg engine/client expects the timestamp field
>>>>>> in metadata/snapshot-log to be overwritten by the Catalog.
>>>>>>
>>>>>> How do you feel about taking the best from each proposal, i.e.
>>>>>> monotonically increasing commit sequence numbers (some catalogs can
>>>>>> use timestamps, some can use a logical clock, but we don't have to
>>>>>> enforce it; leave it up to the Catalog), while keeping the client-side
>>>>>> logic for resolving the right snapshot using sequence numbers instead
>>>>>> of adding that functionality to the Catalog? Let me know!
>>>>>>
>>>>>> Thank you!
>>>>>> -Jagdeep
>>>>>>
>>>>>> On Tue, May 20, 2025 at 2:45 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for the proposals! There are things that I think are good
>>>>>>> about both of them. I think that the catalog-authored timestamps
>>>>>>> proposal misunderstands the purpose of the timestamp field, but does
>>>>>>> get right that a monotonically increasing "time" field (really a
>>>>>>> sequence number) across tables enables the coordination needed for
>>>>>>> snapshot-isolated reads. I like that the sequence number proposal
>>>>>>> leaves the meaning of the field to the catalog for coordination, but
>>>>>>> it still proposes catalog coordination by loading tables "at" some
>>>>>>> sequence number. Ideally, we would be able to (optionally) expose
>>>>>>> this extra catalog information to clients and not need to change how
>>>>>>> loading works.
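
The client-side resolution discussed above can be sketched as follows: given each table's commit history tagged with a catalog-wide, monotonically increasing sequence number, a reader picks, per table, the latest snapshot committed at or before one reference value. This is only an illustration under stated assumptions: the `(csn, snapshot_id)` history layout and the function names are hypothetical, and neither proposal defines them in this form.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: each table exposes its history as (csn, snapshot_id) pairs in
# commit order, where csn is a catalog-wide monotonically increasing number.
History = List[Tuple[int, int]]


def snapshot_at(history: History, reference_csn: int) -> Optional[int]:
    """Return the latest snapshot id committed at or before reference_csn."""
    chosen = None
    for csn, snapshot_id in history:
        if csn <= reference_csn:
            chosen = snapshot_id
    return chosen  # None means the table had no snapshot yet at that point


def consistent_view(tables: Dict[str, History],
                    reference_csn: int) -> Dict[str, Optional[int]]:
    """Resolve every table against the same reference CSN."""
    return {name: snapshot_at(history, reference_csn)
            for name, history in tables.items()}


tables = {
    "orders": [(10, 101), (14, 102)],
    "customers": [(12, 201), (15, 202)],
}
view = consistent_view(tables, reference_csn=14)
# view maps "orders" to snapshot 102 and "customers" to snapshot 201
```

Because every table is resolved against the same reference value, repeating the read later with `reference_csn=14` yields the same snapshots even after new commits land, which is the repeatable-read property the thread is after.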
>>>>>>>
>>>>>>> Ryan
>>>>>>>
>>>>>>> On Tue, May 20, 2025 at 9:45 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> To avoid passing copies of a file around for comments, I put the
>>>>>>>> doc for commit sequence numbers into Google so we can comment on a
>>>>>>>> central copy:
>>>>>>>> https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100239850723655533404&rtpof=true&sd=true
>>>>>>>>
>>>>>>>> Ryan
>>>>>>>>
>>>>>>>> On Fri, May 16, 2025 at 2:51 AM Maninderjit Singh
>>>>>>>> <parmar.maninder...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the updated proposal Drew!
>>>>>>>>> My preference for using the catalog authored timestamp is to
>>>>>>>>> minimize changes to the REST spec so we can have good backwards
>>>>>>>>> compatibility. I have quickly put together a draft proposal on how
>>>>>>>>> this should work. Looking forward to feedback and discussion.
>>>>>>>>>
>>>>>>>>> Draft Proposal: Catalog-Authored Timestamps for Apache Iceberg
>>>>>>>>> REST Catalog
>>>>>>>>> <https://drive.google.com/open?id=1KVgUJc1WgftHfLz118vMbEE7HV8_pUDk4s-GJFDyAOE>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Maninder
>>>>>>>>>
>>>>>>>>> On Wed, May 14, 2025 at 6:12 PM Drew <img...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> Thank you for feedback on the MTT proposal and during community
>>>>>>>>>> sync. Based on it, Jagdeep and I have iterated on the document and
>>>>>>>>>> added a second option to use *Catalog CommitSequenceNumbers*.
>>>>>>>>>> Looking forward to getting more feedback on the proposal, where to
>>>>>>>>>> add more details or approach/changes to consider. We appreciate
>>>>>>>>>> everyone's time on this!
>>>>>>>>>>
>>>>>>>>>> The option introduces *Catalog CommitSequenceNumbers (CSNs)*,
>>>>>>>>>> which allow clients/engines to read a consistent view of multiple
>>>>>>>>>> tables without registering a transaction context with the
>>>>>>>>>> catalog, which in turn removes the need for transaction
>>>>>>>>>> bookkeeping on the catalog side. To abort transactions early,
>>>>>>>>>> clients can call LoadTable with and without a CSN to figure out
>>>>>>>>>> whether there is already a conflicting write on any of the tables
>>>>>>>>>> being modified. We also removed the section where transactions
>>>>>>>>>> were staging commits on the Catalog, and changed the proposal to
>>>>>>>>>> align with Eduard's PR around staging changes locally before
>>>>>>>>>> commit (https://github.com/apache/iceberg/pull/6948).
>>>>>>>>>>
>>>>>>>>>> Jagdeep also clarified, in an example in a previous email, where a
>>>>>>>>>> workload may require multi-table snapshot isolation even if the
>>>>>>>>>> tables are being updated without the Multi-Table commit API,
>>>>>>>>>> though most MTT transactions will commit using the multi-table
>>>>>>>>>> commit API.
>>>>>>>>>>
>>>>>>>>>> Maninder, for the approach of "common notion of time between
>>>>>>>>>> clients and catalog": I spent some time thinking about it, but
>>>>>>>>>> cannot find a feasible way to do this. Yes, the catalogs can use a
>>>>>>>>>> high-precision clock, but clients cannot use the Catalog timestamp
>>>>>>>>>> from API calls to set their local clock, due to network latency
>>>>>>>>>> for request/response. For example, different requests to the same
>>>>>>>>>> Catalog servers can return different timestamps based on network
>>>>>>>>>> latency. Also, what if a client works with more than one Catalog?
>>>>>>>>>> If you want to do a rough write-up or share a reference
>>>>>>>>>> implementation that uses such an approach, I will be happy to
>>>>>>>>>> brainstorm it more. Let us know!
>>>>>>>>>>
>>>>>>>>>> Here is the link to updated proposal
>>>>>>>>>> <https://docs.google.com/document/d/1jr4Ah8oceOmo6fwxG_0II4vKDUHUKScb/edit?usp=sharing&ouid=100384647237395649950&rtpof=true&sd=true>
>>>>>>>>>>
>>>>>>>>>> Thanks Again!
>>>>>>>>>> - Drew
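
The early-abort check described in the oldest message of the thread (loading each table with and without a CSN and comparing the results) might look roughly like the sketch below. The `Loader` callable is a hypothetical stand-in for a LoadTable call that accepts an optional CSN; the real request shape is whatever the proposal ends up defining, so treat the names and signature here as assumptions.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical loader: returns the snapshot id of `table` as of `csn`, or the
# table's current snapshot id when csn is None. It stands in for a LoadTable
# call and is not an existing Iceberg client API.
Loader = Callable[[str, Optional[int]], Optional[int]]


def conflicting_tables(load_snapshot_id: Loader, tables: Iterable[str],
                       start_csn: int) -> List[str]:
    """List tables whose current snapshot differs from the one at start_csn."""
    conflicts = []
    for table in tables:
        at_start = load_snapshot_id(table, start_csn)  # load "at" the CSN
        current = load_snapshot_id(table, None)        # load without a CSN
        if at_start != current:
            # Someone committed to this table after the transaction started.
            conflicts.append(table)
    return conflicts
```

A transaction that began at `start_csn` could call this before doing expensive work and abort if the returned list is non-empty. The check is advisory only: a conflicting commit can still land between the check and the final commit, so the commit itself must still validate.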