Thanks for clarifying Jonathan. I agree with your example. It seems we have now moved into discussing specific requirements/semantics for an interactive transaction implementation. Which is interesting, but beyond what I will have time to think about tonight. At least off the top of my head I can't say I have any data or experience to say how important it is to satisfy the use case you are outlining.
As a gut feeling, I believe the alternative proposal outlined by Alex and Benedict would take such locks in the database nodes that you describe. But again, we'll have to return to this another day as my today is almost over. henrik On Thu, Oct 14, 2021 at 7:39 PM Jonathan Ellis <jbel...@gmail.com> wrote: > ... which is a long way of saying, in postgresql those errors are there as > part of checking for correctness -- when you see them it means you did not > ask for the appropriate locks. It's not expected that you should write > try/catch/retry loops to work around this. > > On Thu, Oct 14, 2021 at 11:13 AM Jonathan Ellis <jbel...@gmail.com> wrote: > > > [Moving followup here from the other thread] > > > > I think there is in fact a difference here. > > > > Consider a workload consisting of two clients. One of them is submitting > > a stream of TPC-C new order transactions (new order client = NOC), and > the > > other is performing a simple increment of district next order ids > > (increment district client = IDC). > > > > If we run these two workloads in postgresql under READ COMMITTED, both > > clients will proceed happily (although we will get serialization > anomalies). > > > > If we run them in pg under SERIALIZABLE, then the NOC client will get the > > "could not serialize access" error whenever the IDC client updates the > > district concurrently, which will be effectively every time since the IDC > > transaction is much simpler. But, SQL gives you a tool to allow NOC to > > make progress, which is SELECT FOR UPDATE. If the NOC performs its first > > read with FOR UPDATE then it will (1) block until the current IDC > > transaction completes and then (2) grab a lock that prevents further > > updates from happening concurrently, allowing NOC to make progress. > > Neither NOC nor IDC will ever get a "could not serialize access" error. > > > > It looks to me like the proposed design here would (1) not allow NOC to > > make progress at READ COMMITTED, but also (2) does not provide the tools > to > > achieve progress with SERIALIZABLE either since locking outside of the > > global consensus does not make sense. > > > > On Wed, Oct 13, 2021 at 1:59 PM Henrik Ingo <henrik.i...@datastax.com> > > wrote: > > > >> Sorry Jonathan, didn't see this reply earlier today. > >> > >> That would be common behaviour for many MVCC databases, including > MongoDB, > >> MySQL Galera Cluster, PostgreSQL... > >> > >> > https://urldefense.com/v3/__https://www.postgresql.org/docs/9.5/transaction-iso.html__;!!PbtH5S7Ebw!KP0b2eRHpf-D6w1012nea4UbnsxtFn-zUEBrAZ7ghBFDr_QQyTT6qHzgZ0KKUKxt_64$ > >> > >> *"Applications using this level must be prepared to retry transactions > due > >> to serialization failures."* > >> > >> On Wed, Oct 13, 2021 at 3:19 AM Jonathan Ellis <jbel...@gmail.com> > wrote: > >> > >> > Hi Henrik, > >> > > >> > I don't see how this resolves the fundamental problem that I outlined > to > >> > start with, namely, that without having the entire logic of the > >> transaction > >> > available to it, the server cannot retry the transaction when > concurrent > >> > changes are found to have been applied after the reconnaissance reads > >> (what > >> > you call the conversational phase). > > > > > > On Wed, Oct 13, 2021 at 5:00 AM Henrik Ingo <henrik.i...@datastax.com> > > wrote: > > > >> On Wed, Oct 13, 2021 at 1:26 AM Blake Eggleston > >> <beggles...@apple.com.invalid> wrote: > >> > >> > Hi Henrik, > >> > > >> > I would agree that the local serial experience for valid use cases > >> should > >> > be supported in some form before legacy LWT is replaced by Accord. > >> > > >> > > >> Great! It seems there's a seed of consensus on this point. > >> > >> > >> > Regarding your read committed proposal, I think this CEP discussion > has > >> > already spent too much time talking about hypothetical SQL > >> implementations, > >> > and I’d like to avoid veering off course again. However, since you’ve > >> asked > >> > a well thought out question with concrete goals and implementation > >> ideas, > >> > I’m happy to answer it. I just ask that if you want to discuss it > >> beyond my > >> > reply, you start a separate ‘[IDEA] Read committed transaction with > >> Accord’ > >> > thread where we could talk about it a bit more without it feeling like > >> we > >> > need to delay a vote. > >> > > >> > > >> This is a reasonable request. We were already in a side thread I guess, > >> but > >> I like organizing discussions into separate threads... > >> > >> Let's see if I manage to break the thread correctly simply by editing > the > >> subject... > >> > >> FWIW, my hope for this discussion was that by providing a simple yet > >> concrete example, it would facilitate the discussion toward a CEP-15 > vote, > >> not distract from it. As it happened, Alex Miller was writing a hugely > >> helpful email concurrently with mine, which improves details in CEP-15, > so > >> I don't know if expecting the discussion to die out just yet is ignoring > >> people who maybe working off list to still understand this rather > advanced > >> reading material. > >> > >> > >> > >> > So I think it could work with some modifications. > >> > > >> > First you’d need to perform your select statements as accord reads, > not > >> > quorum reads. Otherwise you may not see writes that have been (or > could > >> > have been) committed. A multi-partition write could also appear to > >> become > >> > undone, if a write commit has not reached one of the keys or needs to > be > >> > recovered. > >> > > >> > >> Ah right. I think we established early on that tables should be either > >> Accord-only, or legacy C* only. I was too fixated on the "no other > >> changes" > >> and forgot this. > >> > >> This is then a very interesting detail you point out! It seems like > >> potentially every statement now needs to go through the Accord consensus > >> protocol, and this could become expensive, where my goal was to design > the > >> simplest and most lightweight example thinkable. BUT for read-only > Accord > >> transactions, where I specifically also don't care about > serializability, > >> wouldn't this be precisely the case where I can simply pick my own > >> timestamp and do a stale read from a nearby replica? > >> > >> > >> > > >> > Second, when you talk about transforming mutations, I’m assuming > you’re > >> > talking about confirming primary keys do or do not exist, > >> > >> > >> No, I was thinking more broadly of operations like `UPDATE table1 SET > >> column1=x WHERE pk >= 10 and pk <= 20` > >> > >> My thinking was that I need to know the exact primary keys touched both > >> during the conversational phase and the commit phase. In essence, this > is > >> an interactive reconnaisance phase. > >> > >> You make a great point that for statements where the PK is explicit, > they > >> can just be directly added to the write set and transaction state. Ex: > >> `UPDATE table1 SET column1=x WHERE pk IN (1,2,3)` > >> > >> > >> > >> > and supporting auto-incrementing primary keys. To confirm primary keys > >> do > >> > or do not exist, you’d also need to perform an accord read also. > >> > >> > >> For sure. > >> > >> > >> > For auto-incrementing primary keys, you’d need to do an accord > >> read/write > >> > operation to increment a counter somewhere (or just use uuids). > >> > > >> > > >> I had not considered auto-increment at all, but if that would be a > >> requirement, then I tend to translate "auto-increment" into "any service > >> that can hand out unique integers". (In practice, no database can force > me > >> to commit the integers in the order that they're actually monotonically > >> increasing, so "auto-increment" is an illusion, I realized at some point > >> in > >> my career.) > >> > >> > >> > Finally, read committed does lock rows, so you’d still need to > perform a > >> > read on commit to confirm that the rows being written to haven’t been > >> > modified since the transaction began. > >> > > >> > >> Hmm... > >> > >> As we see in a separate discussion is already diving into this, it seems > >> like at least the SQL 1992 standard only says read committed must > protect > >> against P1 and that's it. My suspicion is that since most modern > databases > >> start from MVCC, they essentially "over deliver" when providing read > >> committed, since the implementation naturally provides snapshot reads > and > >> in fact it would be complicated to do something less consistent. > >> > >> For this discussion it's not really important which interpretation is > >> correct, since either is a reasonable semantic. For my purposes I'll > just > >> note that needing to re-execute all reads during the Accord phase > (commit > >> phase) would make the design more expensive, since the transaction is > now > >> executed twice. The goal of a simplistic light weight semantic is > achieved > >> by not doing so and claiming the weaker interpretation of read committed > >> is > >> "correct". > >> > >> henrik > >> > >> -- > >> > >> Henrik Ingo > >> > >> +358 40 569 7354 <358405697354> > >> > >> [image: Visit us online.] <https://www.datastax.com/> [image: Visit us > >> on > >> Twitter.] <https://twitter.com/DataStaxEng> [image: Visit us on > >> YouTube.] > >> < > >> > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e= > >> > > >> [image: Visit my LinkedIn profile.] < > >> > https://urldefense.com/v3/__https://www.linkedin.com/in/heingo/__;!!PbtH5S7Ebw!KP0b2eRHpf-D6w1012nea4UbnsxtFn-zUEBrAZ7ghBFDr_QQyTT6qHzgZ0KKEnhk1jo$ > > > >> > > > > > > -- > > Jonathan Ellis > > co-founder, http://www.datastax.com > > @spyced > > > > > -- > Jonathan Ellis > co-founder, http://www.datastax.com > @spyced > -- Henrik Ingo +358 40 569 7354 <358405697354> [image: Visit us online.] <https://www.datastax.com/> [image: Visit us on Twitter.] <https://twitter.com/DataStaxEng> [image: Visit us on YouTube.] <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.youtube.com_channel_UCqA6zOSMpQ55vvguq4Y0jAg&d=DwMFaQ&c=adz96Xi0w1RHqtPMowiL2g&r=IFj3MdIKYLLXIUhYdUGB0cTzTlxyCb7_VUmICBaYilU&m=bmIfaie9O3fWJAu6lESvWj3HajV4VFwgwgVuKmxKZmE&s=16sY48_kvIb7sRQORknZrr3V8iLTfemFKbMVNZhdwgw&e=> [image: Visit my LinkedIn profile.] <https://www.linkedin.com/in/heingo/>