Mahdi, the issue in your code is here:

    else // we lost LWT, fetch the winning value
        existing_id = SELECT id FROM hash_id WHERE hash=computed_hash | consistency = ONE

You lost the LWT, which means that a concurrent LWT won the Paxos round and applied its value using QUORUM/SERIAL.

In the best case, the winning LWT value has been applied to at least 2 replicas out of 3 (assuming RF=3).

In the worst case, the winning LWT value has not yet been applied to any replica, or is still in the process of being applied.

Now, if you immediately read with CL=ONE, you may:

1) Read the stale value on the 3rd replica, which has not yet received the winning LWT value
2) Or worse, read a stale value because the winning LWT is still being applied when the read is made

That's the main reason reading with CL=SERIAL is recommended (CL=QUORUM is not sufficient).

Reading with CL=SERIAL will:

a. like QUORUM, contact a strict majority of replicas
b. unlike QUORUM, look for a validated (but not yet applied) value from a previous Paxos round and force-apply it before actually reading the new value

On Sun, Feb 11, 2018 at 5:36 PM, Mahdi Ben Hamida <ma...@signalfx.com> wrote:

> Totally understood that it's not worth (or it's rather incorrect) to mix
> serial and non serial operations for LWT tables. It would be highly
> satisfying to my engineer mind if someone can explain why that would cause
> issues in this particular situation. The only explanation I have is that a
> non serial read may cause a read repair to happen and that could interfere
> with a concurrent serial write, although I still can't explain how that
> would cause two different "insert if not exist" transactions to both
> succeed.
>
> --
> Mahdi.
>
> On 2/9/18 2:40 PM, Jonathan Haddad wrote:
>
> If you want consistent reads you have to use the CL that enforces it.
> There’s no way around it.
> On Fri, Feb 9, 2018 at 2:35 PM Mahdi Ben Hamida <ma...@signalfx.com>
> wrote:
>
>> In this case, we only write using CAS (code guarantees that). We also
>> never update, just insert if not exist. Once a hash exists, it never
>> changes (it may get deleted later and that'll be a CAS delete as well).
>>
>> --
>> Mahdi.
>>
>> On 2/9/18 1:38 PM, Jeff Jirsa wrote:
>>
>> On Fri, Feb 9, 2018 at 1:33 PM, Mahdi Ben Hamida <ma...@signalfx.com>
>> wrote:
>>
>>> Under what circumstances would we be reading inconsistent results ? Is
>>> there a case where we end up reading a value that actually end up not being
>>> written ?
>>
>> If you ever write the same value with CAS and without CAS (different code
>> paths both updating the same value), you're using CAS wrong, and
>> inconsistencies can happen.
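
To make the read cases at the top of this message concrete, here is a toy Python model of the worst case: the winning LWT has been accepted by 2 of 3 replicas but not yet applied anywhere. This is only a sketch of the idea, not how Cassandra is implemented. The `Replica` class, the three `read_*` helpers and the value `id-42` are all made up for illustration, and real Paxos state (ballots, timestamps, read repair) is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    # Hypothetical, simplified replica state (an assumption for this sketch).
    committed: Optional[str] = None  # value visible to ordinary reads
    accepted: Optional[str] = None   # Paxos value accepted but not yet applied


def read_one(replica: Replica) -> Optional[str]:
    """CL=ONE: return whatever a single replica has applied so far."""
    return replica.committed


def read_quorum(majority: List[Replica]) -> Optional[str]:
    """CL=QUORUM: looks at applied values only; pending Paxos state is ignored."""
    return next((r.committed for r in majority if r.committed is not None), None)


def read_serial(majority: List[Replica]) -> Optional[str]:
    """CL=SERIAL: force-apply any accepted-but-unapplied value, then read."""
    pending = next((r.accepted for r in majority if r.accepted is not None), None)
    if pending is not None:
        for r in majority:
            r.committed = pending
            r.accepted = None
    return read_quorum(majority)


# Worst case: the winning LWT was accepted by 2 of 3 replicas,
# but has not been applied on any of them yet.
replicas = [Replica(accepted="id-42"), Replica(accepted="id-42"), Replica()]

print(read_one(replicas[2]))      # None: the 3rd replica knows nothing yet
print(read_quorum(replicas[1:]))  # None: the pending value is not visible
print(read_serial(replicas[1:]))  # id-42: the pending round is completed first
```

Any majority of the 3 replicas overlaps the 2 that accepted the winning value, which is why the serial read always finds it. With the DataStax Python driver, the equivalent change should be to set the consistency level of the read statement to `ConsistencyLevel.SERIAL` instead of `ConsistencyLevel.ONE`.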