Anecdotally, CAS works differently than the typical Cassandra workload. If you run a stress instance against 3 nodes on one host, you typically run into CPU limits, but if you are doing a CAS workload you see things timing out before you ever hit 100% CPU. It is a strange beast.
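For reference, the "CAS workload" here means lightweight transactions: conditional writes that take the Paxos path rather than the normal write path. Below is a minimal sketch, assuming the DataStax Java driver and an illustrative contact point, keyspace and owner value (none of these come from the thread), of the kind of conditional write that produces these timeouts:

import java.nio.ByteBuffer;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class CasWriteSketch {
    public static void main(String[] args) {
        // Contact point and keyspace name are assumptions for illustration.
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect("mds");
            // A conditional (CAS) write: unlike a plain INSERT/UPDATE, this runs
            // a Paxos round across the replicas and can time out under
            // contention long before the nodes are CPU-bound.
            Statement cas = new SimpleStatement(
                    "INSERT INTO \"Lock\" (lockname, id, value) VALUES (?, ?, ?) IF NOT EXISTS",
                    "locktest_1", "owner-a", ByteBuffer.wrap(new byte[] { 1 }));
            ResultSet rs = session.execute(cas);
            // LWT results carry an [applied] column that tells whether the
            // condition held and the write was applied.
            System.out.println("applied: " + rs.one().getBool("[applied]"));
        } catch (WriteTimeoutException wte) {
            // For CAS writes the reported write type is typically CAS (the
            // Paxos prepare/propose phase) or SIMPLE (the commit phase); the
            // outcome of the round is unknown to the client at this point.
            System.err.println("CAS write timed out, writeType=" + wte.getWriteType());
        } finally {
            cluster.close();
        }
    }
}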
On Fri, Dec 23, 2016 at 7:28 AM, horschi <hors...@gmail.com> wrote:

> Update: I replaced all quorum reads on that table with serial reads, and
> now these errors occur less often. Somehow quorum reads on CAS values
> cause most of these WTEs.
>
> Also I found two tickets on that topic:
> https://issues.apache.org/jira/browse/CASSANDRA-9328
> https://issues.apache.org/jira/browse/CASSANDRA-8672
>
> On Thu, Dec 15, 2016 at 3:14 PM, horschi <hors...@gmail.com> wrote:
>
>> Hi,
>>
>> I would like to warm up this old thread. I did some debugging and found
>> out that the timeouts are coming from StorageProxy.proposePaxos() -
>> callback.isFullyRefused() returns false and therefore triggers a
>> WriteTimeout.
>>
>> Looking at my ccm cluster logs, I can see that two replica nodes return
>> different results in their ProposeVerbHandler. In my opinion the
>> coordinator should not throw an Exception in such a case, but instead
>> retry the operation.
>>
>> What do the CAS/Paxos experts on this list say to this? Feel free to
>> instruct me to do further tests/code changes. I'd be glad to help.
>>
>> Log:
>>
>> node1/logs/system.log:WARN [SharedPool-Worker-5] 2016-12-15 14:48:36,896
>> PaxosState.java:124 - Rejecting proposal for
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> node1/logs/system.log- Row: id=@ | value=<tombstone>) because
>> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> --
>> node1/logs/system.log:ERROR [SharedPool-Worker-12] 2016-12-15 14:48:36,980
>> StorageProxy.java:506 - proposePaxos:
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> node1/logs/system.log- Row: id=@ | value=<tombstone>)//1//0
>> --
>> node2/logs/system.log:WARN [SharedPool-Worker-7] 2016-12-15 14:48:36,969
>> PaxosState.java:117 - Accepting proposal:
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> node2/logs/system.log- Row: id=@ | value=<tombstone>)
>> --
>> node3/logs/system.log:WARN [SharedPool-Worker-2] 2016-12-15 14:48:36,897
>> PaxosState.java:124 - Rejecting proposal for
>> Commit(2d803540-c2cd-11e6-2e48-53a129c60cfc,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>> node3/logs/system.log- Row: id=@ | value=<tombstone>) because
>> inProgress is now Commit(2d8146b0-c2cd-11e6-f996-e5c8d88a1da4,
>> [MDS.Lock] key=locktest_ 1 columns=[[] | [value]]
>>
>> kind regards,
>> Christian
>>
>> On Fri, Apr 15, 2016 at 8:27 PM, Denise Rogers <datag...@aol.com> wrote:
>>
>>> My thinking was that due to the size of the data there may be I/O
>>> issues. But it sounds more like you're competing for locks and hit a
>>> deadlock issue.
>>>
>>> Regards,
>>> Denise
>>> Cell - (860)989-3431
>>>
>>> Sent from my iPhone
>>>
>>> On Apr 15, 2016, at 9:00 AM, horschi <hors...@gmail.com> wrote:
>>>
>>> Hi Denise,
>>>
>>> in my case it's a small blob I am writing (should be around 100 bytes):
>>>
>>> CREATE TABLE "Lock" (
>>>     lockname varchar,
>>>     id varchar,
>>>     value blob,
>>>     PRIMARY KEY (lockname, id)
>>> ) WITH COMPACT STORAGE
>>>   AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor',
>>>                       'chunk_length_kb' : '8' };
>>>
>>> You ask because large values are known to cause issues? Anything special
>>> you have in mind?
>>>
>>> kind regards,
>>> Christian
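Against a schema like the "Lock" table quoted above, the locking pattern under discussion is usually built on conditional statements and the [applied] column they return. The exact statements used in the thread are not shown, so the following is only a sketch of the pattern, assuming the DataStax Java driver and an illustrative owner token:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Sketch of an LWT-based lock on the "Lock" table shown above. The owner
// token and method names are illustrative, not taken from the thread.
public class LockSketch {
    private final Session session;
    private final PreparedStatement acquireStmt;
    private final PreparedStatement releaseStmt;

    public LockSketch(Session session) {
        this.session = session;
        // Acquire: succeeds only if no row exists yet for (lockname, id).
        this.acquireStmt = session.prepare(
                "INSERT INTO \"Lock\" (lockname, id, value) VALUES (?, ?, ?) IF NOT EXISTS");
        // Release: conditional delete so only the current owner can release.
        this.releaseStmt = session.prepare(
                "DELETE FROM \"Lock\" WHERE lockname = ? AND id = ? IF value = ?");
    }

    public boolean acquire(String lockname, String id, String owner) {
        ByteBuffer token = ByteBuffer.wrap(owner.getBytes(StandardCharsets.UTF_8));
        Row row = session.execute(acquireStmt.bind(lockname, id, token)).one();
        return row.getBool("[applied]");   // true => we hold the lock
    }

    public boolean release(String lockname, String id, String owner) {
        ByteBuffer token = ByteBuffer.wrap(owner.getBytes(StandardCharsets.UTF_8));
        Row row = session.execute(releaseStmt.bind(lockname, id, token)).one();
        return row.getBool("[applied]");   // false => we did not hold the lock
    }
}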
>>> On Fri, Apr 15, 2016 at 2:42 PM, Denise Rogers <datag...@aol.com> wrote:
>>>
>>>> Also, what type of data were you reading/writing?
>>>>
>>>> Regards,
>>>> Denise
>>>>
>>>> Sent from my iPad
>>>>
>>>> On Apr 15, 2016, at 8:29 AM, horschi <hors...@gmail.com> wrote:
>>>>
>>>> Hi Jan,
>>>>
>>>> were you able to resolve your problem?
>>>>
>>>> We are trying the same and also see a lot of WriteTimeouts:
>>>> WriteTimeoutException: Cassandra timeout during write query at
>>>> consistency SERIAL (2 replica were required but only 1 acknowledged
>>>> the write)
>>>>
>>>> How many clients were competing for a lock in your case? In our case
>>>> it's only two :-(
>>>>
>>>> cheers,
>>>> Christian
>>>>
>>>> On Tue, Sep 24, 2013 at 12:18 AM, Robert Coli <rc...@eventbrite.com>
>>>> wrote:
>>>>
>>>>> On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen
>>>>> <jan.algermis...@nordsc.com> wrote:
>>>>>
>>>>>> I am experimenting with C* 2.0 (and today's java-driver 2.0
>>>>>> snapshot) for implementing distributed locks.
>>>>>
>>>>> [ and I'm experiencing the problem described in the subject ... ]
>>>>>
>>>>>> Any idea how to approach this problem?
>>>>>
>>>>> 1) Upgrade to the 2.0.1 release.
>>>>> 2) Try to reproduce the symptoms.
>>>>> 3) If able to, file a JIRA at
>>>>>    https://issues.apache.org/jira/secure/Dashboard.jspa including
>>>>>    repro steps.
>>>>> 4) Reply to this thread with the JIRA ticket URL.
>>>>>
>>>>> =Rob
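When one of these SERIAL write timeouts does occur, the client does not know whether its proposal was accepted. One way to handle it, in line with the serial-read change described in the December update above, is to re-read the contended row at SERIAL consistency and decide from the current state whether the write took effect. A minimal sketch, assuming the DataStax Java driver; the helper names and the owner-token comparison are illustrative:

import java.nio.ByteBuffer;

import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class SerialReadOnTimeout {

    // Reads the lock row at SERIAL consistency. A serial read runs a Paxos
    // round itself, so it observes (and completes) any in-progress CAS
    // update instead of racing with it the way a QUORUM read can.
    static Row readLockSerial(Session session, String lockname, String id) {
        Statement read = new SimpleStatement(
                "SELECT value FROM \"Lock\" WHERE lockname = ? AND id = ?", lockname, id)
                .setConsistencyLevel(ConsistencyLevel.SERIAL);
        return session.execute(read).one();
    }

    // Illustrative handling of a timed-out CAS write: the outcome is unknown,
    // so check the row afterwards. Assumes the token uniquely identifies this
    // client (a hypothetical convention, not something from the thread).
    static boolean acquireWithCheck(Session session, String lockname, String id,
                                    ByteBuffer token) {
        Statement acquire = new SimpleStatement(
                "INSERT INTO \"Lock\" (lockname, id, value) VALUES (?, ?, ?) IF NOT EXISTS",
                lockname, id, token);
        try {
            return session.execute(acquire).one().getBool("[applied]");
        } catch (WriteTimeoutException wte) {
            // The proposal may or may not have gone through. Re-read at SERIAL
            // to see whether our token actually made it into the row.
            Row row = readLockSerial(session, lockname, id);
            return row != null && token.equals(row.getBytes("value"));
        }
    }
}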