I ran the same CQL query against my 3 nodes (after adding the third and repairing each of them):
On the new node:

cqlsh:mykeyspace> select '20121029#myevent' from 'mycf' where key = '887#day';

 20121029#myevent
-------------------
             4983

On the 2 others (old nodes):

cqlsh:mykeyspace> select '20121029#myevent' from 'mycf' where key = '887#day';

 20121029#myevent
-------------------
             4254

And the value read at CL.QUORUM is 4943, which is the correct value.

How is it possible that QUORUM read 4943 when only 1 node out of 3 returned that count? How could a new node get a value that none of the other existing nodes has? Is there a way to fix the data (isn't repair supposed to do that)?

Alain

2012/11/1 Alain RODRIGUEZ <arodr...@gmail.com>

> "Can you try it though, or run a repair ?"
>
> Repairing didn't help.
>
> "My first thought is to use QUORUM"
>
> This fixes the problem. However, my data is probably still inconsistent,
> even if I now always read the same value. The point is that I can't
> handle a crash with CL.QUORUM; I can't even restart a node...
>
> I will add a third server.
>
> "But isn't Cassandra supposed to handle a server crash ? When a server
> crashes I guess it doesn't drain first..."
>
> "I was asking to understand how you did the upgrade."
>
> Ok. On my side I am just concerned about the possibility of using
> counters with CL.ONE and correctly handling a crash or restart without a
> drain.
>
> Alain
>
> 2012/11/1 aaron morton <aa...@thelastpickle.com>
>
>> "What CL are you using ?"
>>
>> I think this may be what caused the issue. I'm writing and reading at
>> CL ONE. I didn't drain before stopping Cassandra, and this may have
>> produced a failure in the counters that were being written when I
>> stopped a server.
>>
>> My first thought is to use QUORUM. But with only two nodes it's hard to
>> get strong consistency using QUORUM.
>> Can you try it though, or run a repair ?
>>
>> But isn't Cassandra supposed to handle a server crash ? When a server
>> crashes I guess it doesn't drain first...
>>
>> I was asking to understand how you did the upgrade.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 1/11/2012, at 11:39 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>> "What version of cassandra are you using ?"
>>
>> 1.1.2
>>
>> "Can you explain this further?"
>>
>> I had an unexplained amount of reads (up to 1800 r/s and 90 MB/s) on
>> one server; the other was doing about 200 r/s and 5 MB/s max. I fixed
>> it by rebooting the server. This server is dedicated to Cassandra. I
>> can't tell you more about it because I don't understand it... But a
>> simple Cassandra restart wasn't enough.
>>
>> "Was something writing to the cluster ?"
>>
>> Yes, we have some activity and perform about 600 w/s.
>>
>> "Did you drain for the upgrade ?"
>>
>> We upgraded a long time ago, to 1.1.2. This warning is about version
>> 1.1.6.
>>
>> "What changes did you make ?"
>>
>> In cassandra.yaml I just changed the "compaction_throughput_mb_per_sec"
>> property to slow down my compactions a bit. I don't think the problem
>> comes from there.
>>
>> "Are you saying that a particular counter column is giving different
>> values for different reads ?"
>>
>> Yes, this is exactly what I was saying. Sorry if something is wrong
>> with my English; it's not my mother tongue.
>>
>> "What CL are you using ?"
>>
>> I think this may be what caused the issue. I'm writing and reading at
>> CL ONE. I didn't drain before stopping Cassandra, and this may have
>> produced a failure in the counters that were being written when I
>> stopped a server.
>>
>> But isn't Cassandra supposed to handle a server crash ? When a server
>> crashes I guess it doesn't drain first...
>>
>> Thank you for your time Aaron, once again.
>>
>> Alain
>>
>> 2012/10/31 aaron morton <aa...@thelastpickle.com>
>>
>>> What version of cassandra are you using ?
>>>
>>>> I finally restarted Cassandra. It didn't solve the problem so I
>>>> stopped Cassandra again on that node and restarted my EC2 server.
>>>> This solved the issue (1800 r/s to 100 r/s).
>>>
>>> Can you explain this further?
>>> Was something writing to the cluster ?
>>> Did you drain for the upgrade ?
>>> https://github.com/apache/cassandra/blob/cassandra-1.1/NEWS.txt#L17
>>>
>>>> Today I changed my cassandra.yaml and restarted this same server to
>>>> apply my conf.
>>>
>>> What changes did you make ?
>>>
>>>> I just noticed that my homepage (which uses a Cassandra counter and
>>>> refreshes every sec) shows me 4 different values: 2 of them
>>>> repeatedly (5000 and 4000) and the other 2 occasionally (5500 and
>>>> 3800).
>>>
>>> Are you saying that a particular counter column is giving different
>>> values for different reads ?
>>> What CL are you using ?
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 31/10/2012, at 3:39 AM, Jason Wee <peich...@gmail.com> wrote:
>>>
>>> Maybe enable debug in log4j-server.properties and go through the log
>>> to see what actually happens?
>>>
>>> On Tue, Oct 30, 2012 at 7:31 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have an issue with counters. Yesterday I had a lot of unexplainable
>>>> reads/sec on one server. I finally restarted Cassandra. It didn't
>>>> solve the problem so I stopped Cassandra again on that node and
>>>> restarted my EC2 server. This solved the issue (1800 r/s to 100 r/s).
>>>>
>>>> Today I changed my cassandra.yaml and restarted this same server to
>>>> apply my conf.
>>>>
>>>> I just noticed that my homepage (which uses a Cassandra counter and
>>>> refreshes every sec) shows me 4 different values: 2 of them
>>>> repeatedly (5000 and 4000) and the other 2 occasionally (5500 and
>>>> 3800).
>>>>
>>>> Only the counters created today and yesterday are affected.
>>>>
>>>> I performed a repair without success. These data are the heart of our
>>>> business, so if someone has any clue, I would be really grateful...
>>>>
>>>> The sooner the better; I am in production with these random counters.
>>>>
>>>> Alain
>>>>
>>>> INFO:
>>>>
>>>> My environment is 2 nodes (EC2 large), RF 2, CL.ONE (R & W), Random
>>>> Partitioner.
>>>>
>>>> xxx.xxx.xxx.241  eu-west  1b  Up  Normal  151.95 GB  50.00%  0
>>>> xxx.xxx.xxx.109  eu-west  1b  Up  Normal  117.71 GB  50.00%  85070591730234615865843651857942052864
>>>>
>>>> Here is my conf: http://pastebin.com/5cMuBKDt
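[Editor's note] The advice in this thread (move from 2 nodes / RF=2 / CL.ONE to 3 nodes with QUORUM) follows from Cassandra's consistency arithmetic: a read is guaranteed to touch an up-to-date replica only when read replicas + write replicas > RF, and QUORUM is floor(RF/2) + 1. A minimal sketch of that arithmetic; `quorum` and `overlap` are illustrative helper names, not Cassandra APIs:

```python
def quorum(rf):
    # Cassandra's QUORUM size for a given replication factor:
    # floor(RF / 2) + 1
    return rf // 2 + 1

def overlap(read_replicas, write_replicas, rf):
    # R + W > RF guarantees every read overlaps at least one replica
    # that saw the most recent write.
    return read_replicas + write_replicas > rf

# RF=2 with CL.ONE reads and writes (the original setup):
# no overlap, so stale counter reads are possible.
assert not overlap(1, 1, 2)

# RF=2 with QUORUM both ways: reads become consistent, but
# quorum(2) == 2 means a single node down blocks every request,
# which matches "I can't even restart a node".
assert quorum(2) == 2

# RF=3 with QUORUM both ways: overlap holds AND one node may be
# down, since only 2 of 3 replicas are needed.
assert quorum(3) == 2
assert overlap(quorum(3), quorum(3), 3)
```

This is why adding the third server and reading/writing at QUORUM is the combination that gives both consistent counter reads and tolerance of one node restarting.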