You're right, they should be the same. Next time this happens, set the log level to debug (from the StorageService JMX bean) on the surviving nodes and let a couple of queries fail before restarting the 3rd (and setting the level back to info).
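To make the "ALL = QUORUM at RF=2" point concrete, here is a minimal sketch of the replica arithmetic. It assumes the usual quorum definition of floor(RF/2) + 1; the function names `replicas_required` and `overlaps` are hypothetical helpers for illustration, not part of any Cassandra API.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        # Assumed quorum definition: a strict majority of the replicas.
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def overlaps(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


if __name__ == "__main__":
    for rf in (2, 3):
        counts = {cl: replicas_required(cl, rf) for cl in ("ONE", "QUORUM", "ALL")}
        print(f"RF={rf}: {counts}")
    print("write ONE / read ALL overlaps at RF=2:", overlaps("ONE", "ALL", 2))
```

At RF=2 both QUORUM and ALL need 2 replicas, so with one replica down neither level can be satisfied for the affected keys, which matches the UnavailableExceptions described below. Writing at ONE and reading at ALL gives R + W = 3 > RF = 2, so the read set always includes the replica that took the write.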
On Sat, Dec 4, 2010 at 12:01 AM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
> Doesn't consistency level ALL=QUORUM at RF=2?
>
> I have not had a chance to test your fix but I don't THINK this is the
> issue. If it is the issue, how do consistency levels ALL and QUORUM differ
> at this replication factor?
>
> On Sat, Dec 4, 2010 at 12:03 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> I think you are running into
>> https://issues.apache.org/jira/browse/CASSANDRA-1316, where when an
>> inconsistency on QUORUM/ALL is discovered it always performed the
>> repair at QUORUM instead of the original CL. Thus, reading at ALL you
>> would see the correct answer on the 2nd read but you weren't
>> guaranteed to see it on the first.
>>
>> This was fixed in 0.6.4 but apparently I botched the merge to the 0.7
>> branch. I corrected that just now, so when you update, you should be
>> good to go.
>>
>> On Fri, Dec 3, 2010 at 9:19 PM, Dan Hendry <dan.hendry.j...@gmail.com>
>> wrote:
>> > I am seeing fairly strange behavior in my Cassandra cluster.
>> >
>> > Setup:
>> > - 3 nodes (let's call them nodes 1, 2 and 3)
>> > - RF=2
>> > - A set of servers (producers) which write data to the cluster at
>> >   consistency level ONE
>> > - A set of servers (consumers/processors) which read data from the
>> >   cluster at consistency level ALL
>> > - Cassandra 0.7 (recent out of the svn branch, post beta 3)
>> > - Clients use the pelops library
>> >
>> > Situation:
>> > - Everything is humming along nicely
>> > - A Cassandra node (say 3) goes down (even with 24 GB of RAM, OOM
>> >   errors are the bane of my existence)
>> > - Producers continue to happily write to the cluster but consumers
>> >   start complaining by throwing TimeOutExceptions and
>> >   UnavailableExceptions.
>> > - I stagger out of bed in the middle of the night and restart
>> >   Cassandra on node 3.
>> > - The consumers stop complaining and get back to business but generate
>> >   garbage data for the period node 3 was down. It's almost like half
>> >   the data is missing half the time. (Again, I am reading at
>> >   consistency level ALL).
>> > - I force the consumers to reprocess data for the period node 3 was
>> >   down. They generate accurate output which is different from the
>> >   first time round.
>> >
>> > To be explicit, what seems to be happening is first read at
>> > consistency ALL gives "A,C,E" (for example) and the second read at
>> > consistency level ALL gives "A,B,C,D,E". Is this a Cassandra bug? Is
>> > my knowledge of consistency levels flawed? My understanding is that
>> > you could achieve strongly consistent behavior by writing at ONE and
>> > reading at ALL.
>> >
>> > After this experience, my theory (uneducated, untested, and
>> > under-researched) is that "strong consistency" applies only to column
>> > values, not the set of columns (or super-columns in this case) which
>> > make up a row. Any thoughts?
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com