I am seeing some fairly strange behavior in my Cassandra cluster.

Setup
 - 3 nodes (let's call them nodes 1, 2, and 3)
 - RF=2
 - A set of servers (producers) which write data to the cluster at
consistency level ONE
 - A set of servers (consumers/processors) which read data from the cluster
at consistency level ALL
 - Cassandra 0.7 (recent out of the svn branch, post beta 3)
 - Clients use the pelops library (write/read paths sketched below)
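
For concreteness, here is roughly what the two paths look like (a minimal
sketch against the Pelops API as I understand it; the pool, keyspace, column
family, and key names are placeholders, and the exact method signatures may
differ between Pelops versions):

    import java.util.List;
    import org.apache.cassandra.thrift.Column;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.scale7.cassandra.pelops.Cluster;
    import org.scale7.cassandra.pelops.Mutator;
    import org.scale7.cassandra.pelops.Pelops;
    import org.scale7.cassandra.pelops.Selector;

    public class ConsistencySketch {
        public static void main(String[] args) throws Exception {
            // Pool over the three nodes; keyspace/column family names are made up.
            Pelops.addPool("pool", new Cluster("node1,node2,node3", 9160), "Keyspace1");

            // Producer path: write a column, acknowledged by a single replica (ONE).
            Mutator mutator = Pelops.createMutator("pool");
            mutator.writeColumn("Events", "row-key", mutator.newColumn("B", "value-b"));
            mutator.execute(ConsistencyLevel.ONE);

            // Consumer path: read the whole row back, requiring every replica (ALL).
            Selector selector = Pelops.createSelector("pool");
            List<Column> row = selector.getColumnsFromRow(
                    "Events", "row-key", false, ConsistencyLevel.ALL);
            System.out.println(row.size() + " columns read at ALL");

            Pelops.shutdown();
        }
    }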

Situation:
 - Everything is humming along nicely
 - A Cassandra node (say 3) goes down (even with 24 GB of RAM, OOM errors
are the bane of my existence)
 - Producers continue to happily write to the cluster, but consumers start
complaining, throwing TimedOutExceptions and UnavailableExceptions.
 - I stagger out of bed in the middle of the night and restart Cassandra on
node 3.
 - The consumers stop complaining and get back to business, but generate
garbage data for the period node 3 was down. It's almost like half the data
is missing half the time. (Again, I am reading at consistency level ALL.)
 - I force the consumers to reprocess data for the period node 3 was down.
They generate accurate output, different from what they produced the first
time around.

To be explicit, what seems to be happening is that the first read at
consistency level ALL gives "A,C,E" (for example) and the second read at
consistency level ALL
gives "A,B,C,D,E". Is this a Cassandra bug? Is my knowledge of consistency
levels flawed? My understanding is that you could achieve strongly
consistent behavior by writing at ONE and reading at ALL.
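
For reference, the arithmetic I am relying on is the usual overlap rule,
assuming N here means the replication factor (that assumption may be exactly
where I am going wrong):

    \[
      W + R > N \quad \text{(write set and read set must overlap)}
    \]
    \[
      W = 1 \ (\text{ONE}), \qquad R = \mathrm{RF} = 2 \ (\text{ALL}), \qquad 1 + 2 = 3 > 2
    \]

so every successful read at ALL should include at least the one replica that
acknowledged the write.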

After this experience, my theory (uneducated, untested, and
under-researched) is that "strong consistency" applies only to column
values, not the set of columns (or super-columns in this case) which make up
a row. Any thoughts?
