If you are reading and writing at quorum, then what you are seeing shouldn't happen. You shouldn't be able to read N+1 until N+1 has been committed to a quorum of servers. At this point you should not be able to read N anymore, since there is no quorum that contains N.
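The quorum-overlap argument above can be sketched with a short check. This is a hypothetical illustration, not part of the original thread; the replication factor and majority-quorum size are assumptions chosen for the example:

```python
# Hypothetical illustration: with replication factor RF, any write
# quorum and any read quorum of majority size share at least one
# replica, which is why a quorum read is expected to see the latest
# quorum-committed write.
from itertools import combinations

RF = 3
quorum = RF // 2 + 1          # majority quorum: 2 of 3 replicas
replicas = set(range(RF))

# Every pair of (write-quorum, read-quorum) overlaps in >= 1 replica.
overlaps = [
    bool(set(w) & set(r))
    for w in combinations(replicas, quorum)
    for r in combinations(replicas, quorum)
]
print(all(overlaps))  # True: no quorum can miss a quorum-committed write
```

This is the intersection property the poster is relying on: once N+1 is on a full quorum, every read quorum contains at least one replica holding N+1.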
Dan - I think you are right, except that quorum reads should be consistent even during a quorum write. You are not guaranteed to read N+1 until *after* a successful quorum write of N+1, but once you see N+1, you should never see N again, even if the write failed.

Sean

On Fri, Apr 15, 2011 at 1:29 PM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:
> So Cassandra does not use an atomic commit protocol at the cluster level.
> Strong consistency on a quorum read is only guaranteed *after* a successful
> quorum write. The behaviour you are seeing is possible if you are reading in
> the middle of a write, or if the write failed (which should be reported to
> your code via an exception).
>
> Dan
>
> -----Original Message-----
> From: James Cipar [mailto:jci...@cmu.edu]
> Sent: April-15-11 14:15
> To: user@cassandra.apache.org
> Subject: Consistency model
>
> I've been experimenting with the consistency model of Cassandra, and I found
> something that seems a bit unexpected. In my experiment, I have 2 processes,
> a reader and a writer, each accessing a Cassandra cluster with a replication
> factor greater than 1. In addition, I sometimes generate background traffic
> to simulate a busy cluster by uploading a large data file to another table.
>
> The writer executes a loop where it writes a single row that contains just
> a sequentially increasing sequence number and a timestamp.
> In Python this looks something like:
>
>     while time.time() < start_time + duration:
>         target_server = random.sample(servers, 1)[0]
>         target_server = '%s:9160' % target_server
>
>         row = {'seqnum': str(seqnum), 'timestamp': str(time.time())}
>         seqnum += 1
>         # print 'uploading to server %s, %s' % (target_server, row)
>
>         pool = pycassa.connect('Keyspace1', [target_server])
>         cf = pycassa.ColumnFamily(pool, 'Standard1')
>         cf.insert('foo', row, write_consistency_level=consistency_level)
>         pool.dispose()
>
>         if sleeptime > 0.0:
>             time.sleep(sleeptime)
>
> The reader simply executes a loop reading this row and reporting whenever a
> sequence number is *less* than the previous sequence number. As expected,
> with consistency_level=ConsistencyLevel.ONE there are many inconsistencies,
> especially with a high replication factor.
>
> What is unexpected is that I still detect inconsistencies when it is set to
> ConsistencyLevel.QUORUM. This is unexpected because the documentation seems
> to imply that QUORUM will give consistent results. With background traffic,
> the average difference in timestamps was 0.6s, and the maximum was >3.5s.
> This means that a client can see one version of the row, and subsequently
> see another version of the row that is 3.5s older than the previous one.
>
> What I imagine is happening is this, but I'd like someone who knows what
> they're talking about to tell me whether it's actually the case:
>
> I think Cassandra is not using an atomic commit protocol to commit to the
> quorum of servers chosen when the write is made. This means that at some
> point in the middle of the write, some subset of the quorum has seen the
> write, while others have not. At this time, there is a quorum of servers
> that have not seen the update, so depending on which quorum the client
> reads from, it may or may not see the update.
>
> Of course, I understand that the client is not *choosing* a bad quorum to
> read from; it is just the first `q` servers to respond. But in this case
> the choice is effectively random, and sometimes a bad quorum is "chosen".
>
> Does anyone have any other insight into what is going on here?
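James's non-atomic-commit hypothesis can be sketched as a toy simulation. This is hypothetical code, not from the thread: replicas are modeled as plain dicts, the coordinator is modeled as picking a random quorum of responders, and conflicts resolve last-write-wins by timestamp:

```python
# Toy model of a quorum write that is applied replica-by-replica
# rather than atomically, and a quorum read served by whichever
# replicas respond first.
import random

random.seed(0)              # deterministic for illustration

RF = 3                      # replication factor (assumed)
QUORUM = RF // 2 + 1        # majority quorum: 2 of 3
# Each replica holds the latest (seqnum, ts) it has applied.
replicas = [{"seqnum": 0, "ts": 0.0} for _ in range(RF)]

def quorum_read():
    # The coordinator effectively uses the first QUORUM replicas to
    # respond -- modeled here as a random subset -- and returns the
    # newest value among them (last-write-wins by timestamp).
    responders = random.sample(replicas, QUORUM)
    return max(responders, key=lambda r: r["ts"])["seqnum"]

# A quorum write of seqnum=1 is in flight: only one replica has
# applied it so far, so no quorum is guaranteed to contain it yet.
replicas[0] = {"seqnum": 1, "ts": 1.0}

# Depending on which quorum responds, a reader can see 1 and later
# see 0 again -- the "bad quorum" effect described above.
seen = {quorum_read() for _ in range(1000)}
print(sorted(seen))  # [0, 1]: both old and new values are observable
```

Whether real Cassandra read repair and timestamp reconciliation change this picture is exactly what the thread is debating; the sketch only shows why mid-write reads are not forced to agree when the write is not atomic across the quorum.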