Here it is.  Some setup code and global variable definitions were left out of 
the previous code, but they are similar to the setup code here.

    import pycassa
    import random
    import time

    consistency_level = pycassa.cassandra.ttypes.ConsistencyLevel.QUORUM
    duration = 600
    sleeptime = 0.0
    hostlist = 'worker-hostlist'

    def read_servers(fn):
        # Read one host name per line from the host list file.
        with open(fn) as f:
            return [line.strip() for line in f]

    servers = read_servers(hostlist)
    start_time = time.time()
    seqnum = -1
    timestamp = 0

    while time.time() < start_time + duration:
        target_server = random.sample(servers, 1)[0]
        target_server = '%s:9160'%target_server

        try:
            pool = pycassa.connect('Keyspace1', [target_server])
            cf = pycassa.ColumnFamily(pool, 'Standard1')
            row = cf.get('foo', read_consistency_level=consistency_level)
            pool.dispose()
        except Exception:
            # Connection or read failed; back off briefly and try another server.
            time.sleep(sleeptime)
            continue

        sq = int(row['seqnum'])
        ts = float(row['timestamp'])

        # Report whenever the sequence number goes backwards (a stale read).
        if sq < seqnum:
            print 'Inconsistency: %i %f -> %i %f'%(seqnum, timestamp, sq, ts)
        seqnum = sq
        timestamp = ts

        if sleeptime > 0.0:
            time.sleep(sleeptime)
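As an aside, the quorum-overlap scenario discussed in the quoted thread below can be sketched with a toy enumeration (hypothetical, not pycassa; the replica states, quorum size, and `quorum_read` helper are all made up for illustration).  Assume a replication factor of 3 and quorum size 2, and that a write in flight has so far reached only replica 0:

```python
import itertools

# Toy model (assumption: RF=3, quorum=2).  Mid-write, only replica 0 holds
# the new sequence number; replicas 1 and 2 still hold the old one.
replicas = [{'seqnum': 2}, {'seqnum': 1}, {'seqnum': 1}]

def quorum_read(chosen):
    # A quorum read returns the highest seqnum among the replicas that responded.
    return max(replicas[i]['seqnum'] for i in chosen)

# Enumerate every possible read quorum (i.e. the first 2 replicas to respond).
results = {q: quorum_read(q) for q in itertools.combinations(range(3), 2)}
# Quorums (0, 1) and (0, 2) see seqnum 2, but (1, 2) still sees seqnum 1,
# so a read landing on (1, 2) after one that saw 2 appears to go backwards.
```

This is only a sketch of the overlap argument: of the three possible 2-replica quorums, one contains no updated replica, so which value a reader sees depends on which replicas happen to respond first.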




On Apr 16, 2011, at 5:20 PM, Tyler Hobbs wrote:

> James,
> 
> Would you mind sharing your reader process code as well?
> 
> On Fri, Apr 15, 2011 at 1:14 PM, James Cipar <jci...@cmu.edu> wrote:
> I've been experimenting with the consistency model of Cassandra, and I found 
> something that seems a bit unexpected.  In my experiment, I have 2 processes, 
> a reader and a writer, each accessing a Cassandra cluster with a replication 
> factor greater than 1.  In addition, sometimes I generate background traffic 
> to simulate a busy cluster by uploading a large data file to another table.
> 
> The writer executes a loop where it writes a single row that contains just a 
> sequentially increasing sequence number and a timestamp.  In Python this 
> looks something like:
> 
>    while time.time() < start_time + duration:
>        target_server = random.sample(servers, 1)[0]
>        target_server = '%s:9160'%target_server
> 
>        row = {'seqnum':str(seqnum), 'timestamp':str(time.time())}
>        seqnum += 1
>        # print 'uploading to server %s, %s'%(target_server, row)
> 
>        pool = pycassa.connect('Keyspace1', [target_server])
>        cf = pycassa.ColumnFamily(pool, 'Standard1')
>        cf.insert('foo', row, write_consistency_level=consistency_level)
>        pool.dispose()
> 
>        if sleeptime > 0.0:
>            time.sleep(sleeptime)
> 
> 
> The reader simply executes a loop reading this row and reporting whenever a 
> sequence number is *less* than the previous sequence number.  As expected, 
> with consistency_level=ConsistencyLevel.ONE there are many inconsistencies, 
> especially with a high replication factor.
> 
> What is unexpected is that I still detect inconsistencies when it is set at 
> ConsistencyLevel.QUORUM.  This is unexpected because the documentation seems 
> to imply that QUORUM will give consistent results.  With background traffic 
> the average difference in timestamps was 0.6s, and the maximum was >3.5s.  
> This means that a client sees a version of the row, and can subsequently see 
> another version of the row that is 3.5s older than the previous.
> 
> What I imagine is happening is this, but I'd like someone who knows what 
> they're talking about to tell me if it's actually the case:
> 
> I think Cassandra is not using an atomic commit protocol to commit to the 
> quorum of servers chosen when the write is made.  This means that at some 
> point in the middle of the write, some subset of the quorum have seen the 
> write, while others have not.  At this time, there is a quorum of servers 
> that have not seen the update, so depending on which quorum the client reads 
> from, it may or may not see the update.
> 
> Of course, I understand that the client is not *choosing* a bad quorum to 
> read from, it is just the first `q` servers to respond, but in this case it 
> is effectively random and sometimes a bad quorum is "chosen".
> 
> Does anyone have any other insight into what is going on here?
> 
> 
> 
> -- 
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
> 
