Hi,

My unit tests started failing once I upgraded from a single-node Cassandra
cluster to a full "N"-node cluster (I'm starting with 4).  I had a few
assorted bugs, mostly from forgetting to read/write at QUORUM in places
where I needed stronger consistency guarantees.  But I kept getting
random, intermittent failures (the worst kind).  After some painful
debugging I'm 99% sure I see why, but I don't know what to do about it.  The
basic flaw in my understanding of Cassandra seems to boil down to this: I
thought system mutations of keyspaces/column families were applied at a
stronger consistency than ONE, but that appears not to be true.  Is there
any way for me to update a cluster's schema at something more like QUORUM?

The basic idea is that in my unit test's setup() I clone my real keyspace as
keyspace_UUID (with exactly the same CFs) to get a fresh space to play
in.  In a single-node environment, no issues.  But in a cluster, it seems
that it takes a while for the system_add_keyspace call to propagate.  No
worries, I think: I just modify my setup() to call
describe_keyspace(keyspace_UUID) in a while loop until the cluster is
ready.  My random failures drop considerably, but every once in a while I
still see the same kind of failure.  Then I find out that schema updates
seem to propagate on a per-node basis.  At least, that's what I have to
assume: I'm using phpcassa, which uses a connection pool, and I see in my
logging that my setup() succeeds because one connection in the pool sees the
new keyspace, but when my tests run I grab a connection from the pool that
is missing it!

Is there a solution other than changing my setup() yet again to loop over
all Cassandra servers, calling describe_keyspace() on each one?
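
A minimal sketch of that per-server loop, written in Python rather than
phpcassa, might look like the following.  Everything named here is an
assumption for illustration: `wait_for_keyspace` and `describe_keyspace_on`
are hypothetical helpers, not phpcassa or Thrift calls.  The caller supplies
`describe_keyspace_on(host, keyspace)`, which should open a connection to
that one host, call describe_keyspace there, and raise if the keyspace is
not visible yet.

```python
import time


def wait_for_keyspace(hosts, keyspace, describe_keyspace_on,
                      timeout=30.0, interval=0.5,
                      sleep=time.sleep, clock=time.monotonic):
    """Block until every host in `hosts` can describe `keyspace`.

    `describe_keyspace_on` is a hypothetical, caller-supplied function that
    asks ONE specific host (not a pooled connection) about the keyspace and
    raises an exception while that host does not know it yet.
    """
    deadline = clock() + timeout
    pending = list(hosts)
    while pending:
        still_missing = []
        for host in pending:
            try:
                describe_keyspace_on(host, keyspace)
            except Exception:
                # Assumption: any error means "not propagated yet".  With a
                # real client, catch only its not-found exception instead.
                still_missing.append(host)
        pending = still_missing
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(
                "keyspace %r still missing on: %s"
                % (keyspace, ", ".join(pending)))
        sleep(interval)
    return True
```

The point of checking each host by name is that a success on one pooled
connection says nothing about the others; only hosts that have not yet
answered are retried, and the timeout keeps a dead node from hanging
setup() forever.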

-- 
Will Oberman
Civic Science, Inc.
3030 Penn Avenue, First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com
