I am preparing to migrate a large amount of data to Cassandra. In order to test 
my migration code, I’ve been doing some dry runs to a test cluster. My test 
cluster is 2.0.15, 3 nodes, RF=1 and CL=QUORUM. I know RF=1 and CL=QUORUM is a 
weird combination, but my production cluster that will eventually receive this 
data is RF=3. I am running with RF=1 so it's faster while I work out the kinks 
in the migration.

There are a few things that have puzzled me after writing several tens of 
millions of records to my test cluster.

My main concern is the few tens of thousands of dropped mutation messages. I 
don't believe I'm overloading the cluster: CPU utilization never exceeds about 
10%, and even my I/O wait is negligible. A curious thing is that the driver 
hasn't thrown any exceptions, even though mutations have been dropped. I've 
seen dropped mutation messages on my production cluster as well, and there too 
I never got errors back at the client. I had always assumed that one node was 
dropping mutation messages while the other two were not, so quorum was still 
satisfied. But with RF=1, I don't understand how mutation messages can be 
dropped without the client hearing about it. Does this mean my cluster is 
missing data, and I have no idea?
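To spell out why that worries me, here is the quorum arithmetic as I understand it from the Cassandra consistency documentation (quorum = floor(RF / 2) + 1); with RF=1 there is no second replica to fall back on:

```python
# Quorum size in Cassandra is floor(RF / 2) + 1 (per the Cassandra
# consistency-level documentation). With RF=1 the quorum degenerates
# to 1, so the lone replica must acknowledge every write -- there is
# no "other node" to satisfy quorum when a mutation is dropped.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

print(quorum(1))  # my test cluster: RF=1 -> quorum of 1
print(quorum(3))  # my production cluster: RF=3 -> quorum of 2
```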

Each node has a couple dozen all-time blocked FlushWriters. Is that bad?

I have around 100 dropped counter mutations, which is very weird because I 
don’t write any counters. I have counters in my schema for tracking view 
counts, but the migration code doesn’t write them. How could I get dropped 
counter mutation messages when I don’t modify them?

Any insights would be appreciated. Thanks in advance.

Robert
