Hi, I'm trying to figure out what's going on with some column removes that don't seem to be taking hold.
This particular test is being done on a single-node cluster running 0.6.8 with CL=QUORUM on the writes (which shouldn't matter, I'd think). What I'm seeing in our client log files is that a column remove is being done on a particular rowkey, but when I later use cassandra-cli to query that rowkey the column shows up again.

We're using batch_mutate with a list of about 500 modifications in the batch. The timestamp on the column is from 6 months ago and the timestamp we're setting on the mutation is the current time (in microseconds). I've verified that I'm not inserting the same column back into the rowkey after it's deleted. We're using Hector on the client side.

I'm seeing no exceptions anywhere during the rather long run (about 2 hours of thumping the server pretty thoroughly... 2.5GB of log files). This dropping of the delete appears to be very rare: we're doing tens of thousands of them and it looks like only a small number aren't working.

I looked at the server logs and the only thing of interest there is that a few minutes after the supposed delete the server switches in a new memtable.
Here's the log:

2011-01-27 07:45:28,646 INFO [org.apache.cassandra.db.ColumnFamilyStore] - View has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/dblogs/core/commitlog/CommitLog-1296142416683.log', position=43038896)
2011-01-27 07:45:28,646 INFO [org.apache.cassandra.db.ColumnFamilyStore] - Enqueuing flush of Memtable-View@1961066949(67679287 bytes, 248 operations)
2011-01-27 07:45:28,646 INFO [org.apache.cassandra.db.Memtable] - Writing Memtable-View@1961066949(67679287 bytes, 248 operations)
2011-01-27 07:45:29,351 INFO [org.apache.cassandra.db.Memtable] - Completed flushing /opt/cassandra/storage/core/data/Panama/View-14593-Data.db (1572707 bytes)
2011-01-27 07:46:42,667 INFO [org.apache.cassandra.db.ColumnFamilyStore] - CFTest1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/dblogs/core/commitlog/CommitLog-1296142416683.log', position=47447132)
2011-01-27 07:46:42,668 INFO [org.apache.cassandra.db.ColumnFamilyStore] - Enqueuing flush of Memtable-CFTest1@277830944(38486 bytes, 501 operations)
2011-01-27 07:46:42,668 INFO [org.apache.cassandra.db.Memtable] - Writing Memtable-CFTest1@277830944(38486 bytes, 501 operations)
2011-01-27 07:46:43,073 INFO [org.apache.cassandra.db.Memtable] - Completed flushing /opt/cassandra/storage/core/data/Panama/CFTest1-1258-Data.db (45750 bytes)
[snip]

This is about 5 minutes after the delete.

I definitely can't say that it's not a client-side failure where an exception is getting silently swallowed. My concern is that the delete request makes it to the server but somehow gets lost (on a one-node cluster...). Does anyone know if there's a known issue related to losing delete requests?

Thanks,
Scott
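
P.S. To spell out the timestamp reasoning above: as I understand it, a delete only hides a column if the delete's timestamp is newer than the column's, so a delete stamped in the wrong unit (say, milliseconds against a column written in microseconds) would silently lose. Below is a minimal sketch of the sanity check I have in mind. The class and method names are hypothetical and are not part of Hector or Cassandra, and I'm deliberately not modelling what happens when the two timestamps are equal.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not a Hector or Cassandra API.
final class TimestampCheck {

    // True when the delete's timestamp is strictly newer than the column's.
    // Equal timestamps are treated as "not shadowed" here; this sketch does
    // not claim to match how the server actually breaks ties.
    static boolean deleteShadowsColumn(long columnTimestamp, long deleteTimestamp) {
        return deleteTimestamp > columnTimestamp;
    }

    // Current wall-clock time in microseconds (millisecond resolution only).
    static long nowMicros() {
        return TimeUnit.MILLISECONDS.toMicros(System.currentTimeMillis());
    }

    public static void main(String[] args) {
        long now = nowMicros();
        // Assumed: the column was written roughly 180 days ago, in microseconds.
        long columnTs = now - TimeUnit.DAYS.toMicros(180);

        // Delete stamped in microseconds, as we intend to do.
        System.out.println("micros delete shadows column: "
                + deleteShadowsColumn(columnTs, now));

        // Delete accidentally stamped in milliseconds.
        System.out.println("millis delete shadows column: "
                + deleteShadowsColumn(columnTs, System.currentTimeMillis()));
    }
}
```

If every one of our deletes passes a check like this, then the timestamps themselves are probably not the cause and the problem is somewhere else.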