We were chatting with Jon Haddena about a week ago about our tombstone issue 
on Cassandra 2.0.14.
To summarize:

We have a 3 node cluster with replication-factor=3 and compaction = SizeTiered
We use 1 keyspace with 1 table
Each row has about 40 columns
Each row has a TTL of 10 seconds
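
For reference, the schema looks roughly like this (a sketch; the keyspace, table, and column names are placeholders, and the table-level TTL is one way to express the per-row 10-second expiry):

```cql
-- Hypothetical schema matching the setup above; names are placeholders.
CREATE TABLE keyspace1.table1 (
    id uuid PRIMARY KEY,
    col1 text,
    -- ... roughly 40 columns in total ...
    col40 text
) WITH compaction = { 'class' : 'SizeTieredCompactionStrategy' }
  AND default_time_to_live = 10;  -- every row expires 10 seconds after insert
```

The same effect can be had by leaving `default_time_to_live` unset and applying `USING TTL 10` on each insert instead.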

We insert about 500 rows per second in a prepared batch** (about 3 MB in 
network overhead)
We query the entire table once per second

**This is to ensure consistent data, i.e. the batch is transactional, so each 
query sees all rows from a single insert cycle and not a mix of 2 or more.
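
Concretely, the per-second write is along these lines (a sketch; in practice it is a prepared statement bound once per row, and all names are placeholders):

```cql
-- Hypothetical batch: all ~500 inserts for one cycle go in a single
-- logged batch so a reader sees rows from exactly one insert cycle.
BEGIN BATCH USING TTL 10
    INSERT INTO keyspace1.table1 (id, col1 /* ... */) VALUES (uuid(), 'v1' /* ... */);
    -- ... one INSERT per row, about 500 in total ...
APPLY BATCH;
```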


Since we insert every second, it seemed the expired rows were never actually 
deleted by the TTL, or so we thought.
After some time we got this message on the query side:


#######################################
ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line 206) 
Scanned over 100000 tombstones in keyspace.table; query aborted (see 
tombstone_failure_threshold)
ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line 199) 
Exception in thread Thread[ReadStage:91,5,main]
java.lang.RuntimeException: 
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
                at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
                at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
#######################################


So we know tombstones are in fact being created.
Our solution was to alter the table schema and set gc_grace_seconds to 60 
seconds, so tombstones become eligible for removal at compaction after a 
minute.
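
The change was along these lines (table name is a placeholder):

```cql
-- Let compaction purge tombstones after 60 seconds instead of the
-- default 864000 seconds (10 days).
ALTER TABLE keyspace1.table1 WITH gc_grace_seconds = 60;
```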
This worked for 20 seconds, and then we saw this:


#######################################
Read 500 live and 30000 tombstoned cells in keyspace.table (see 
tombstone_warn_threshold). 10000 columns was requested, slices=[-], 
delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
#######################################

So the warning appears every 20 seconds (500 inserts/second x 20 seconds = 
10,000 tombstones).
So now we have gc_grace_seconds set to 10 seconds.
But it feels very wrong to have it at such a low number, especially if we move 
to a larger cluster. This just won't fly.
What are we doing wrong?

We shouldn't increase the tombstone threshold as that is extremely dangerous.


Best Regards
Stephen Walsh






This email (including any attachments) is proprietary to Aspect Software, Inc. 
and may contain information that is confidential. If you have received this 
message in error, please do not read, copy or forward this message. Please 
notify the sender immediately, delete it from your system and destroy any 
copies. You may not further disclose or distribute this email or its 
attachments.
