If you never delete except by TTL, and always write with the same TTL (or a monotonically increasing one), you can set gc_grace_seconds to 0.
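For reference, roughly what that looks like in CQL. The keyspace/table name and the TTL value below are illustrative only, not taken from this thread:

-- Expiry by TTL only, with the same (or an increasing) TTL on every write,
-- so tombstones left by expired cells are eligible for purge immediately.
ALTER TABLE my_ks.my_table
  WITH gc_grace_seconds = 0
  AND default_time_to_live = 10;

The expired cells still only disappear when the relevant SSTables are compacted; gc_grace_seconds = 0 just makes them purgeable as soon as compaction gets to them.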
That's what we do. There have been discussions on the list over the last few years about this topic.

ml

On Tue, Apr 21, 2015 at 11:14 AM, Walsh, Stephen <stephen.wa...@aspect.com> wrote:

> We were chatting to Jon Haddena about a week ago about our tombstone
> issue using Cassandra 2.0.14.
>
> To summarize:
>
> We have a 3 node cluster with replication_factor=3 and compaction =
> SizeTiered.
> We use 1 keyspace with 1 table.
> Each row has about 40 columns.
> Each row has a TTL of 10 seconds.
>
> We insert about 500 rows per second in a prepared batch** (about 3 MB in
> network overhead).
> We query the entire table once per second.
>
> **This is to enable consistent data, e.g. the batch is transactional, so
> we get all queried data from one insert and not a mix of 2 or more.
>
> It seemed that the rows we insert every second were never deleted by the
> TTL, or so we thought.
> After some time we got this message on the query side:
>
> #######################################
> ERROR [ReadStage:91] 2015-04-21 12:27:03,902 SliceQueryFilter.java (line
> 206) Scanned over 100000 tombstones in keyspace.table; query aborted (see
> tombstone_failure_threshold)
> ERROR [ReadStage:91] 2015-04-21 12:27:03,931 CassandraDaemon.java (line
> 199) Exception in thread Thread[ReadStage:91,5,main]
> java.lang.RuntimeException:
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
>         at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> #######################################
>
> So we know tombstones are in fact being created.
> The solution was to change the table schema and set gc_grace_seconds to
> 60 seconds.
> This worked for 20 seconds, then we saw this:
>
> #######################################
> Read 500 live and 30000 tombstoned cells in keyspace.table (see
> tombstone_warn_threshold). 10000 columns was requested, slices=[-],
> delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}
> #######################################
>
> So we hit it every 20 seconds (500 inserts x 20 seconds = 10,000
> tombstones).
> So now we have gc_grace_seconds set to 10 seconds.
> But it feels very wrong to have it at a low number, especially if we move
> to a larger cluster. This just won't fly.
> What are we doing wrong?
>
> We shouldn't increase the tombstone threshold as that is extremely
> dangerous.
>
> Best Regards
> Stephen Walsh
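For anyone reproducing the pattern described above, a rough sketch of the per-second batch as plain CQL (the thread uses prepared statements through a driver, and the schema isn't shown, so the table and column names here are made up):

BEGIN BATCH
  INSERT INTO my_ks.my_table (row_id, col_01, col_02)
    VALUES ('row-0001', 'a', 'b') USING TTL 10;
  INSERT INTO my_ks.my_table (row_id, col_01, col_02)
    VALUES ('row-0002', 'c', 'd') USING TTL 10;
  -- ... roughly 500 such inserts per batch in the workload described above
APPLY BATCH;

Every cell written this way becomes a tombstone 10 seconds later and stays one until compaction purges it, which is why a once-per-second scan of the whole table walks straight into the tombstone thresholds.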