You can read the comments about this new feature here: https://issues.apache.org/jira/browse/CASSANDRA-6117
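As the cassandra.yaml comment quoted further down notes, the thresholds can also be adjusted at runtime through the StorageService MBean, so you do not have to restart the node just to raise the limit. Below is a minimal JMX sketch, assuming the default JMX port 7199 with no authentication, and assuming the MBean exposes the settings as the attributes TombstoneWarnThreshold / TombstoneFailureThreshold (verify the exact names in jconsole first):

import javax.management.Attribute;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class TombstoneThresholds {
    public static void main(String[] args) throws Exception {
        // Assumes JMX is reachable on the default port 7199 without authentication.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");

            // Read the current values (attribute names assumed from the yaml comment).
            System.out.println("warn = " + mbs.getAttribute(ss, "TombstoneWarnThreshold"));
            System.out.println("fail = " + mbs.getAttribute(ss, "TombstoneFailureThreshold"));

            // Raise the failure threshold without restarting the node.
            mbs.setAttribute(ss, new Attribute("TombstoneFailureThreshold", 200000));
        } finally {
            connector.close();
        }
    }
}

Keep in mind this only changes the running JVM; make the same change in cassandra.yaml so it survives a restart.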
2013/12/27 Kais Ahmed <k...@neteck-fr.com>

> This threshold is there to prevent bad performance; you can increase the value.
>
>
> 2013/12/27 Sanjeeth Kumar <sanje...@exotel.in>
>
>> Thanks for the replies.
>> I don't think this is just a warning, incorrectly logged as an error.
>> Every time there is a crash, this is the exact traceback I see in the logs.
>> I just browsed through the code and the code throws
>> a TombstoneOverwhelmingException in these situations, and I did not see
>> it being caught and handled anywhere. I might be wrong though.
>>
>> But I would also like to understand why this threshold value is important,
>> so that I can set the right threshold.
>>
>> - Sanjeeth
>>
>>
>> On Fri, Dec 27, 2013 at 11:33 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>
>>> I do not think the feature is supposed to crash the server. It could be
>>> that the message is in the logs and the crash is not related to this message.
>>> WARN might be a better logging level for any message, even though the first
>>> threshold is WARN and the second is FAIL. ERROR is usually something more
>>> dramatic.
>>>
>>>
>>> On Wed, Dec 25, 2013 at 1:02 PM, Laing, Michael <michael.la...@nytimes.com> wrote:
>>>
>>>> It's a feature:
>>>>
>>>> In the stock cassandra.yaml file for 2.0.3 see:
>>>>
>>>>> # When executing a scan, within or across a partition, we need to keep the
>>>>> # tombstones seen in memory so we can return them to the coordinator, which
>>>>> # will use them to make sure other replicas also know about the deleted rows.
>>>>> # With workloads that generate a lot of tombstones, this can cause performance
>>>>> # problems and even exaust the server heap.
>>>>> # (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
>>>>> # Adjust the thresholds here if you understand the dangers and want to
>>>>> # scan more tombstones anyway. These thresholds may also be adjusted at runtime
>>>>> # using the StorageService mbean.
>>>>> tombstone_warn_threshold: 1000
>>>>> tombstone_failure_threshold: 100000
>>>>
>>>> You are hitting the failure threshold.
>>>>
>>>> ml
>>>>
>>>>
>>>> On Wed, Dec 25, 2013 at 12:17 PM, Rahul Menon <ra...@apigee.com> wrote:
>>>>
>>>>> Sanjeeth,
>>>>>
>>>>> Looks like the error is coming from hinted handoff; what is the size
>>>>> of your hints CF?
>>>>>
>>>>> Thanks
>>>>> Rahul
>>>>>
>>>>>
>>>>> On Wed, Dec 25, 2013 at 8:54 PM, Sanjeeth Kumar <sanje...@exotel.in> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>   One of my cassandra nodes crashes with the following exception
>>>>>> periodically -
>>>>>> ERROR [HintedHandoff:33] 2013-12-25 20:29:22,276 SliceQueryFilter.java
>>>>>> (line 200) Scanned over 100000 tombstones; query aborted (see
>>>>>> tombstone_fail_threshold)
>>>>>> ERROR [HintedHandoff:33] 2013-12-25 20:29:22,278 CassandraDaemon.java
>>>>>> (line 187) Exception in thread Thread[HintedHandoff:33,1,main]
>>>>>> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
>>>>>>         at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
>>>>>>         at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
>>>>>>         at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
>>>>>>         at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
>>>>>>         at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
>>>>>>         at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
>>>>>>         at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
>>>>>>         at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
>>>>>>         at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
>>>>>>         at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
>>>>>>         at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
>>>>>>         at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>         at java.lang.Thread.run(Thread.java:744)
>>>>>>
>>>>>> Why does this happen? Does this relate to any incorrect config value?
>>>>>>
>>>>>> The Cassandra version I'm running is
>>>>>> ReleaseVersion: 2.0.3
>>>>>>
>>>>>> - Sanjeeth
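Regarding Rahul's question above about the size of the hints CF: the exception in the trace is thrown while the HintedHandoff thread scans system.hints, and since hints are deleted once they are replayed, a node that has worked through a large hint backlog can leave enough tombstones behind to trip the failure threshold. You can see how big the hints CF is with nodetool cfstats (look at the system.hints section), or over JMX. A small sketch, assuming the 2.0-style ColumnFamilies MBean name and the LiveDiskSpaceUsed / TotalDiskSpaceUsed attributes (again, confirm the names in jconsole):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HintsCfSize {
    public static void main(String[] args) throws Exception {
        // Assumes the default JMX port 7199 and no JMX authentication.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // ObjectName pattern assumed for the 2.0-era ColumnFamilyStore MBean.
            ObjectName hints = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,keyspace=system,columnfamily=hints");
            System.out.println("live disk space  = "
                    + mbs.getAttribute(hints, "LiveDiskSpaceUsed") + " bytes");
            System.out.println("total disk space = "
                    + mbs.getAttribute(hints, "TotalDiskSpaceUsed") + " bytes");
        } finally {
            connector.close();
        }
    }
}

If the hints CF is large, letting delivery finish (with a temporarily raised threshold) and then compacting system.hints is usually enough to clear the tombstones out.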