Hello Ian,

"It sounds like this 100k limit is, indeed, a "global" limit as opposed to a per-row limit" --> The threshold applies to each request, not to a partition and not globally.

The threshold does not apply to a partition (physical row) simply because a single request can fetch data from many partitions (multi get slice). There was a JIRA about this here: https://issues.apache.org/jira/browse/CASSANDRA-6865

"Are these tombstones ever "GCed" out of the index?" --> Yes they are, during compactions of the index column family.

"How frequently?" --> That's the real pain: you have no control over the compaction tuning of a secondary index CF. As far as I know, the compaction settings (strategy, min/max thresholds...) are inherited from the base table.

Now, from a quick look at your data model, it seems you have a skinny partition pattern. Since you mentioned that the date is updated only 10 times at most, you should not run into the tombstone threshold issue.

On a side note, your usage of a secondary index is not the best one. Indexing the update date leads to a situation where, for one date, you'll mostly have one or only a few matching items (assuming the update date resolution is small enough and the update rate is not intense). That is the high-cardinality scenario to be avoided (http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html). Plus, the query on the index (find all items where last_updated < [now - 30 minutes]) makes things worse since it is an inequality rather than an exact match.

You are better off creating a manual reverse index to track the modification date, something like this:

CREATE TABLE last_updated_item (
    minute_bucket bigint,        // format YYYYMMDDHHmm, one value per 30-minute window
    last_update_date timestamp,
    item_id ascii,
    PRIMARY KEY (minute_bucket, last_update_date, item_id)
);

The last_update_date column is quite self-explanatory. The minute_bucket is trickier. The idea is to split time into 30-minute buckets: 00:00 to 00:30 is bucket 1, 00:30 to 01:00 is bucket 2, and so on, giving 48 buckets for a whole day. We need to put data into buckets to avoid ultra-wide rows, since you mentioned that there are 10 items (so 10 updates) per second. Of course, 30 minutes is just an example; you can tune it down to a window of 5 minutes or 1 minute, depending on the insertion rate.
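To make the read/write path concrete, here is a rough sketch of how the per-update write and the once-a-minute scan could look against that table. The bucket values, timestamps and 'item-42' are made-up values for illustration, and I'm assuming the bucket key encodes the start of each 30-minute window as YYYYMMDDHHmm; the re-check against the items table at the end is just one way to deal with entries that became stale because the item was updated again in a later bucket.

-- On every item update, record the item in the bucket covering the update time
-- (201408102230 = the window starting at 2014-08-10 22:30).
INSERT INTO last_updated_item (minute_bucket, last_update_date, item_id)
VALUES (201408102230, '2014-08-10 22:47:12', 'item-42');

-- Once a minute, read only the bucket(s) that ended more than 30 minutes ago;
-- each bucket is a single bounded partition, so this is a cheap slice that
-- never has to walk secondary-index tombstones.
SELECT item_id, last_update_date
FROM last_updated_item
WHERE minute_bucket = 201408102130;

-- For each candidate item, re-read items.last_updated and skip it if the item
-- has been touched again since; otherwise process it and delete the row.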
On Sun, Aug 10, 2014 at 10:02 PM, Ian Rose <ianr...@fullstory.com> wrote:

> Hi Mark -
>
> Thanks for the clarification but as I'm not too familiar with the nuts &
> bolts of Cassandra I'm not sure how to apply that info to my current
> situation. It sounds like this 100k limit is, indeed, a "global" limit as
> opposed to a per-row limit. Are these tombstones ever "GCed" out of the
> index? How frequently? If not, then it seems like *any* index is at risk
> of reaching this tipping point; it's just that indexes on frequently
> updated columns will reach this point faster than indexes on rarely
> updated columns.
>
> Basically I'm trying to get some kind of sense for what "frequently updated
> <http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html>"
> means quantitatively. As written, the docs make it sound dangerous to
> create an index on a column that is *ever* deleted or updated since there
> is no sense of how frequent is "too frequent".
>
> Cheers,
> Ian
>
>
> On Sun, Aug 10, 2014 at 3:02 PM, Mark Reddy <mark.re...@boxever.com> wrote:
>
>> Hi Ian,
>>
>> The issue here, which relates to both normal and index column families,
>> is that scanning over a large number of tombstones can cause Cassandra to
>> fall over due to increased GC pressure. This pressure is caused because
>> tombstones will create DeletedColumn objects which consume heap. Also,
>> these DeletedColumn objects will have to be serialized and sent back to
>> the coordinator, thus increasing your response times. Take for example a
>> row that sees deletes and you query it with a limit of 100. In a worst
>> case scenario you could end up reading say 50k tombstones to reach the
>> 100 'live' column limit, all of which have to be put on heap and then
>> sent over the wire to the coordinator. This would be considered a
>> Cassandra anti-pattern. [1]
>>
>> With that in mind, a debug warning was added in 1.2 to inform the user
>> when they were querying a row with 1000 tombstones [2]. Then in 2.0 the
>> action was taken to drop requests reaching 100k tombstones [3] rather
>> than just printing out a warning. This is a safety measure, as it is not
>> advised to perform such a query and it is usually the result of people
>> 'doing it wrong'.
>>
>> For those people who understand the risk of scanning over large numbers
>> of tombstones, there is a configuration option in cassandra.yaml to
>> increase this threshold, tombstone_failure_threshold. [4]
>>
>>
>> Mark
>>
>> [1] http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
>> [2] https://issues.apache.org/jira/browse/CASSANDRA-6042
>> [3] https://issues.apache.org/jira/browse/CASSANDRA-6117
>> [4] https://github.com/jbellis/cassandra/blob/4ac18ae805d28d8f4cb44b42e2244bfa6d2875e1/conf/cassandra.yaml#L407-L417
>>
>>
>> On Sun, Aug 10, 2014 at 7:19 PM, Ian Rose <ianr...@fullstory.com> wrote:
>>
>>> Hi -
>>>
>>> On this page
>>> (http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html),
>>> the docs state:
>>>
>>>> Do not use an index [...] On a frequently updated or deleted column
>>>
>>> and
>>>
>>>> *Problems using an index on a frequently updated or deleted column*
>>>> <http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__upDatIndx>
>>>>
>>>> Cassandra stores tombstones in the index until the tombstone limit
>>>> reaches 100K cells. After exceeding the tombstone limit, the query that
>>>> uses the indexed value will fail.
>>>
>>> I'm afraid I don't really understand this limit from its (brief)
>>> description. I also saw this recent thread
>>> <http://mail-archives.apache.org/mod_mbox/cassandra-user/201403.mbox/%3CCABNXB2Bf4aeoDVpMNOxJ_e7aDez2EuZswMJx=jWfb8=oyo4...@mail.gmail.com%3E>
>>> but I'm afraid it didn't help me much...
>>>
>>>
>>> *SHORT VERSION*
>>>
>>> If I have tens or hundreds of thousands of rows in a keyspace, where
>>> every row has an indexed column that is updated O(10) times during the
>>> lifetime of each row, is that going to cause problems for me? If that
>>> 100k limit is *per row* then I should be fine, but if that 100k limit is
>>> *per keyspace* then I'd definitely exceed it quickly.
>>>
>>>
>>> *FULL EXPLANATION*
>>>
>>> In our system, items are created at a rate of ~10/sec. Each item is
>>> updated ~10 times over the next few minutes (although in rare cases the
>>> number of updates, and the duration, might be several times as long).
>>> Once the last update is received for an item, we select it from
>>> Cassandra, process the data, then delete the entire row.
>>>
>>> The tricky bit is that sometimes (maybe 30-40% of the time) we don't
>>> actually know when the last update has been received, so we use a
>>> timeout: if an item hasn't been updated for 30 minutes, then we assume
>>> it is done and should process it as before (select, then delete). So I
>>> am trying to design a schema that will allow for efficient queries of
>>> the form "find me all items that have not been updated in the past 30
>>> minutes." We plan to call this query once a minute.
>>>
>>> Here is my tentative schema:
>>>
>>> CREATE TABLE items (
>>>     item_id ascii,
>>>     last_updated timestamp,
>>>     item_data list<blob>,
>>>     PRIMARY KEY (item_id)
>>> )
>>>
>>> plus an index on last_updated.
>>>
>>> So updates to an existing item would just be "look up by item_id, append
>>> new data to item_data, and set last_updated to now". And queries to find
>>> items that have timed out would use the index on last_updated: "find all
>>> items where last_updated < [now - 30 minutes]".
>>>
>>> Assuming, that is, that the aforementioned 100k tombstone limit won't
>>> bring this index crashing to a halt...
>>>
>>> Any clarification on this limit and/or suggestions on a better way to
>>> model/implement this system would be greatly appreciated!
>>>
>>> Cheers,
>>> Ian