Check out the compaction subproperties for tombstones:
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html?scroll=compactSubprop__compactionSubpropertiesDTCS

On Jun 4, 2015 1:29 PM, "Aiman Parvaiz" <ai...@flipagram.com> wrote:
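For reference, the tombstone-related subproperties in that link can be set on the existing table without changing strategy. A minimal sketch (assuming the keyspace is `ABC` as in the log lines below; the values shown are illustrative, not recommendations):

```
-- Make compaction more aggressive about tombstone-heavy SSTables.
ALTER TABLE "ABC".home_feed
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    -- compact a single SSTable once >20% of its cells are tombstones
    'tombstone_threshold': '0.2',
    -- don't re-check the same SSTable more than once a day
    'tombstone_compaction_interval': '86400',
    -- allow single-SSTable tombstone compaction even when the SSTable
    -- overlaps others (available from 2.0.9)
    'unchecked_tombstone_compaction': 'true'
  };
```

Note that `unchecked_tombstone_compaction` can waste I/O if overlapping SSTables prevent the tombstones from actually being purged, so it is worth watching compaction logs after enabling it.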
> Thanks Carlos for pointing me in that direction, I have some interesting
> findings to share. In December last year there was a redesign of
> home_feed and it was migrated to a new CF. Initially all the data in
> home_feed had a TTL of 1 year, but migrated data was inserted with a TTL
> of 30 days. Now on digging a bit deeper I found that home_feed still has
> data from Jan 2015 with ttl 1275094 (~14 days).
>
> This data is for the same id from home_feed:
>
>  date                     | ttl(description)
> --------------------------+------------------
>  2015-04-03 21:22:58+0000 |           759791
>  2015-04-03 04:50:11+0000 |           412706
>  2015-03-30 22:18:58+0000 |           759791
>  2015-03-29 15:20:36+0000 |          1978689
>  2015-03-28 14:41:28+0000 |          1275116
>  2015-03-28 14:31:25+0000 |          1275116
>  2015-03-18 19:23:44+0000 |          2512936
>  2015-03-13 17:51:01+0000 |          1978689
>  2015-02-12 15:41:01+0000 |          1978689
>  2015-01-18 02:36:27+0000 |          1275094
>
> I am not sure what happened in that migration, but I think that when
> trying to load data we are reading this old data (the feed queries 1000
> rows per page to be displayed to the user), and in order to read it we
> have to cross (read) lots of tombstones (newer data has TTL working
> correctly), and hence the error.
> I am not sure how much DateTiered would help us in this situation
> either. If anyone has any suggestions on how to handle this at either
> the systems or the developer level, please pitch in.
>
> Thanks
>
> On Thu, Jun 4, 2015 at 11:47 AM, Carlos Rolo <r...@pythian.com> wrote:
>
>> The TTL data will only be removed after gc_grace_seconds. So your data
>> with a 30-day TTL will still be in Cassandra for 10 more days (40 in
>> total). Has your data been there for longer than that? Otherwise it is
>> expected behaviour, and you should probably do something in your data
>> model to avoid scanning tombstoned data.
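The per-cell TTLs in the table above come from the CQL `TTL()` function, which returns the remaining seconds before a cell expires. A sketch of the query that would produce that output (the partition key column `id` and the sample value are hypothetical, since the actual schema isn't shown in the thread):

```
-- Inspect remaining TTL per cell; TTL() works on regular columns only,
-- not on primary key columns.
SELECT date, TTL(description)
FROM "ABC".home_feed
WHERE id = 'some-feed-id';   -- hypothetical key and value
```

Comparing `TTL()` output against the insert date, as done above, is a quick way to spot rows that were written with a different TTL than expected.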
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>> Pythian - Love your data
>>
>> On Thu, Jun 4, 2015 at 8:31 PM, Aiman Parvaiz <ai...@flipagram.com> wrote:
>>
>>> Yeah, we don't update old data. One thing I am curious about is why we
>>> are running into so many tombstones with compaction happening
>>> normally. Is compaction not removing tombstones?
>>>
>>> On Thu, Jun 4, 2015 at 11:25 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>
>>>> DateTiered is fantastic if you've got time-series, TTLed data. That
>>>> means no updates to old data.
>>>>
>>>> On Thu, Jun 4, 2015 at 10:58 AM Aiman Parvaiz <ai...@flipagram.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>> We are running a 10-node Cassandra 2.0.9 cluster without vnodes. We
>>>>> are running into an issue where we are reading too many tombstones
>>>>> and hence getting tons of WARN messages and some ERROR
>>>>> query-aborted messages:
>>>>>
>>>>> cass-prod4 2015-06-04 14:38:34,307 WARN ReadStage:1998
>>>>> SliceQueryFilter.collectReducedColumns - Read 46 live and 1560
>>>>> tombstoned cells in ABC.home_feed (see tombstone_warn_threshold).
>>>>> 100 columns was requested, slices=[-], delInfo=
>>>>> {deletedAt=-9223372036854775808, localDeletion=2147483647}
>>>>>
>>>>> cass-prod2 2015-05-31 12:55:55,331 ERROR ReadStage:1953
>>>>> SliceQueryFilter.collectReducedColumns - Scanned over 100000
>>>>> tombstones in ABC.home_feed; query aborted (see
>>>>> tombstone_fail_threshold)
>>>>>
>>>>> As you can see, all of this is happening for CF home_feed.
>>>>> This CF is basically maintaining a feed with TTL set to 2592000
>>>>> (30 days). gc_grace_seconds for this CF is 864000 (10 days) and it
>>>>> uses SizeTieredCompaction.
>>>>>
>>>>> Repairs have been running regularly, and automatic compactions are
>>>>> occurring normally too.
>>>>>
>>>>> I can definitely use some help here in how to tackle this issue.
>>>>>
>>>>> Up till now I have the following ideas:
>>>>>
>>>>> 1) I can set gc_grace_seconds to 0, do a manual compaction for this
>>>>> CF, and bump gc_grace back up.
>>>>>
>>>>> 2) Set gc_grace to 0, run manual compaction on this CF, and leave
>>>>> gc_grace at zero. In this case we have to be careful in running
>>>>> repairs.
>>>>>
>>>>> 3) I am also considering moving to DateTiered compaction.
>>>>>
>>>>> What would be the best approach here for my feed case? Any help is
>>>>> appreciated.
>>>>>
>>>>> Thanks
>
> --
> Lead Systems Architect
> 10351 Santa Monica Blvd, Suite 3310
> Los Angeles CA 90025
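For what it's worth, option 1 from the list above could be sketched as follows (assuming keyspace `ABC`; the restore value is the table's current gc_grace_seconds from the thread):

```
-- Step 1: let expired cells and tombstones become purgeable immediately.
-- Risky window: while gc_grace_seconds is 0, a node that was down could
-- miss deletions and resurrect data, so avoid repairs/outages meanwhile.
ALTER TABLE "ABC".home_feed WITH gc_grace_seconds = 0;

-- Step 2 (from the shell, not cqlsh):
--   nodetool compact ABC home_feed

-- Step 3: restore the original 10-day grace period.
ALTER TABLE "ABC".home_feed WITH gc_grace_seconds = 864000;
```

Note the caveat Carlos raised still applies: even after this, cells written with the long original TTL will keep generating tombstones as they expire, so the compaction-subproperty or DateTiered routes address the ongoing problem rather than just the backlog.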