Check out the compaction subproperties for tombstones:
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/compactSubprop.html?scroll=compactSubprop__compactionSubpropertiesDTCS

On Jun 4, 2015 1:29 PM, "Aiman Parvaiz" <ai...@flipagram.com> wrote:
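For reference, the tombstone-related subproperties in that link can be set on the existing table without changing strategy. A minimal sketch (assuming the keyspace is `ABC` as in the log lines below; the values shown are illustrative, not recommendations):

```
-- Make compaction more aggressive about tombstone-heavy SSTables.
ALTER TABLE "ABC".home_feed
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    -- compact a single SSTable once >20% of its cells are tombstones
    'tombstone_threshold': '0.2',
    -- don't re-check the same SSTable more than once a day
    'tombstone_compaction_interval': '86400',
    -- allow single-SSTable tombstone compaction even when the SSTable
    -- overlaps others (available from 2.0.9)
    'unchecked_tombstone_compaction': 'true'
  };
```

Note that `unchecked_tombstone_compaction` can waste I/O if overlapping SSTables prevent the tombstones from actually being purged, so it is worth watching compaction logs after enabling it.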
> Thanks Carlos for pointing me in that direction, I have some interesting
> findings to share. In December last year there was a redesign of
> home_feed and it was migrated to a new CF. Initially all the data in
> home_feed had a TTL of 1 year, but migrated data was inserted with a TTL
> of 30 days. Now on digging a bit deeper I found that home_feed still has
> data from Jan 2015 with ttl 1275094 (~14 days).
>
> This data is for the same id from home_feed:
>
>  date                     | ttl(description)
> --------------------------+------------------
>  2015-04-03 21:22:58+0000 |           759791
>  2015-04-03 04:50:11+0000 |           412706
>  2015-03-30 22:18:58+0000 |           759791
>  2015-03-29 15:20:36+0000 |          1978689
>  2015-03-28 14:41:28+0000 |          1275116
>  2015-03-28 14:31:25+0000 |          1275116
>  2015-03-18 19:23:44+0000 |          2512936
>  2015-03-13 17:51:01+0000 |          1978689
>  2015-02-12 15:41:01+0000 |          1978689
>  2015-01-18 02:36:27+0000 |          1275094
>
> I am not sure what happened in that migration, but I think that when
> trying to load data we are reading this old data (the feed queries 1000
> rows per page to be displayed to the user), and in order to read it we
> have to cross (read) lots of tombstones (newer data has TTL working
> correctly), and hence the error.
> I am not sure how much DateTiered would help us in this situation
> either. If anyone has any suggestions on how to handle this at either
> the systems or the developer level, please pitch in.
>
> Thanks
>
> On Thu, Jun 4, 2015 at 11:47 AM, Carlos Rolo <r...@pythian.com> wrote:
>
>> The TTL data will only be removed after gc_grace_seconds. So your data
>> with a 30-day TTL will still be in Cassandra for 10 more days (40 in
>> total). Has your data been there for longer than that? Otherwise it is
>> expected behaviour, and you should probably do something in your data
>> model to avoid scanning tombstoned data.
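The per-cell TTLs in the table above come from the CQL `TTL()` function, which returns the remaining seconds before a cell expires. A sketch of the query that would produce that output (the partition key column `id` and the sample value are hypothetical, since the actual schema isn't shown in the thread):

```
-- Inspect remaining TTL per cell; TTL() works on regular columns only,
-- not on primary key columns.
SELECT date, TTL(description)
FROM "ABC".home_feed
WHERE id = 'some-feed-id';   -- hypothetical key and value
```

Comparing `TTL()` output against the insert date, as done above, is a quick way to spot rows that were written with a different TTL than expected.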
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>> Pythian - Love your data
>>
>> On Thu, Jun 4, 2015 at 8:31 PM, Aiman Parvaiz <ai...@flipagram.com> wrote:
>>
>>> Yeah, we don't update old data. One thing I am curious about is why we
>>> are running into so many tombstones with compaction happening
>>> normally. Is compaction not removing tombstones?
>>>
>>> On Thu, Jun 4, 2015 at 11:25 AM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>>>
>>>> DateTiered is fantastic if you've got time-series, TTLed data. That
>>>> means no updates to old data.
>>>>
>>>> On Thu, Jun 4, 2015 at 10:58 AM Aiman Parvaiz <ai...@flipagram.com> wrote:
>>>>
>>>>> Hi everyone,
>>>>> We are running a 10-node Cassandra 2.0.9 cluster without vnodes. We
>>>>> are running into an issue where we are reading too many tombstones
>>>>> and hence getting tons of WARN messages and some ERROR
>>>>> query-aborted messages:
>>>>>
>>>>> cass-prod4 2015-06-04 14:38:34,307 WARN ReadStage:1998
>>>>> SliceQueryFilter.collectReducedColumns - Read 46 live and 1560
>>>>> tombstoned cells in ABC.home_feed (see tombstone_warn_threshold).
>>>>> 100 columns was requested, slices=[-], delInfo=
>>>>> {deletedAt=-9223372036854775808, localDeletion=2147483647}
>>>>>
>>>>> cass-prod2 2015-05-31 12:55:55,331 ERROR ReadStage:1953
>>>>> SliceQueryFilter.collectReducedColumns - Scanned over 100000
>>>>> tombstones in ABC.home_feed; query aborted (see
>>>>> tombstone_fail_threshold)
>>>>>
>>>>> As you can see, all of this is happening for CF home_feed.
>>>>> This CF is basically maintaining a feed with TTL set to 2592000
>>>>> (30 days). gc_grace_seconds for this CF is 864000 (10 days) and it
>>>>> uses SizeTieredCompaction.
>>>>>
>>>>> Repairs have been running regularly, and automatic compactions are
>>>>> occurring normally too.
>>>>>
>>>>> I can definitely use some help here in how to tackle this issue.
>>>>>
>>>>> Up till now I have the following ideas:
>>>>>
>>>>> 1) I can set gc_grace_seconds to 0, do a manual compaction for this
>>>>> CF, and bump gc_grace back up.
>>>>>
>>>>> 2) Set gc_grace to 0, run manual compaction on this CF, and leave
>>>>> gc_grace at zero. In this case we have to be careful in running
>>>>> repairs.
>>>>>
>>>>> 3) I am also considering moving to DateTiered compaction.
>>>>>
>>>>> What would be the best approach here for my feed case? Any help is
>>>>> appreciated.
>>>>>
>>>>> Thanks
>
> --
> Lead Systems Architect
> 10351 Santa Monica Blvd, Suite 3310
> Los Angeles CA 90025
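For what it's worth, option 1 from the list above could be sketched as follows (assuming keyspace `ABC`; the restore value is the table's current gc_grace_seconds from the thread):

```
-- Step 1: let expired cells and tombstones become purgeable immediately.
-- Risky window: while gc_grace_seconds is 0, a node that was down could
-- miss deletions and resurrect data, so avoid repairs/outages meanwhile.
ALTER TABLE "ABC".home_feed WITH gc_grace_seconds = 0;

-- Step 2 (from the shell, not cqlsh):
--   nodetool compact ABC home_feed

-- Step 3: restore the original 10-day grace period.
ALTER TABLE "ABC".home_feed WITH gc_grace_seconds = 864000;
```

Note the caveat Carlos raised still applies: even after this, cells written with the long original TTL will keep generating tombstones as they expire, so the compaction-subproperty or DateTiered routes address the ongoing problem rather than just the backlog.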