Re: Avoiding High Cell Tombstone Count

Nate McCall Wed, 28 May 2014 08:45:11 -0700

You could turn gc_grace_seconds down to zero and tune compaction options
for this CF to keep the tombstone count down.
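
As a rough sketch of what that could look like (the table name is taken
from the trace below; the compaction class and threshold value are
illustrative assumptions, not tuned recommendations):

```cql
-- Assumption: size-tiered compaction; tombstone_threshold of 0.1 is an
-- arbitrary illustrative value (the default is 0.2).
ALTER TABLE AccountBalances
  WITH gc_grace_seconds = 0
  AND compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': '0.1'
  };
```

Note that on a multi-node cluster, gc_grace_seconds = 0 means a replica
that misses a delete can bring the deleted data back, so this is only
reasonably safe on a single node or where that risk is acceptable.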


But...

This query looks a lot like a ledger. If that is so, treat it as such and
skip the updates by:
- modifying the schema to include a timeuuid as part of a compound key (and
using that timeuuid for ordering)
- selecting the most recent entry via LIMIT 1
- would you even need Paxos at this point? Please read Pat Helland's
"Building on Quicksand", particularly section 6.2:
http://blogs.msdn.com/cfs-file.ashx/__key/communityserver-components-postattachments/00-09-20-52-14/BuildingOnQuicksand_2D00_V3_2D00_081212h_2D00_pdf.pdf
- use a TTL to keep the table tame if it's high volume

This 'immutable' approach plays much nicer with Cassandra's strong points.
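
A minimal sketch of the ledger layout described above. The table name
account_ledger, the column name entry_id, the decimal type for the
balance, and the 30-day TTL are all assumptions made for illustration;
adjust them to the real schema and retention needs.

```cql
-- Hypothetical append-only ledger: one row per balance change.
CREATE TABLE account_ledger (
    account  text,
    entry_id timeuuid,   -- orders entries within an account
    balance  decimal,    -- assumed type; not stated in the original post
    PRIMARY KEY (account, entry_id)
) WITH CLUSTERING ORDER BY (entry_id DESC)
  AND default_time_to_live = 2592000;  -- 30 days, illustrative only

-- Each change is a new insert rather than an update of an existing cell.
INSERT INTO account_ledger (account, entry_id, balance)
VALUES ('test9', now(), 100.00);

-- Newest entry first because of the DESC clustering order.
SELECT account, balance FROM account_ledger
WHERE account = 'test9' LIMIT 1;
```

With the clustering order reversed, the LIMIT 1 read starts at the newest
entry, so it should not need to scan past older or expired cells first.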


On Sun, May 25, 2014 at 2:01 PM, Charlie Mason <charlie....@gmail.com> wrote:

> Hi All,
>
> I have a table which has one column per user. It receives a lot of updates
> to these columns throughout their lifetime. They are always updates to a
> few specific columns. Firstly, is Cassandra storing a tombstone for each of
> these old column values?
>
> I have run a simple select and seen the following tracing results:
>
> activity                                                                                  | timestamp    | source    | source_elapsed
> ------------------------------------------------------------------------------------------+--------------+-----------+----------------
> execute_cql3_query                                                                        | 19:48:36,582 | 127.0.0.1 |              0
> Parsing SELECT Account, Balance FROM AccountBalances WHERE Account = 'test9' LIMIT 10000; | 19:48:36,582 | 127.0.0.1 |             56
> Preparing statement                                                                       | 19:48:36,582 | 127.0.0.1 |            181
> Executing single-partition query on accountbalances                                       | 19:48:36,583 | 127.0.0.1 |            878
> Acquiring sstable references                                                              | 19:48:36,583 | 127.0.0.1 |            895
> Merging memtable tombstones                                                               | 19:48:36,583 | 127.0.0.1 |            918
> Key cache hit for sstable 569                                                             | 19:48:36,583 | 127.0.0.1 |            997
> Seeking to partition beginning in data file                                               | 19:48:36,583 | 127.0.0.1 |           1034
> Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones                 | 19:48:36,583 | 127.0.0.1 |           1383
> Merging data from memtables and 1 sstables                                                | 19:48:36,583 | 127.0.0.1 |           1402
> Read 1 live and 123780 tombstoned cells                                                   | 19:48:36,710 | 127.0.0.1 |         128631
> Request complete                                                                          | 19:48:36,711 | 127.0.0.1 |         129276
>
>
> As you can see, that's an awful lot of tombstoned cells. That's after a full
> compaction as well. Just so you are aware, this table is updated using a
> Paxos IF statement.
>
> It still seems fairly snappy; however, I am concerned it's only going to get
> worse.
>
> Would I be better off adding a time-based key to the primary key, then doing
> a separate insert and then deleting the original? If I did the query with
> a limit of one, it should always find the first row before hitting a
> tombstone. Is that correct?
>
> Thanks,
>
> Charlie M
>
>


-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
