You’ll only ever have one tombstone per read, so your load is driven by your normal read rate, not by tombstones. The metric isn’t wrong, but it isn’t indicative of a problem here given your data model.
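If you want the alert to reflect tombstones per read rather than the raw scan count, you could divide by the read count over the same window. A rough sketch only; cassandra_Table_ReadLatency is an assumed series name based on your exporter’s naming convention, so substitute whatever it actually exposes for per-table read counts:

```
# Sketch: tombstones scanned per read, over 5m windows.
# cassandra_Table_ReadLatency{function="Count"} is assumed to carry the
# same labels as the tombstone metric; adjust to your exporter.
increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
  / increase(cassandra_Table_ReadLatency{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
```

With your schema returning at most one tombstone per row read, that ratio should hover around 1.0, which is benign.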
You’re using STCS, so you may be reading from more than one sstable if you update column2 for a given column1; otherwise you’re probably just seeing normal read load. Consider dropping your compression chunk size a bit (given the sizes in your cfstats I’d probably go to 4K instead of 64K), and maybe consider LCS or TWCS instead of STCS (which one is appropriate depends on a lot of factors, but STCS is probably causing a fair number of unnecessary compactions and is probably very slow to expire data).
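If you go the TWCS route, something along these lines (a sketch only, untested; mykeyspace.tablename is a placeholder, and the 1-hour window is sized for your ~4 hour TTL):

```
-- Sketch: smaller compression chunks for these tiny partitions, and
-- TWCS with 1-hour windows so fully expired sstables can be dropped
-- whole instead of waiting on size-tiered compaction.
ALTER TABLE mykeyspace.tablename
    WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'}
    AND compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'HOURS',
        'compaction_window_size': '1'};
```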
-- 
Jeff Jirsa


> On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
> 
> Do you see anything wrong with this metric?
> 
> Metric to scan tombstones:
> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
> 
> And at the same time the CPU spikes to 50% whenever I see a high tombstone alert.
> 
>> On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:
>> Your schema is such that you’ll never read more than one tombstone per select (unless you’re also doing range reads / table scans that you didn’t mention) - I’m not quite sure what you’re alerting on, but you’re not going to have tombstone problems with that table / that select.
>> 
>> -- 
>> Jeff Jirsa
>> 
>> 
>>> On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>> 
>>> Changing gcgs didn't help.
>>> 
>>> ```
>>> CREATE KEYSPACE ksname WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;
>>> 
>>> CREATE TABLE keyspace."table" (
>>>     "column1" text PRIMARY KEY,
>>>     "column2" text
>>> ) WITH bloom_filter_fp_chance = 0.01
>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>     AND comment = ''
>>>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>>>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>     AND crc_check_chance = 1.0
>>>     AND dclocal_read_repair_chance = 0.1
>>>     AND default_time_to_live = 18000
>>>     AND gc_grace_seconds = 60
>>>     AND max_index_interval = 2048
>>>     AND memtable_flush_period_in_ms = 0
>>>     AND min_index_interval = 128
>>>     AND read_repair_chance = 0.0
>>>     AND speculative_retry = '99PERCENTILE';
>>> ```
>>> 
>>> Flushed the table and took an sstabledump:
>>> 
>>> ```
>>> grep -i '"expired" : true' SSTables.txt | wc -l
>>> 16439
>>> grep -i '"expired" : false' SSTables.txt | wc -l
>>> 2657
>>> ```
>>> 
>>> TTL is 4 hours:
>>> 
>>> ```
>>> INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?) USING TTL 14400; -- 4 hours
>>> SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;
>>> ```
>>> 
>>> Metric to scan tombstones:
>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>> 
>>> During peak hours we only have a couple of hundred inserts and 5-8k reads/s per node.
>>> 
>>> tablestats:
>>> 
>>> ```
>>> Read Count: 605231874
>>> Read Latency: 0.021268529760215503 ms
>>> Write Count: 2763352
>>> Write Latency: 0.027924007871599422 ms
>>> Pending Flushes: 0
>>>     Table: name
>>>     SSTable count: 1
>>>     Space used (live): 1413203
>>>     Space used (total): 1413203
>>>     Space used by snapshots (total): 0
>>>     Off heap memory used (total): 28813
>>>     SSTable Compression Ratio: 0.5015090954531143
>>>     Number of partitions (estimate): 19568
>>>     Memtable cell count: 573
>>>     Memtable data size: 22971
>>>     Memtable off heap memory used: 0
>>>     Memtable switch count: 6
>>>     Local read count: 529868919
>>>     Local read latency: 0.020 ms
>>>     Local write count: 2707371
>>>     Local write latency: 0.024 ms
>>>     Pending flushes: 0
>>>     Percent repaired: 0.0
>>>     Bloom filter false positives: 1
>>>     Bloom filter false ratio: 0.00000
>>>     Bloom filter space used: 23888
>>>     Bloom filter off heap memory used: 23880
>>>     Index summary off heap memory used: 4717
>>>     Compression metadata off heap memory used: 216
>>>     Compacted partition minimum bytes: 73
>>>     Compacted partition maximum bytes: 124
>>>     Compacted partition mean bytes: 99
>>>     Average live cells per slice (last five minutes): 1.0
>>>     Maximum live cells per slice (last five minutes): 1
>>>     Average tombstones per slice (last five minutes): 1.0
>>>     Maximum tombstones per slice (last five minutes): 1
>>>     Dropped Mutations: 0
>>> ```
>>> 
>>> histograms:
>>> 
>>> ```
>>> Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
>>>                            (micros)      (micros)         (bytes)
>>> 50%             0.00          20.50         17.08              86           1
>>> 75%             0.00          24.60         20.50             124           1
>>> 95%             0.00          35.43         29.52             124           1
>>> 98%             0.00          35.43         42.51             124           1
>>> 99%             0.00          42.51         51.01             124           1
>>> Min             0.00           8.24          5.72              73           0
>>> Max             1.00          42.51        152.32             124           1
>>> ```
>>> 
>>> 3 nodes in dc1 and 3 nodes in dc2, on AWS EC2 m4.xlarge instances.
>>> 
>>>> On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>> Would also be good to see your schema (anonymized if needed) and the select queries you’re running.
>>>> 
>>>> -- 
>>>> Jeff Jirsa
>>>> 
>>>> 
>>>>> On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>>>>> 
>>>>> Thanks Jeff,
>>>>> 
>>>>> I have gcgs set to 10 minutes and changed the table TTL to 5 hours, compared to the insert TTL of 4 hours. Tracing doesn’t show any tombstone scans for the reads, and the log doesn’t show tombstone scan alerts either. Yet with reads at 5-8k per node during peak hours, the metric shows a tombstone scan count of 1M per read.
>>>>> 
>>>>>> On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>>>> If all of your data is TTL’d and you never explicitly delete a cell without using a TTL, you can probably drop your GCGS to 1 hour (or less).
>>>>>> 
>>>>>> Which compaction strategy are you using? You need a way to clear out those tombstones. There exist tombstone compaction sub-properties that can help encourage compaction to grab sstables just because they’re full of tombstones, which will probably help you.
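>>>>>> 
>>>>>> As an illustration (a sketch only; keyspace/table names are placeholders and the thresholds are just the knobs to tune):
>>>>>> 
>>>>>> ```
>>>>>> -- Sketch: drop gc_grace_seconds to 1 hour and let STCS compact
>>>>>> -- sstables that are mostly tombstones, even outside size tiers.
>>>>>> ALTER TABLE mykeyspace.tablename
>>>>>>     WITH gc_grace_seconds = 3600
>>>>>>     AND compaction = {
>>>>>>         'class': 'SizeTieredCompactionStrategy',
>>>>>>         'tombstone_threshold': '0.2',
>>>>>>         'tombstone_compaction_interval': '3600',
>>>>>>         'unchecked_tombstone_compaction': 'true'};
>>>>>> ```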
>>>>>> 
>>>>>> -- 
>>>>>> Jeff Jirsa
>>>>>> 
>>>>>> 
>>>>>>> On Feb 22, 2019, at 8:37 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>>>>>>> 
>>>>>>> Can we see the histogram? Why wouldn’t you at times have that many tombstones? Makes sense.
>>>>>>> 
>>>>>>> Kenneth Brotman
>>>>>>> 
>>>>>>> From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
>>>>>>> Sent: Thursday, February 21, 2019 7:06 AM
>>>>>>> To: user@cassandra.apache.org
>>>>>>> Subject: Tombstones in memtable
>>>>>>> 
>>>>>>> We have a small table; records are about 5k.
>>>>>>> 
>>>>>>> All the inserts come with a 4-hour TTL, the table-level TTL is 1 day, and gc_grace_seconds is 3 hours. We do 5k reads a second during peak load, and during peak load we see alerts for the tombstone scanned histogram reaching a million.
>>>>>>> 
>>>>>>> Cassandra version 3.11.1. Please let me know how this tombstone scanning can be avoided in the memtable.