Also, given your short TTL and low write rate, you may want to think about how 
you can keep more in memory - this may mean a larger memtable and higher flush 
thresholds (so reads are served from the memtable), or perhaps the partition 
cache (if you are likely to read the same key multiple times). You’ll also 
probably win some with basic perf and GC tuning, but I can’t really do that 
via email. CASSANDRA-8150 has some pointers. 
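A rough sketch of the partition cache idea in CQL, assuming the anonymized 
names from the schema further down (ksname, "table"). The cassandra.yaml 
settings named in the comments are node-level, and I am not suggesting 
specific values for them here.

```cql
-- Cache rows for each partition (this table has one row per partition).
-- The row cache stays disabled unless row_cache_size_in_mb in cassandra.yaml
-- is set above its default of 0.
ALTER TABLE ksname."table"
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};

-- Memtable sizing is not a table property; in 3.11 it lives in cassandra.yaml:
--   memtable_heap_space_in_mb, memtable_offheap_space_in_mb,
--   memtable_cleanup_threshold
```

The row cache only pays off if the same keys are read repeatedly, so check 
the hit rate after enabling it.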
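> You’ll only ever have one tombstone per read, so your load is based on normal 
> read rate, not tombstones. At 5-8k reads/s per node, one tombstone per read 
> adds up to roughly 1.5-2.4M per 5-minute window, which would account for an 
> alert in the millions. The metric isn’t wrong, but it’s not indicative of 
> a problem here given your data model. 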
> You’re using STCS, so you may be reading from more than one sstable if you 
> update column2 for a given column1; otherwise you’re probably just seeing 
> normal read load. Consider dropping your compression chunk size a bit (given 
> the sizes in your cfstats I’d probably go to 4K instead of 64K), and maybe 
> consider LCS or TWCS instead of STCS (which is appropriate depends on a lot 
> of factors, but STCS is probably causing a fair bit of unnecessary 
> compaction and is probably very slow to expire data).
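> A minimal sketch of those two changes in CQL, using the anonymized names 
> from your schema. LZ4Compressor and the 1-hour window are assumptions on my 
> side (your compression class was blank in the paste), and you would pick 
> one compaction strategy, not both.
> ```cql
> -- Smaller compression chunks for ~100-byte partitions (class is assumed).
> ALTER TABLE ksname."table"
>   WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};
>
> -- Option A: TWCS, window size is an illustrative guess for a 4-hour TTL.
> ALTER TABLE ksname."table"
>   WITH compaction = {'class': 'TimeWindowCompactionStrategy',
>                      'compaction_window_unit': 'HOURS',
>                      'compaction_window_size': '1'};
>
> -- Option B: LCS with default settings.
> ALTER TABLE ksname."table"
>   WITH compaction = {'class': 'LeveledCompactionStrategy'};
> ```
> Existing sstables keep the old chunk size until they are rewritten by 
> compaction.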
>> Do you see anything wrong with this metric?
>> Metric used to scan tombstones:
>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>> At the same time, CPU spikes to 50% whenever I see the high tombstone alert.
>>> Your schema is such that you’ll never read more than one tombstone per 
>>> select (unless you’re also doing range reads / table scans that you didn’t 
>>> mention) - I’m not quite sure what you’re alerting on, but you’re not going 
>>> to have tombstone problems with that table / that select. 
>>>> Changing gcgs didn't help
>>>> CREATE KEYSPACE ksname WITH replication = {'class': 
>>>> 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'}  AND durable_writes = 
>>>> true;
>>>> ```CREATE TABLE keyspace."table" (
>>>>     "column1" text PRIMARY KEY,
>>>>     "column2" text
>>>> ) WITH bloom_filter_fp_chance = 0.01
>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>     AND comment = ''
>>>>     AND compaction = {'class': 
>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>     AND compression = {'chunk_length_in_kb': '64', 'class': 
>>>> ''}
>>>>     AND crc_check_chance = 1.0
>>>>     AND dclocal_read_repair_chance = 0.1
>>>>     AND default_time_to_live = 18000
>>>>     AND gc_grace_seconds = 60
>>>>     AND max_index_interval = 2048
>>>>     AND memtable_flush_period_in_ms = 0
>>>>     AND min_index_interval = 128
>>>>     AND read_repair_chance = 0.0
>>>>     AND speculative_retry = '99PERCENTILE';
>>>> Flushed the table and took an sstabledump:
>>>> grep -i '"expired" : true' SSTables.txt|wc -l
>>>> 16439
>>>> grep -i '"expired" : false'  SSTables.txt |wc -l
>>>> 2657
>>>> TTL is 4 hours.
>>>> INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?) 
>>>> USING TTL ?;  -- TTL bound to 4 hours (14400 seconds)
>>>> SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;
>>>> metric to scan tombstones 
>>>> increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])
>>>> During peak hours we only have a couple of hundred inserts and 5-8k 
>>>> reads/s per node.
>>>> ```
>>>> ```tablestats
>>>>    Read Count: 605231874
>>>>    Read Latency: 0.021268529760215503 ms.
>>>>    Write Count: 2763352
>>>>    Write Latency: 0.027924007871599422 ms.
>>>>    Pending Flushes: 0
>>>>            Table: name
>>>>            SSTable count: 1
>>>>            Space used (live): 1413203
>>>>            Space used (total): 1413203
>>>>            Space used by snapshots (total): 0
>>>>            Off heap memory used (total): 28813
>>>>            SSTable Compression Ratio: 0.5015090954531143
>>>>            Number of partitions (estimate): 19568
>>>>            Memtable cell count: 573
>>>>            Memtable data size: 22971
>>>>            Memtable off heap memory used: 0
>>>>            Memtable switch count: 6
>>>>            Local read count: 529868919
>>>>            Local read latency: 0.020 ms
>>>>            Local write count: 2707371
>>>>            Local write latency: 0.024 ms
>>>>            Pending flushes: 0
>>>>            Percent repaired: 0.0
>>>>            Bloom filter false positives: 1
>>>>            Bloom filter false ratio: 0.00000
>>>>            Bloom filter space used: 23888
>>>>            Bloom filter off heap memory used: 23880
>>>>            Index summary off heap memory used: 4717
>>>>            Compression metadata off heap memory used: 216
>>>>            Compacted partition minimum bytes: 73
>>>>            Compacted partition maximum bytes: 124
>>>>            Compacted partition mean bytes: 99
>>>>            Average live cells per slice (last five minutes): 1.0
>>>>            Maximum live cells per slice (last five minutes): 1
>>>>            Average tombstones per slice (last five minutes): 1.0
>>>>            Maximum tombstones per slice (last five minutes): 1
>>>>            Dropped Mutations: 0
>>>>            histograms
>>>> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>>>>                               (micros)          (micros)           (bytes)
>>>> 50%             0.00             20.50             17.08                86                 1
>>>> 75%             0.00             24.60             20.50               124                 1
>>>> 95%             0.00             35.43             29.52               124                 1
>>>> 98%             0.00             35.43             42.51               124                 1
>>>> 99%             0.00             42.51             51.01               124                 1
>>>> Min             0.00              8.24              5.72                73                 0
>>>> Max             1.00             42.51            152.32               124                 1
>>>> ```
>>>> The cluster has 3 nodes in dc1 and 3 nodes in dc2. Instance type is AWS EC2 
>>>> m4.xlarge.
>>>>> Would also be good to see your schema (anonymized if needed) and the 
>>>>> select queries you’re running
>>>>>> Thanks Jeff,
>>>>>> I have gcgs set to 10 mins and also changed the table TTL to 5 hours, 
>>>>>> compared to the insert TTL of 4 hours. Tracing doesn't show any 
>>>>>> tombstone scans for the reads, and the log doesn't show tombstone scan 
>>>>>> alerts either. With 5-8k reads per node during peak hours, the metric 
>>>>>> shows a 1M tombstone scan count for reads. 
>>>>>>> If all of your data is TTL’d and you never explicitly delete a cell 
>>>>>>> without using a TTL, you can probably drop your GCGS to 1 hour (or 
>>>>>>> less).
>>>>>>> Which compaction strategy are you using? You need a way to clear out 
>>>>>>> those tombstones. There exist tombstone compaction sub properties that 
>>>>>>> can help encourage compaction to grab sstables just because they’re 
>>>>>>> full of tombstones which will probably help you.
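>>>>>>> As a sketch only, assuming the table is on STCS and using placeholder 
>>>>>>> names (ksname, "table"); the threshold and interval values below are 
>>>>>>> illustrative, not tuned recommendations:
>>>>>>> ```cql
>>>>>>> ALTER TABLE ksname."table"
>>>>>>>   WITH gc_grace_seconds = 3600
>>>>>>>   AND compaction = {
>>>>>>>     'class': 'SizeTieredCompactionStrategy',
>>>>>>>     -- compact a single sstable once ~10% of it is droppable tombstones
>>>>>>>     'tombstone_threshold': '0.1',
>>>>>>>     -- consider sstables for this after 1 hour instead of the 1-day default
>>>>>>>     'tombstone_compaction_interval': '3600',
>>>>>>>     -- skip the overlap pre-check that often blocks these compactions
>>>>>>>     'unchecked_tombstone_compaction': 'true'
>>>>>>>   };
>>>>>>> ```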
>>>>>>>> Can we see the histogram?  Why wouldn’t you at times have that many 
>>>>>>>> tombstones?  Makes sense.
>>>>>>>> Kenneth Brotman
>>>>>>>> We have a small table; records are about 5k.
>>>>>>>> All inserts come with a 4hr TTL, and we have a table-level TTL of 1 day 
>>>>>>>> and gc grace seconds of 3 hours. We do 5k reads a second during peak 
>>>>>>>> load. During peak load we see alerts for the tombstone scanned 
>>>>>>>> histogram reaching a million.
>>>>>>>> Cassandra version 3.11.1. Please let me know how this tombstone scan 
>>>>>>>> can be avoided in the memtable.

