When the CPU utilization spikes from 5-10% to 50%, how many nodes does it happen to at the same time?
From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
Sent: Saturday, February 23, 2019 7:26 PM
To: user@cassandra.apache.org
Subject: Re: Tombstones in memtable

```
JVM settings
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError
-Xss256k -XX:StringTableSize=1000003 -XX:+AlwaysPreTouch -XX:-UseBiasedLocking
-XX:+UseTLAB -XX:+ResizeTLAB -XX:+UseNUMA -XX:+PerfDisableSharedMem
-Djava.net.preferIPv4Stack=true -XX:+UseG1GC -XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M

Total memory (free)
             total       used       free     shared    buffers     cached
Mem:      16434004   16125340     308664         60     172872    5565184
-/+ buffers/cache:   10387284    6046720
Swap:            0          0          0

Heap settings in cassandra-env.sh
MAX_HEAP_SIZE="8192M"
HEAP_NEWSIZE="800M"
```

On Sat, Feb 23, 2019, 10:15 PM Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Thanks Jeff. Since we have low writes and high reads, most of the time the data is in memtables only. When I initially noticed the issue there were no sstables on disk; everything was in the memtable only.

On Sat, Feb 23, 2019, 10:01 PM Jeff Jirsa <jji...@gmail.com> wrote:

Also, given your short TTL and low write rate, you may want to think about how you can keep more in memory - this may mean a larger memtable and higher flush thresholds (reading from the memtable), or perhaps the partition cache (if you are likely to read the same key multiple times). You'll also probably win some with basic perf and GC tuning, but can't really do that via email. Cassandra-8150 has some pointers.

-- Jeff Jirsa

On Feb 23, 2019, at 6:52 PM, Jeff Jirsa <jji...@gmail.com> wrote:

You'll only ever have one tombstone per read, so your load is based on the normal read rate, not tombstones. The metric isn't wrong, but it's not indicative of a problem here given your data model.

You're using STCS, so you may be reading from more than one sstable if you update column2 for a given column1; otherwise you're probably just seeing normal read load. Consider dropping your compression chunk size a bit (given the sizes in your cfstats I'd probably go to 4K instead of 64K), and maybe consider LCS or TWCS instead of STCS (which is appropriate depends on a lot of factors, but STCS is probably causing a fair bit of unnecessary compaction and is probably very slow to expire data).

-- Jeff Jirsa

On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Do you see anything wrong with this metric?

Metric to scan tombstones:
increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])

And at the same time the CPU spikes to 50% whenever I see a high tombstone alert.

On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:

Your schema is such that you'll never read more than one tombstone per select (unless you're also doing range reads / table scans that you didn't mention) - I'm not quite sure what you're alerting on, but you're not going to have tombstone problems with that table / that select.

-- Jeff Jirsa
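For readers following the thread, Jeff's suggestions above (a 4K compression chunk size, and possibly TWCS instead of STCS) would translate into CQL roughly like the sketch below. This is a sketch only: the keyspace/table names are the anonymized placeholders used in this thread, and the one-hour compaction window is an illustrative assumption, not a tuned recommendation.

```
-- Sketch only: smaller compression chunks and a TWCS alternative to STCS.
-- Identifiers are the thread's anonymized placeholders; substitute real names.
ALTER TABLE "keyspace"."table"
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};

ALTER TABLE "keyspace"."table"
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': '1'};
```

Note that a compression change only applies to sstables written afterwards; existing sstables keep their old settings until they are recompacted or rewritten.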
On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Changing gcgs didn't help.

```
CREATE KEYSPACE ksname WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;

CREATE TABLE keyspace."table" (
    "column1" text PRIMARY KEY,
    "column2" text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 18000
    AND gc_grace_seconds = 60
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
```

Flushed the table and took an sstabledump:

```
grep -i '"expired" : true' SSTables.txt | wc -l
16439
grep -i '"expired" : false' SSTables.txt | wc -l
2657
```

TTL is 4 hours.

```
INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?) USING TTL ?;  -- TTL bound as 4 hours
SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;
```

Metric to scan tombstones:
increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])

During peak hours we only have a couple of hundred inserts and 5-8k reads/s per node.

```
tablestats
Read Count: 605231874
Read Latency: 0.021268529760215503 ms
Write Count: 2763352
Write Latency: 0.027924007871599422 ms
Pending Flushes: 0

Table: name
SSTable count: 1
Space used (live): 1413203
Space used (total): 1413203
Space used by snapshots (total): 0
Off heap memory used (total): 28813
SSTable Compression Ratio: 0.5015090954531143
Number of partitions (estimate): 19568
Memtable cell count: 573
Memtable data size: 22971
Memtable off heap memory used: 0
Memtable switch count: 6
Local read count: 529868919
Local read latency: 0.020 ms
Local write count: 2707371
Local write latency: 0.024 ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 1
Bloom filter false ratio: 0.00000
Bloom filter space used: 23888
Bloom filter off heap memory used: 23880
Index summary off heap memory used: 4717
Compression metadata off heap memory used: 216
Compacted partition minimum bytes: 73
Compacted partition maximum bytes: 124
Compacted partition mean bytes: 99
Average live cells per slice (last five minutes): 1.0
Maximum live cells per slice (last five minutes): 1
Average tombstones per slice (last five minutes): 1.0
Maximum tombstones per slice (last five minutes): 1
Dropped Mutations: 0

histograms
Percentile  SSTables   Write Latency   Read Latency   Partition Size   Cell Count
                            (micros)       (micros)          (bytes)
50%             0.00          20.50          17.08               86            1
75%             0.00          24.60          20.50              124            1
95%             0.00          35.43          29.52              124            1
98%             0.00          35.43          42.51              124            1
99%             0.00          42.51          51.01              124            1
Min             0.00           8.24           5.72               73            0
Max             1.00          42.51         152.32              124            1
```

3 nodes in dc1 and 3 nodes in dc2, with instance type AWS EC2 m4.xlarge.
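Given the caching settings visible in the schema above ('rows_per_partition': 'NONE') and Jeff's earlier suggestion to keep more in memory via the partition cache, enabling row caching for this table would look roughly like the sketch below. This is a sketch under stated assumptions: the identifiers are the thread's anonymized placeholders, and caching ALL rows per partition is only reasonable here because the tablestats show tiny single-row partitions.

```
-- Sketch only: enable the row cache for this table, per the "keep more in
-- memory" suggestion above. Identifiers are the thread's anonymized
-- placeholders; 'ALL' rows per partition assumes partitions stay tiny
-- (roughly 100 bytes each, per the tablestats).
ALTER TABLE "keyspace"."table"
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
```

The row cache also has to be enabled globally (row_cache_size_in_mb > 0 in cassandra.yaml) before this table-level setting has any effect.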
On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:

Would also be good to see your schema (anonymized if needed) and the select queries you're running.

-- Jeff Jirsa

On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Thanks Jeff. I have gcgs set to 10 mins and have also changed the table TTL to 5 hours, compared to the insert TTL of 4 hours. Tracing doesn't show any tombstone scans for the reads, and the logs don't show tombstone scan alerts either. But as the reads happen at 5-8k reads per node during the peak hours, the metric shows a 1M tombstone scan count per read.

On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:

If all of your data is TTL'd and you never explicitly delete a cell without using a TTL, you can probably drop your GCGS to 1 hour (or less). Which compaction strategy are you using? You need a way to clear out those tombstones. There exist tombstone compaction sub-properties that can help encourage compaction to grab sstables just because they're full of tombstones, which will probably help you.

-- Jeff Jirsa

On Feb 22, 2019, at 8:37 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

Can we see the histogram? Why wouldn't you at times have that many tombstones? Makes sense.

Kenneth Brotman

From: Rahul Reddy [mailto:rahulreddy1...@gmail.com]
Sent: Thursday, February 21, 2019 7:06 AM
To: user@cassandra.apache.org
Subject: Tombstones in memtable

We have a small table; records are about 5k. All the inserts come with a 4 hr TTL, the table-level TTL is 1 day, and gc grace seconds is 3 hours. We do 5k reads a second during peak load. During the peak load we are seeing alerts for the tombstone scanned histogram reaching a million. Cassandra version is 3.11.1. Please let me know how this tombstone scan can be avoided in the memtable.
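Jeff's suggestions above (dropping gc_grace_seconds for purely TTL'd data and using the STCS tombstone compaction sub-properties) would look roughly like the following CQL. This is a sketch only: the identifiers are the thread's anonymized placeholders, and the values are illustrative assumptions rather than tuned recommendations.

```
-- Sketch only: a lower gc_grace_seconds for data that is always TTL'd, and the
-- STCS tombstone compaction sub-properties Jeff refers to. Identifiers are the
-- thread's anonymized placeholders; values are illustrative assumptions.
ALTER TABLE "keyspace"."table" WITH gc_grace_seconds = 3600;

ALTER TABLE "keyspace"."table"
  WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                     'unchecked_tombstone_compaction': 'true',
                     'tombstone_threshold': '0.2',
                     'tombstone_compaction_interval': '3600'};
```

As Jeff notes, lowering gc_grace_seconds only makes sense if nothing is ever deleted without a TTL, and it also shortens how long a node can be down before deletes may need a full repair to propagate safely.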