Reads increase on almost all nodes, and the same is true of CPU: it goes high on all nodes.
On Sat, Feb 23, 2019, 11:04 PM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

When the CPU utilization spikes from 5-10% to 50%, how many nodes does it happen to at the same time?

*From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
*Sent:* Saturday, February 23, 2019 7:26 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Tombstones in memtable

```
JVM settings:

-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:StringTableSize=1000003
-XX:+AlwaysPreTouch
-XX:-UseBiasedLocking
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:+UseNUMA
-XX:+PerfDisableSharedMem
-Djava.net.preferIPv4Stack=true
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M

Total memory (free):

             total       used       free     shared    buffers     cached
Mem:      16434004   16125340     308664         60     172872    5565184
-/+ buffers/cache:   10387284    6046720
Swap:            0          0          0

Heap settings in cassandra-env.sh:

MAX_HEAP_SIZE="8192M"
HEAP_NEWSIZE="800M"
```

On Sat, Feb 23, 2019, 10:15 PM Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Thanks Jeff,

Since we have low writes and high reads, the data is in memtables only most of the time. When I initially noticed the issue there were no sstables on disk; everything was in the memtable only.

On Sat, Feb 23, 2019, 10:01 PM Jeff Jirsa <jji...@gmail.com> wrote:

Also, given your short TTL and low write rate, you may want to think about how you can keep more in memory - this may mean a larger memtable and higher flush thresholds (reading from the memtable), or perhaps the partition cache (if you are likely to read the same key multiple times). You'll also probably win some with basic perf and GC tuning, but can't really do that via email. CASSANDRA-8150 has some pointers.

--
Jeff Jirsa

On Feb 23, 2019, at 6:52 PM, Jeff Jirsa <jji...@gmail.com> wrote:

You'll only ever have one tombstone per read, so your load is based on the normal read rate, not tombstones. The metric isn't wrong, but it's not indicative of a problem here given your data model.

You're using STCS, so you may be reading from more than one sstable if you update column2 for a given column1; otherwise you're probably just seeing normal read load. Consider dropping your compression chunk size a bit (given the sizes in your cfstats I'd probably go to 4K instead of 64K), and maybe consider LCS or TWCS instead of STCS (which one is appropriate depends on a lot of factors, but STCS is probably causing a fair bit of unnecessary compaction and is probably very slow to expire data).

--
Jeff Jirsa
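[A minimal sketch of what Jeff's two suggestions could look like in CQL. The keyspace/table names are the anonymized placeholders from the schema later in the thread, and whether TWCS fits depends on the factors he mentions - this is illustrative, not a recommendation:]

```
-- Smaller compression chunks for small-partition point reads (64K -> 4K)
ALTER TABLE keyspace."table"
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};

-- Alternative to STCS: time-window compaction aligned with the 4-hour TTL,
-- so whole sstables expire together and can be dropped cheaply
ALTER TABLE keyspace."table"
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': '1'};
```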
On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Do you see anything wrong with this metric?

Metric to scan tombstones:

increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])

And at the same time, CPU spikes to 50% whenever I see the high-tombstone alert.

On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:

Your schema is such that you'll never read more than one tombstone per select (unless you're also doing range reads / table scans that you didn't mention) - I'm not quite sure what you're alerting on, but you're not going to have tombstone problems with that table / that select.

--
Jeff Jirsa

On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Changing gc_grace_seconds didn't help.

CREATE KEYSPACE ksname WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;

```
CREATE TABLE keyspace."table" (
    "column1" text PRIMARY KEY,
    "column2" text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 18000
    AND gc_grace_seconds = 60
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Flushed the table and took an sstabledump:

grep -i '"expired" : true' SSTables.txt | wc -l
16439
grep -i '"expired" : false' SSTables.txt | wc -l
2657

TTL is 4 hours:

INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?) USING TTL ?;  -- TTL bound to 14400 seconds (4 hours)
SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;

Metric to scan tombstones:

increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])

During peak hours we only have a couple of hundred inserts and 5-8k reads/s per node.
```
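[Note that the table above caches keys only (`rows_per_partition: 'NONE'`) while partitions are ~100 bytes and the load is heavily read-dominated. One way to act on Jeff's earlier "keep more in memory" suggestion might be the row cache - a sketch only, and it assumes `row_cache_size_in_mb` has been given a non-zero value in cassandra.yaml (it defaults to 0, i.e. row cache disabled):]

```
-- Sketch: cache whole (single-row) partitions for this read-heavy table.
-- Only takes effect if row_cache_size_in_mb > 0 in cassandra.yaml.
ALTER TABLE keyspace."table"
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
```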
```
tablestats:

Read Count: 605231874
Read Latency: 0.021268529760215503 ms
Write Count: 2763352
Write Latency: 0.027924007871599422 ms
Pending Flushes: 0
        Table: name
        SSTable count: 1
        Space used (live): 1413203
        Space used (total): 1413203
        Space used by snapshots (total): 0
        Off heap memory used (total): 28813
        SSTable Compression Ratio: 0.5015090954531143
        Number of partitions (estimate): 19568
        Memtable cell count: 573
        Memtable data size: 22971
        Memtable off heap memory used: 0
        Memtable switch count: 6
        Local read count: 529868919
        Local read latency: 0.020 ms
        Local write count: 2707371
        Local write latency: 0.024 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 1
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 23888
        Bloom filter off heap memory used: 23880
        Index summary off heap memory used: 4717
        Compression metadata off heap memory used: 216
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 124
        Compacted partition mean bytes: 99
        Average live cells per slice (last five minutes): 1.0
        Maximum live cells per slice (last five minutes): 1
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 0

histograms:

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             0.00          20.50         17.08              86           1
75%             0.00          24.60         20.50             124           1
95%             0.00          35.43         29.52             124           1
98%             0.00          35.43         42.51             124           1
99%             0.00          42.51         51.01             124           1
Min             0.00           8.24          5.72              73           0
Max             1.00          42.51        152.32             124           1
```

3 nodes in dc1 and 3 nodes in dc2, instance type AWS EC2 m4.xlarge.

On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:

Would also be good to see your schema (anonymized if needed) and the select queries you're running.

--
Jeff Jirsa

On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Thanks Jeff,

I have gc_grace_seconds set to 10 minutes and changed the table-level TTL to 5 hours, compared to the insert TTL of 4 hours. Tracing doesn't show any tombstone scans for the reads, and the log doesn't show tombstone scan alerts either. Yet while reads are running at 5-8k/s per node during peak hours, it shows a 1M tombstone scan count.

On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:

If all of your data is TTL'd and you never explicitly delete a cell without using a TTL, you can probably drop your GCGS to 1 hour (or less).

Which compaction strategy are you using? You need a way to clear out those tombstones. There exist tombstone compaction sub-properties that can help encourage compaction to grab sstables just because they're full of tombstones, which will probably help you.

--
Jeff Jirsa
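[A sketch of those two changes in CQL, using the anonymized table name from earlier in the thread. The sub-property values are illustrative, not recommendations, and dropping GCGS this far is only safe under the condition Jeff states:]

```
-- Sketch: shorter GC grace plus tombstone-driven compaction sub-properties.
-- Only safe if every write carries a TTL and no explicit deletes are issued.
ALTER TABLE keyspace."table"
  WITH gc_grace_seconds = 3600
  AND compaction = {'class': 'SizeTieredCompactionStrategy',
                    'tombstone_threshold': '0.2',
                    'unchecked_tombstone_compaction': 'true',
                    'tombstone_compaction_interval': '3600'};
```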
On Feb 22, 2019, at 8:37 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

Can we see the histogram? Why wouldn't you at times have that many tombstones? Makes sense.

Kenneth Brotman

*From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
*Sent:* Thursday, February 21, 2019 7:06 AM
*To:* user@cassandra.apache.org
*Subject:* Tombstones in memtable

We have a small table; records are about 5k. All the inserts come with a 4-hour TTL, the table-level TTL is 1 day, and gc_grace_seconds is 3 hours. We do 5k reads a second during peak load, and during peak load we see alerts for the tombstone scanned histogram reaching a million. Cassandra version is 3.11.1. Please let me know how these tombstone scans in the memtable can be avoided.