Reads increase on almost all nodes, and the same is true of CPU: it goes high on all nodes.
On Sat, Feb 23, 2019, 11:04 PM Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

When the CPU utilization spikes from 5-10% to 50%, how many nodes does it happen to at the same time?

*From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
*Sent:* Saturday, February 23, 2019 7:26 PM
*To:* user@cassandra.apache.org
*Subject:* Re: Tombstones in memtable

```
JVM settings:

-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-XX:+HeapDumpOnOutOfMemoryError
-Xss256k
-XX:StringTableSize=1000003
-XX:+AlwaysPreTouch
-XX:-UseBiasedLocking
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:+UseNUMA
-XX:+PerfDisableSharedMem
-Djava.net.preferIPv4Stack=true
-XX:+UseG1GC
-XX:G1RSetUpdatingPauseTimePercent=5
-XX:MaxGCPauseMillis=500
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=10M

Total memory (free):

             total       used       free     shared    buffers     cached
Mem:      16434004   16125340     308664         60     172872    5565184
-/+ buffers/cache:   10387284    6046720
Swap:            0          0          0

Heap settings in cassandra-env.sh:

MAX_HEAP_SIZE="8192M"
HEAP_NEWSIZE="800M"
```

On Sat, Feb 23, 2019, 10:15 PM Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Thanks Jeff,

Since we have low writes and high reads, the data is in memtables only most of the time. When I initially noticed the issue there were no sstables on disk; everything was in the memtable only.

On Sat, Feb 23, 2019, 10:01 PM Jeff Jirsa <jji...@gmail.com> wrote:

Also, given your short TTL and low write rate, you may want to think about how you can keep more in memory - this may mean a larger memtable and higher flush thresholds (reading from the memtable), or perhaps the partition cache (if you are likely to read the same key multiple times). You'll also probably win some with basic perf and GC tuning, but can't really do that via email. CASSANDRA-8150 has some pointers.

--
Jeff Jirsa

On Feb 23, 2019, at 6:52 PM, Jeff Jirsa <jji...@gmail.com> wrote:

You'll only ever have one tombstone per read, so your load is based on the normal read rate, not tombstones. The metric isn't wrong, but it's not indicative of a problem here given your data model.

You're using STCS, so you may be reading from more than one sstable if you update column2 for a given column1; otherwise you're probably just seeing normal read load. Consider dropping your compression chunk size a bit (given the sizes in your cfstats I'd probably go to 4K instead of 64K), and maybe consider LCS or TWCS instead of STCS (which one is appropriate depends on a lot of factors, but STCS is probably causing a fair bit of unnecessary compaction and is probably very slow to expire data).

--
Jeff Jirsa
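[A minimal sketch of what Jeff's two suggestions could look like in CQL. The keyspace/table names are the anonymized placeholders from the schema later in the thread, and whether TWCS fits depends on the factors he mentions - this is illustrative, not a recommendation:]

```
-- Smaller compression chunks for small-partition point reads (64K -> 4K)
ALTER TABLE keyspace."table"
  WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': '4'};

-- Alternative to STCS: time-window compaction aligned with the 4-hour TTL,
-- so whole sstables expire together and can be dropped cheaply
ALTER TABLE keyspace."table"
  WITH compaction = {'class': 'TimeWindowCompactionStrategy',
                     'compaction_window_unit': 'HOURS',
                     'compaction_window_size': '1'};
```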
On Feb 23, 2019, at 6:31 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Do you see anything wrong with this metric?

Metric to scan tombstones:

increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])

And at the same time, CPU spikes to 50% whenever I see the high-tombstone alert.

On Sat, Feb 23, 2019, 9:25 PM Jeff Jirsa <jji...@gmail.com> wrote:

Your schema is such that you'll never read more than one tombstone per select (unless you're also doing range reads / table scans that you didn't mention) - I'm not quite sure what you're alerting on, but you're not going to have tombstone problems with that table / that select.

--
Jeff Jirsa

On Feb 23, 2019, at 5:55 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Changing gc_grace_seconds didn't help.

CREATE KEYSPACE ksname WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'} AND durable_writes = true;

```
CREATE TABLE keyspace."table" (
    "column1" text PRIMARY KEY,
    "column2" text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 18000
    AND gc_grace_seconds = 60
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

Flushed the table and took an sstabledump:

grep -i '"expired" : true' SSTables.txt | wc -l
16439
grep -i '"expired" : false' SSTables.txt | wc -l
2657

TTL is 4 hours:

INSERT INTO keyspace."TABLE_NAME" ("column1", "column2") VALUES (?, ?) USING TTL ?;  -- TTL bound to 14400 seconds (4 hours)
SELECT * FROM keyspace."TABLE_NAME" WHERE "column1" = ?;

Metric to scan tombstones:

increase(cassandra_Table_TombstoneScannedHistogram{keyspace="mykeyspace",Table="tablename",function="Count"}[5m])

During peak hours we only have a couple of hundred inserts and 5-8k reads/s per node.
```
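[Note that the table above caches keys only (`rows_per_partition: 'NONE'`) while partitions are ~100 bytes and the load is heavily read-dominated. One way to act on Jeff's earlier "keep more in memory" suggestion might be the row cache - a sketch only, and it assumes `row_cache_size_in_mb` has been given a non-zero value in cassandra.yaml (it defaults to 0, i.e. row cache disabled):]

```
-- Sketch: cache whole (single-row) partitions for this read-heavy table.
-- Only takes effect if row_cache_size_in_mb > 0 in cassandra.yaml.
ALTER TABLE keyspace."table"
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
```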
```
tablestats:

Read Count: 605231874
Read Latency: 0.021268529760215503 ms
Write Count: 2763352
Write Latency: 0.027924007871599422 ms
Pending Flushes: 0
        Table: name
        SSTable count: 1
        Space used (live): 1413203
        Space used (total): 1413203
        Space used by snapshots (total): 0
        Off heap memory used (total): 28813
        SSTable Compression Ratio: 0.5015090954531143
        Number of partitions (estimate): 19568
        Memtable cell count: 573
        Memtable data size: 22971
        Memtable off heap memory used: 0
        Memtable switch count: 6
        Local read count: 529868919
        Local read latency: 0.020 ms
        Local write count: 2707371
        Local write latency: 0.024 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 1
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 23888
        Bloom filter off heap memory used: 23880
        Index summary off heap memory used: 4717
        Compression metadata off heap memory used: 216
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 124
        Compacted partition mean bytes: 99
        Average live cells per slice (last five minutes): 1.0
        Maximum live cells per slice (last five minutes): 1
        Average tombstones per slice (last five minutes): 1.0
        Maximum tombstones per slice (last five minutes): 1
        Dropped Mutations: 0

histograms:

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             0.00          20.50         17.08              86           1
75%             0.00          24.60         20.50             124           1
95%             0.00          35.43         29.52             124           1
98%             0.00          35.43         42.51             124           1
99%             0.00          42.51         51.01             124           1
Min             0.00           8.24          5.72              73           0
Max             1.00          42.51        152.32             124           1
```

3 nodes in dc1 and 3 nodes in dc2, instance type AWS EC2 m4.xlarge.

On Sat, Feb 23, 2019, 7:47 PM Jeff Jirsa <jji...@gmail.com> wrote:

Would also be good to see your schema (anonymized if needed) and the select queries you're running.

--
Jeff Jirsa

On Feb 23, 2019, at 4:37 PM, Rahul Reddy <rahulreddy1...@gmail.com> wrote:

Thanks Jeff,

I have gc_grace_seconds set to 10 minutes and changed the table-level TTL to 5 hours, compared to the insert TTL of 4 hours. Tracing doesn't show any tombstone scans for the reads, and the log doesn't show tombstone scan alerts either. Yet while reads are running at 5-8k/s per node during peak hours, it shows a 1M tombstone scan count.

On Fri, Feb 22, 2019, 11:46 AM Jeff Jirsa <jji...@gmail.com> wrote:

If all of your data is TTL'd and you never explicitly delete a cell without using a TTL, you can probably drop your GCGS to 1 hour (or less).

Which compaction strategy are you using? You need a way to clear out those tombstones. There exist tombstone compaction sub-properties that can help encourage compaction to grab sstables just because they're full of tombstones, which will probably help you.

--
Jeff Jirsa
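[A sketch of those two changes in CQL, using the anonymized table name from earlier in the thread. The sub-property values are illustrative, not recommendations, and dropping GCGS this far is only safe under the condition Jeff states:]

```
-- Sketch: shorter GC grace plus tombstone-driven compaction sub-properties.
-- Only safe if every write carries a TTL and no explicit deletes are issued.
ALTER TABLE keyspace."table"
  WITH gc_grace_seconds = 3600
  AND compaction = {'class': 'SizeTieredCompactionStrategy',
                    'tombstone_threshold': '0.2',
                    'unchecked_tombstone_compaction': 'true',
                    'tombstone_compaction_interval': '3600'};
```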
On Feb 22, 2019, at 8:37 AM, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:

Can we see the histogram? Why wouldn't you at times have that many tombstones? Makes sense.

Kenneth Brotman

*From:* Rahul Reddy [mailto:rahulreddy1...@gmail.com]
*Sent:* Thursday, February 21, 2019 7:06 AM
*To:* user@cassandra.apache.org
*Subject:* Tombstones in memtable

We have a small table; records are about 5k. All the inserts come with a 4-hour TTL, the table-level TTL is 1 day, and gc_grace_seconds is 3 hours. We do 5k reads a second during peak load, and during peak load we see alerts for the tombstone scanned histogram reaching a million. Cassandra version is 3.11.1. Please let me know how these tombstone scans in the memtable can be avoided.