We do have otc_coalescing_strategy disabled; we ran into that a long while back and saw better performance with it off. And most recently we set disk_access_mode to mmap_index_only, since we have a few clusters that would experience a lot more disk IO, causing high load, high CPU, and crazy high latencies. Since setting this to mmap_index_only we've seen a lot better overall performance.
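For reference, a minimal sketch of how those two settings look in cassandra.yaml (just the two lines in question; disk_access_mode is not in the stock yaml, so it may need to be added by hand):

    # cassandra.yaml - only the two settings discussed above
    otc_coalescing_strategy: DISABLED
    disk_access_mode: mmap_index_only   # default is auto (mmaps both data and index files)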
We just haven't seen this constant rate of read repairs before.

On Wed, Oct 16, 2019 at 12:57 PM ZAIDI, ASAD <az1...@att.com> wrote:

> Wondering if you've disabled otc_coalescing_strategy (CASSANDRA-12676
> <https://issues.apache.org/jira/browse/CASSANDRA-12676>) since you
> upgraded from 2.x? Also, did you find any luck increasing
> native_transport_max_threads to address blocked NTRs (CASSANDRA-11363)?
>
> ~Asad
>
> *From:* Patrick Lee [mailto:patrickclee0...@gmail.com]
> *Sent:* Wednesday, October 16, 2019 12:22 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Constant blocking read repair for such a tiny table
>
> Haven't really figured this out yet. It's not a big problem, but it is
> annoying for sure! The cluster was upgraded from 2.1.16 to 3.11.4. My
> only question now is whether we had this type of behavior before the
> upgrade. I'm leaning toward no based on my data, but I'm just not 100%
> sure.
>
> Just 1 table, out of all the ones on the cluster, has this behavior.
> Repair has been run a few times via Reaper. We even did a nodetool
> compact on the nodes (since this table is only about 1GB per node). I
> just don't see why there would be any inconsistency that would trigger
> read repair.
>
> Any insight you may have would be appreciated! The real thing that
> started this digging into the cluster was that, during a stress test,
> the application team complained about high latency (30ms at p98). This
> cluster is already oversized for this use case; with only 14GB of data
> per node there is more than enough RAM, so all the data is basically
> cached in RAM. The only thing that stands out is this crazy read
> repair. So this read repair may not be my root issue, but it definitely
> shouldn't be happening like this.
>
> The VMs:
>
> 12 cores
> 82GB RAM
> 1.2TB local ephemeral SSDs
>
> Attached is the info from 1 of the nodes.
>
> On Tue, Oct 15, 2019 at 2:36 PM Alain RODRIGUEZ <arodr...@gmail.com>
> wrote:
>
> Hello Patrick,
>
> Still in trouble with this? I must admit I'm really puzzled by your
> issue. I have no real idea of what's going on. Would you share with us
> the output of:
>
> - nodetool status <keyspace>
> - nodetool describecluster
> - nodetool gossipinfo
> - nodetool tpstats
>
> Also, you said the app has been running for a long time, with no
> changes. What about Cassandra? Any recent operations?
>
> I hope that with this information we might be able to understand better
> and finally be able to help.
>
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Fri, Oct 4, 2019 at 00:25, Patrick Lee <patrickclee0...@gmail.com>
> wrote:
>
> This table was actually on leveled compaction before; I just changed it
> to size-tiered yesterday while researching this.
>
> On Thu, Oct 3, 2019 at 4:31 PM Patrick Lee <patrickclee0...@gmail.com>
> wrote:
>
> It's not really time series data, and it's not updated very often; it
> would have some updates, but pretty infrequent. This thing should be
> super fast. On average it's 1 to 2ms at p99 currently, but if they
> double or triple the traffic on that table, latencies go up to 20ms to
> 50ms.
>
> The only odd thing I see is that there are constant read repairs that
> follow the same traffic pattern as the reads, which shows up as
> constant writes on the table (from the read repairs). After a read
> repair, or just normal full repairs (all full, through Reaper; we've
> never run any incremental repair), I would expect it not to have any
> mismatches. The other 5 tables they use on the cluster can have the
> same level of traffic, all very simple selects from a table by
> partition key returning a single record.
>
> On Thu, Oct 3, 2019 at 4:21 PM John Belliveau <belliveau.j...@gmail.com>
> wrote:
>
> Hi Patrick,
>
> Is this time series data? If so, I have run into issues with repair on
> time series data using the SizeTieredCompactionStrategy. I have had
> better luck using the TimeWindowCompactionStrategy.
>
> John
>
> Sent from Mail for Windows 10
>
> *From: *Patrick Lee <patrickclee0...@gmail.com>
> *Sent: *Thursday, October 3, 2019 5:14 PM
> *To: *user@cassandra.apache.org
> *Subject: *Constant blocking read repair for such a tiny table
>
> I have a cluster that is running 3.11.4 (it was upgraded a while back
> from 2.1.16). What I see is a steady rate of read repair, constantly
> about 10%, on only this 1 table. Repairs have been run (several times,
> actually). The table does not get a lot of writes, so after repair, or
> even after a read repair, I would expect it to be fine. The reason I'm
> having to dig into this so much is that, under a much larger traffic
> load than their normal traffic, latencies are higher than the app team
> wants.
>
> I mean, this thing is tiny. It's a 12x12 cluster, but this 1 table is
> only about 1GB per node on disk.
>
> The application team is doing reads at LOCAL_QUORUM, and I can simulate
> this on that cluster by running a query using QUORUM and/or
> LOCAL_QUORUM: in the trace, every run of the query comes back with a
> DigestMismatchException, no matter how many times I run it. That record
> hasn't been updated by the application for several months.
>
> Repairs are scheduled and run every 7 days via Reaper; in the past week
> this table has been repaired at least 3 times. Every time there are
> mismatches and data streams back and forth, yet there is still a
> constant rate of read repairs.
>
> Curious if anyone has any recommendations on what to look into further,
> or has experienced anything like this?
>
> This node has been up for 24 hours. This is the netstats for read
> repairs:
>
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 7481
> Mismatch (Blocking): 11425375
> Mismatch (Background): 17
> Pool Name              Active   Pending      Completed   Dropped
> Large messages            n/a         0           1232         0
> Small messages            n/a         0      395903678         0
> Gossip messages           n/a         0         603746         0
>
> Example of the schema... some modifications have been made to reduce
> read_repair and speculative_retry while troubleshooting:
>
> CREATE TABLE keyspace.table1 (
>     item bigint,
>     price int,
>     start_date timestamp,
>     end_date timestamp,
>     created_date timestamp,
>     cost decimal,
>     list decimal,
>     item_id int,
>     modified_date timestamp,
>     status int,
>     PRIMARY KEY ((item, price), start_date, end_date)
> ) WITH CLUSTERING ORDER BY (start_date ASC, end_date ASC)
>     AND read_repair_chance = 0.0
>     AND dclocal_read_repair_chance = 0.0
>     AND gc_grace_seconds = 864000
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
>     AND comment = ''
>     AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
>     AND compression = { 'chunk_length_in_kb' : 4, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
>     AND default_time_to_live = 0
>     AND speculative_retry = 'NONE'
>     AND min_index_interval = 128
>     AND max_index_interval = 2048
>     AND crc_check_chance = 1.0
>     AND cdc = false
>     AND memtable_flush_period_in_ms = 0;
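In case it helps anyone reproduce what is described in the original post above, here is a minimal cqlsh sketch of the traced LOCAL_QUORUM read; the key values (12345, 100) are made up, only the table matches the quoted schema:

    -- cqlsh session sketch; key values are illustrative only
    CONSISTENCY LOCAL_QUORUM;
    TRACING ON;
    SELECT * FROM keyspace.table1 WHERE item = 12345 AND price = 100;
    -- per the report above, every run of the traced query shows a
    -- DigestMismatchException followed by a blocking read repair,
    -- consistent with the Mismatch (Blocking) counter in netstats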