I have a cluster running 3.11.4 (upgraded a while back from 2.1.16). What I see is a steady read repair rate of about 10%, constantly, and only on this one table. Repairs have been run (several times, actually). The table does not get a lot of writes, so after a repair, or even after a read repair, I would expect it to be consistent. The reason I'm having to dig into this so much is that under a much larger traffic load than normal, latencies are higher than the app team wants.
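
For what it's worth, the latency numbers I'm going off of are the per-table percentiles from nodetool, i.e. something like this (keyspace/table names changed to match the schema below):

nodetool tablehistograms keyspace table1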

I mean, this thing is tiny: it's a 12x12 cluster, and this one table is only about 1 GB per node on disk.

The application team does its reads at LOCAL_QUORUM, and I can reproduce this on that cluster by running the query at QUORUM and/or LOCAL_QUORUM: the trace comes back with a DigestMismatchException every single time, no matter how many times I run it, even though that record hasn't been updated by the application in several months.
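
Roughly what the reproduction looks like from cqlsh (the key values here are placeholders, not the real partition):

CONSISTENCY LOCAL_QUORUM;
TRACING ON;
SELECT * FROM keyspace.table1 WHERE item = 12345 AND price = 100;

and the trace for that SELECT shows the digest mismatch on every execution.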

Repairs are scheduled and run every 7 days via Reaper, and in the past week this table has been repaired at least 3 times. Every time there are mismatches and data streams back and forth, yet there is still a constant rate of read repairs.
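
For context, the extra runs outside of Reaper's schedule were plain full repairs scoped to just this table, something like:

nodetool repair -full keyspace table1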

Curious if anyone has recommendations on what to look into further, or has experienced anything like this?

This node has been up for 24 hours; this is the nodetool netstats output for read repairs:
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 7481
Mismatch (Blocking): 11425375
Mismatch (Background): 17
Pool Name                    Active   Pending      Completed   Dropped
Large messages                  n/a         0           1232         0
Small messages                  n/a         0      395903678         0
Gossip messages                 n/a         0         603746         0

Example of the schema... some modifications have been made to disable read_repair_chance and speculative_retry while troubleshooting:
CREATE TABLE keyspace.table1 (
    item bigint,
    price int,
    start_date timestamp,
    end_date timestamp,
    created_date timestamp,
    cost decimal,
    list decimal,
    item_id int,
    modified_date timestamp,
    status int,
    PRIMARY KEY ((item, price), start_date, end_date)
) WITH CLUSTERING ORDER BY (start_date ASC, end_date ASC)
    AND read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.0
    AND gc_grace_seconds = 864000
    AND bloom_filter_fp_chance = 0.01
    AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
    AND comment = ''
    AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
    AND compression = { 'chunk_length_in_kb' : 4, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
    AND default_time_to_live = 0
    AND speculative_retry = 'NONE'
    AND min_index_interval = 128
    AND max_index_interval = 2048
    AND crc_check_chance = 1.0
    AND cdc = false
    AND memtable_flush_period_in_ms = 0;
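
The troubleshooting changes mentioned above amounted to turning off read repair chance and speculative retry on the table, i.e.:

ALTER TABLE keyspace.table1
    WITH read_repair_chance = 0.0
    AND dclocal_read_repair_chance = 0.0
    AND speculative_retry = 'NONE';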
