There are two types of read repair:

- Blocking/foreground read repair, which happens when a read at your consistency level (LOCAL_QUORUM for you) gets responses that don't match
- Probabilistic read repair, which queries extra hosts in advance and read-repairs them if they mismatch AFTER responding to the caller/client

You've disabled the latter, but you can't disable the former (there's a proposal to make that configurable, but I don't recall if it's been committed, and I'm mobile so not gonna go search JIRA).

The big mutation is due to a large mismatch - probably caused by the bounces and reads arriving before hints replayed (the hint throttle is quite low in 3.11; you may want to increase it, as sketched below).
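A minimal cassandra.yaml sketch of that hint throttle change, assuming the stock 3.11 defaults; the right values depend on node count and disk/network headroom:

    # cassandra.yaml (3.11.x) - hinted handoff throttle, minimal sketch
    # The throttle is KB/s per delivery thread and is reduced proportionally
    # to the number of nodes in the cluster, so effective hint replay can be
    # much slower than the raw number suggests.
    hinted_handoff_enabled: true
    hinted_handoff_throttle_in_kb: 10240   # default is 1024; raise to speed up hint replay
    max_hints_delivery_threads: 2          # default; more threads also raise total throughput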
-- Jeff Jirsa

> On Jan 1, 2019, at 11:51 AM, Vlad <qa23d-...@yahoo.com.invalid> wrote:
>
> Hi, thanks for the answer.
>
> What I don't understand is:
>
> - why are there read repair attempts if the repair chances are 0.0?
> - what can cause the big mutation size?
> - why didn't hinted handoff prevent the inconsistency? (because of the big mutation size?)
>
> Thanks.
>
>
> On Tuesday, January 1, 2019 9:41 PM, Jeff Jirsa <jji...@gmail.com> wrote:
>
> Read repair due to digest mismatch and speculative retry can both cause behaviors that are hard to reason about (usually seen when a host stops accepting writes due to a bad disk, which you haven't described, but generally speaking, there are times when reads will block on writes to extra replicas).
>
> The patch from https://issues.apache.org/jira/browse/CASSANDRA-10726 changes this behavior significantly.
>
> The last message in this thread (about huge read repair mutations) suggests that your writes during the bounce got some partitions quite out of sync, that hints aren't replaying fast enough to fill in the gaps before you read, and that the read repair is timing out. A timed-out read repair wouldn't block the read after 10726, so if you're seeing read timeouts right now, you probably want to run repair, or read much smaller pages so that read repair succeeds, or increase your commitlog segment size from 32M to 128M or so until the read repair actually succeeds (a sketch of that change follows at the end of this thread).
>
>
> On Tue, Jan 1, 2019 at 12:18 AM Vlad <qa23d-...@yahoo.com.invalid> wrote:
> Hi All and Happy New Year!!!
>
> This year started with Cassandra 3.11.3 sometimes forcing consistency level ALL despite queries using LOCAL_QUORUM (there is actually only one DC), and those queries fail with a timeout.
>
> As far as I understand, this can be caused by read repair attempts (we see "DigestMismatch" errors in the Cassandra log), but the table has no read repair configured:
>
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>         'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64',
>         'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
>
> Any suggestions?
>
> Thanks.
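A hedged cassandra.yaml sketch of the commitlog change suggested in the quoted reply, assuming the stock 3.11 defaults; max_mutation_size_in_kb is shown only to make the implied ceiling explicit:

    # cassandra.yaml (3.11.x) - commitlog sizing, minimal sketch
    # A single mutation larger than max_mutation_size_in_kb is rejected; by
    # default that limit is half of commitlog_segment_size_in_mb, i.e. 16 MB
    # with the stock 32 MB segments - exactly what an oversized read-repair
    # mutation can run into.
    commitlog_segment_size_in_mb: 128    # up from the default of 32
    # max_mutation_size_in_kb: 65536     # implied default: half the segment size, in KB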