There are two types of read repair:

- Blocking/foreground read repair, triggered by reading at your consistency 
level (LOCAL_QUORUM for you) when the responses don't match

- Probabilistic read repair, which queries extra hosts in advance and 
read-repairs them if they mismatch AFTER responding to the caller/client

You’ve disabled the latter, but you can’t disable the former (there’s a proposal 
to make that configurable, but I don’t recall if it’s been committed, and I’m on 
mobile so I’m not going to go search JIRA).
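
For reference, that's the per-table knob you've already set to zero; a sketch, 
with a hypothetical keyspace/table name:

    ALTER TABLE myks.mytable
        WITH dclocal_read_repair_chance = 0.0
        AND read_repair_chance = 0.0;

Both chances at 0.0 only disable the probabilistic variety; digest-mismatch 
repair at your read consistency level still happens.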

The big mutation is due to a large mismatch - probably caused by the bounces and 
reading before hints replayed (the hint throttle default is quite low in 3.11; 
you may want to increase it).
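
For example, in cassandra.yaml (1024 KB/s per delivery thread is the 3.11 
default; 10240 below is just an illustrative bump, not a tuned value):

    hinted_handoff_throttle_in_kb: 10240

It can also be changed at runtime, without a restart:

    nodetool sethintedhandoffthrottlekb 10240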


-- 
Jeff Jirsa


> On Jan 1, 2019, at 11:51 AM, Vlad <qa23d-...@yahoo.com.invalid> wrote:
> 
> Hi, thanks for answer.
> 
> what I don't understand is:
> 
> - why are there attempts at read repair if the repair chances are 0.0?
> - what can cause the big mutation size?
> - why didn't hinted handoffs prevent the inconsistency? (because of the big 
> mutation size?)
> 
> Thanks.
> 
> 
> On Tuesday, January 1, 2019 9:41 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> 
> 
> Read repair due to digest mismatch and speculative retry can both cause some 
> behaviors that are hard to reason about (usually seen when a host stops 
> accepting writes due to a bad disk, which you haven't described, but generally 
> speaking, there are times when reads will block on writing to extra 
> replicas). 
> 
> The patch from https://issues.apache.org/jira/browse/CASSANDRA-10726 changes 
> this behavior significantly.
> 
> The last message in this thread (about huge read repair mutations) suggests 
> that your writes during the bounce got some partitions quite out of sync, 
> hints aren't replaying fast enough to fill in the gaps before you read, and 
> the read repair is timing out. The read repair timing out wouldn't block the 
> read after 10726, so if you're seeing read timeouts right now, what you 
> probably want to do is run repair, or read much smaller pages so that read 
> repair succeeds, or increase your commitlog segment size from 32M to 128M or 
> so until the read repair actually succeeds. 
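> 
> For example, in cassandra.yaml (128 is just the ballpark above, not a tuned 
> value; note that the maximum mutation size defaults to half the segment size, 
> which is why the segment size matters here):
> 
>     commitlog_segment_size_in_mb: 128
> 
> And the repair, with placeholder keyspace/table names:
> 
>     nodetool repair --full myks mytable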
> 
> 
> On Tue, Jan 1, 2019 at 12:18 AM Vlad <qa23d-...@yahoo.com.invalid> wrote:
> Hi All and Happy New Year!!!
> 
> This year started with Cassandra 3.11.3 sometimes forcing consistency level 
> ALL despite a query level of LOCAL_QUORUM (there is actually only one DC), and 
> it fails with a timeout.
> 
> As far as I understand, it can be caused by read repair attempts (we see 
> "DigestMismatch" errors in the Cassandra log), but the table has no read 
> repair configured:
> 
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.0
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> 
> 
> Any suggestions?
> 
> Thanks.
> 
> 