It's not really time series data, and it's not updated very often; there are some updates, but they're pretty infrequent. This thing should be super fast: p99 latency is currently around 1-2 ms, but if they double or triple the traffic on that table, latencies climb to 20-50 ms. The only odd thing I see is a constant stream of read repairs that follows the same traffic pattern as the reads, which means constant writes to the table (from the read repairs). After a read repair, or just the normal full repairs (always full repairs through Reaper, never ran any incremental repair), I would expect there to be no more mismatches. The other 5 tables they use on the cluster can see the same level of traffic, and they're all very simple selects by partition key that return a single record.
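
For reference, this is roughly how I'm reproducing it from cqlsh (the item/price values below are just placeholders, not the real partition I'm testing with):

    cqlsh> CONSISTENCY LOCAL_QUORUM;
    cqlsh> TRACING ON;
    cqlsh> SELECT * FROM keyspace.table1 WHERE item = 12345 AND price = 100;

Every time I run a trace like that it comes back with the DigestMismatchException, even though that partition hasn't been written to in months.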
On Thu, Oct 3, 2019 at 4:21 PM John Belliveau <belliveau.j...@gmail.com> wrote:

> Hi Patrick,
>
> Is this time series data? If so, I have run into issues with repair on
> time series data using the SizeTieredCompactionStrategy. I have had
> better luck using the TimeWindowCompactionStrategy.
>
> John
>
> From: Patrick Lee <patrickclee0...@gmail.com>
> Sent: Thursday, October 3, 2019 5:14 PM
> To: user@cassandra.apache.org
> Subject: Constant blocking read repair for such a tiny table
>
> I have a cluster that is running 3.11.4 (it was upgraded a while back from
> 2.1.16). What I see is a steady rate of read repair, constantly about 10%,
> on only this 1 table. Repairs have been run (actually several times). The
> table does not have a lot of writes to it, so after repair, or even after a
> read repair, I would expect it to be fine. The reason I'm having to dig into
> this so much is that under a much larger traffic load than their normal
> traffic, latencies are higher than the app team wants.
>
> I mean this thing is tiny; it's a 12x12 cluster, but this 1 table is only about
> 1GB per node on disk.
>
> The application team is doing reads at LOCAL_QUORUM, and I can simulate this
> on that cluster by running a query at QUORUM and/or LOCAL_QUORUM; in the
> trace I can see that the query comes back with a DigestMismatchException no
> matter how many times I run it. That record hasn't been updated by the
> application for several months.
>
> Repairs are scheduled and run every 7 days via Reaper, and in the past week
> this table has been repaired at least 3 times. Every time there are
> mismatches and data streams back and forth, yet there is still a constant
> rate of read repairs.
>
> Curious if anyone has any recommendations on what to look into further, or
> has experienced anything like this?
>
> This node has been up for 24 hours. This is the netstats output for read repairs:
>
> Mode: NORMAL
> Not sending any streams.
> Read Repair Statistics:
> Attempted: 7481
> Mismatch (Blocking): 11425375
> Mismatch (Background): 17
> Pool Name           Active   Pending   Completed   Dropped
> Large messages         n/a         0        1232         0
> Small messages         n/a         0   395903678         0
> Gossip messages        n/a         0      603746         0
>
> Example of the schema... some modifications have been made to reduce
> read_repair_chance and speculative_retry while troubleshooting:
> CREATE TABLE keyspace.table1 (
>     item bigint,
>     price int,
>     start_date timestamp,
>     end_date timestamp,
>     created_date timestamp,
>     cost decimal,
>     list decimal,
>     item_id int,
>     modified_date timestamp,
>     status int,
>     PRIMARY KEY ((item, price), start_date, end_date)
> ) WITH CLUSTERING ORDER BY (start_date ASC, end_date ASC)
>     AND read_repair_chance = 0.0
>     AND dclocal_read_repair_chance = 0.0
>     AND gc_grace_seconds = 864000
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
>     AND comment = ''
>     AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
>     AND compression = { 'chunk_length_in_kb' : 4, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
>     AND default_time_to_live = 0
>     AND speculative_retry = 'NONE'
>     AND min_index_interval = 128
>     AND max_index_interval = 2048
>     AND crc_check_chance = 1.0
>     AND cdc = false
>     AND memtable_flush_period_in_ms = 0;
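
One thing I'm planning to try next is dumping the same partition on each replica and diffing the output, to see which cells keep mismatching. Rough sketch only; the key value and paths below are just examples, and the exact syntax for passing the compound (item, price) partition key may need adjusting:

    # find which replicas own the example partition
    nodetool getendpoints keyspace table1 '12345:100'

    # on each of those replicas, locate the sstables holding it and dump it
    nodetool getsstables keyspace table1 '12345:100'
    sstabledump /var/lib/cassandra/data/keyspace/table1-*/mc-1234-big-Data.db -k '12345:100' > replica_a.json

    # diff the dumps from two replicas and compare cell timestamps / tombstones
    diff replica_a.json replica_b.json

If repair is really converging the replicas, I'd expect those dumps to match, so any persistent difference should point at what keeps triggering the digest mismatches.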