Carl, your speculation matches our observations, and we have a use case with exactly that unfortunate usage pattern. Write-then-immediately-read is not friendly to eventually consistent data stores: it makes reads pay a tax that is really a cost of the write activity.
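For concreteness, a minimal sketch of that pattern (hypothetical keyspace "demo" and table "events(id text PRIMARY KEY, payload text)", written against the DataStax Java driver 3.x):

    // Minimal sketch of the write-then-immediately-read antipattern.
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;

    public class WriteThenRead {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect("demo")) {
                // QUORUM write returns once 2 of 3 replicas ack; the third
                // replica may still be applying the mutation.
                session.execute(new SimpleStatement(
                        "UPDATE events SET payload = ? WHERE id = ?", "v1", "k1")
                        .setConsistencyLevel(ConsistencyLevel.QUORUM));
                // An immediate QUORUM read can pair the lagging replica with
                // a fresh one: the digests differ, a DigestMismatchException
                // is logged, and the read blocks on read repair.
                Row row = session.execute(new SimpleStatement(
                        "SELECT payload FROM events WHERE id = ?", "k1")
                        .setConsistencyLevel(ConsistencyLevel.QUORUM)).one();
                System.out.println(row.getString("payload"));
            }
        }
    }

The write returns as soon as two of the three replicas acknowledge, so the read on the very next line can easily land while the third replica is still behind.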
From: Carl Mueller <carl.muel...@smartthings.com.INVALID>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, December 9, 2019 at 3:18 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Seeing tons of DigestMismatchException exceptions after upgrading from 2.2.13 to 3.11.4

My speculation on rapidly churning/fast reads of recently written data:

- Data is written at QUORUM (for RF3): the write is confirmed once two nodes reply.
- The data is read very soon afterward (possibly a code antipattern), and suppose the third node's update hasn't completed yet (e.g. AWS network "variance"). The read picks one replica, and there is then a 50% chance that the second replica chosen for the quorum read is the stale node, which triggers a DigestMismatch read repair.

Is that plausible?

The code seems to log the exception on every read-repair instance, so it doesn't appear to be an ERROR with red blaring klaxons. Maybe it should be a WARN?
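A back-of-the-envelope check of those odds (a hypothetical simulation: RF=3, one stale replica, and the two replicas for a QUORUM read chosen uniformly at random, which a real coordinator's snitch doesn't quite do):

    // Throwaway simulation of the replica-selection math above.
    import java.util.Random;

    public class DigestMismatchOdds {
        public static void main(String[] args) {
            Random rnd = new Random();
            int trials = 1_000_000;
            int mismatches = 0;
            for (int i = 0; i < trials; i++) {
                int data = rnd.nextInt(3);      // replica serving full data
                int digest = rnd.nextInt(3);    // replica serving the digest
                while (digest == data) {
                    digest = rnd.nextInt(3);
                }
                // replica 0 stands in for the node that hasn't applied the write
                if (data == 0 || digest == 0) {
                    mismatches++;
                }
            }
            // prints roughly 0.67: any quorum read that touches the stale
            // replica produces a digest mismatch
            System.out.printf("mismatch fraction: %.2f%n",
                    (double) mismatches / trials);
        }
    }

Chosen that way, the stale node lands in the read pair about 2/3 of the time; the 50% figure is the conditional case where the first (data) replica happens to be fresh. Either way, every read that touches the stale replica logs a DigestMismatchException.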
On Mon, Nov 25, 2019 at 11:12 AM Colleen Velo <cmv...@gmail.com> wrote:

Hello,

As part of the final stages of our 2.2 --> 3.11 upgrades, one of our clusters (on AWS / 18 nodes / m4.2xlarge) produced some post-upgrade fits. We started getting spikes of Cassandra read and write timeouts even though the overall traffic volume was unchanged. As part of the upgrade there was one TWCS table for which we used a facade implementation to change the namespace of the compaction class, but that table has very low query volume.

The DigestMismatchException error messages (based on sampling the hash keys and finding which tables have partitions for those hash keys) seem to be occurring on the heaviest-volume table (approximately 4,000 reads and 1,600 writes per second per node), and that table has semi-medium row widths of about 10-40 column keys (or at least the digest-mismatch partitions have that kind of width). The keyspace is RF3 using NetworkTopology, and the CL is QUORUM for both reads and writes.

We have experienced the DigestMismatchException errors on all 3 of the production clusters we have upgraded (all of them single-DC, in the us-east-1 / eu-west-1 / ap-northeast-2 AWS regions), and in all three cases those errors were not there in either the 2.1.x or 2.2.x versions of Cassandra. Does anyone know of changes from 2.2 to 3.11 that would produce additional timeout problems, such as heavier blocking read-repair logic?

Also, we ran repairs (via Reaper v1.4.8, much nicer in 3.11 than 2.1) on all of the tables and across all of the nodes, and our timeouts seem to have disappeared, but we continue to see a rapid stream of digest-mismatch exceptions, so much so that our Cassandra debug logs are rolling over every 15 minutes. There is a mailing-list post from 2018 indicating that some DigestMismatchException messages are natural if you are reading while writing, but the sheer volume we are getting is very concerning:
- https://www.mail-archive.com/user@cassandra.apache.org/msg56078.html

Is that level of DigestMismatchException unusual? Or can that volume of mismatches appear simply because semi-wide rows require a lot of resolution, since flurries of quorum reads/writes (RF3) on recent partitions have a decent chance of not having fully synced data on the replicas read? Does the digest-mismatch error get debug-logged on every such read-repair chance?

Also, why are these DigestMismatchExceptions only occurring once the upgrade to 3.11 has happened?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Sample DigestMismatchException error message:

DEBUG [ReadRepairStage:13] 2019-11-22 01:38:14,448 ReadCallback.java:242 - Digest mismatch:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-6492169518344121155, 66306139353831322d323064382d313037322d663965632d636565663165326563303965) (be2c0feaa60d99c388f9d273fdc360f7 vs 09eaded2d69cf2dd49718076edf56b36)
    at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:233) ~[apache-cassandra-3.11.4.jar:3.11.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.4.jar:3.11.4]
    at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]

Cluster(s) setup:

* AWS region: eu-west-1
  - Nodes: 18
  - single DC
  - keyspace: RF3 using NetworkTopology
* AWS region: us-east-1
  - Nodes: 20
  - single DC
  - keyspace: RF3 using NetworkTopology
* AWS region: ap-northeast-2
  - Nodes: 30
  - single DC
  - keyspace: RF3 using NetworkTopology

Thanks for any insight into this issue.

--
Colleen Velo
email: cmv...@gmail.com
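Regarding the debug logs rolling over every 15 minutes: the "Digest mismatch" lines in the sample above are emitted at DEBUG by org.apache.cassandra.service.ReadCallback, so one stopgap (assuming you can live without that class's other DEBUG output) is to raise its log level at runtime, no restart needed:

    nodetool setlogginglevel org.apache.cassandra.service.ReadCallback WARN

That only quiets the logging, of course; the mismatches and the blocking read repairs behind them still happen.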