Some more info.

From Java, using the DataStax 4.9.0 driver, I'm selecting an entire table. After about 17 million rows (the table is probably around 150 million rows), I get:

com.datastax.oss.driver.api.core.servererrors.ReadFailureException: Cassandra failure during read query at consistency ONE (1 responses were required but only 0 replica responded, 1 failed)

It's almost as if the data was not written with LOCAL_QUORUM, but I've triple checked.

If I stop writes to the table and reduce the load on Cassandra, then the Java program works OK.  Presto queries still fail, but that might be a Presto issue.  Interestingly, they sometimes fail almost immediately with the 'Cassandra failure during read query' error, but sometimes go through 140 million rows and then die.

Are regular table repairs required to be run when using LOCAL_QUORUM?  I see no nodes down, or disk failures.

-Joe

On 12/14/2020 9:41 AM, Joe Obernberger wrote:

Thanks all for the help on this.  I've changed all my writes to LOCAL_QUORUM, and my reads as well.  Under a constant load of writes to a table and reads from the same table, I'm still getting:

DEBUG [ReadRepairStage:372] 2020-12-14 09:36:09,002 ReadCallback.java:244 - Digest mismatch: org.apache.cassandra.service.DigestMismatchException: Mismatch for key DecoratedKey(-7287062361589376757, 44535f313034335f333332353839305f323032302d31322d31325430302d31392d33312e3330335a) (054250ecd7170b1707ec36c6f1798ed0 vs 5752eec36bff050dd363b7803c500a95)
        at org.apache.cassandra.service.DigestResolver.compareResponses(DigestResolver.java:92) ~[apache-cassandra-3.11.9.jar:3.11.9]
        at org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:235) ~[apache-cassandra-3.11.9.jar:3.11.9]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_272]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_272]
        at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:84) [apache-cassandra-3.11.9.jar:3.11.9]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_272]
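The second value inside DecoratedKey(...) in the log line above is the partition key as hex-encoded bytes. A minimal sketch for turning it back into something readable, assuming the partition key is a single text column stored as UTF-8 (a composite key would carry extra framing bytes and would not decode cleanly this way). The class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

class KeyHexDecoder {
    // Decode a hex string (two hex digits per byte) into a UTF-8 string.
    // Assumption: the key is a single text column, not a composite key.
    static String decodeHexKey(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Parse each pair of hex digits as one byte.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hex key copied from the Digest mismatch log line.
        String hex = "44535f313034335f333332353839305f323032302d31322d31325430302d31392d33312e3330335a";
        System.out.println(decodeHexKey(hex));
    }
}
```

The decoded string is the kind of value that can be passed as the partition key to nodetool getendpoints to see which replicas own that row.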

Under load this happens a lot; several times a second on each of the server nodes.  I started with a new table and under light load, it worked wonderfully - no issues.  But under heavy load, it still occurs.  Is there a different setting? Also, when this happens, I cannot query the table from Presto, as I then get the familiar:

"Query 20201214_143949_00000_b3fnt failed: Cassandra timeout during read query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded)"

Changing Presto to use ONE results in an error saying 1 response was required, but only 1 responded.

Any ideas?  Things to try?  Thanks!

-Joe

On 12/3/2020 12:49 AM, Erick Ramirez wrote:

    Thank you Steve - once I have the key, how do I get to a node?

Run this command to determine which replicas own the partition:

$ nodetool getendpoints <keyspace> <table> <partition_key>

    So if the propagation has not taken place and a node doesn't have
    the data and is the first to 'be asked' the client will get no data?

That's correct. It will not return data it doesn't have when querying with a consistency of ONE. There are limited cases where ONE is applicable. In most cases, a strong consistency of LOCAL_QUORUM is recommended to avoid the scenario you described. Cheers!
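To make the ONE versus LOCAL_QUORUM point concrete, here is a small sketch of the usual overlap rule: a read is guaranteed to hit at least one replica that acknowledged the write when read replicas + write replicas > replication factor. The replication factor of 3 is an assumption for illustration (the thread does not state it), and the class name is made up:

```java
class QuorumMath {
    // Quorum size for a given replication factor: floor(rf / 2) + 1.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // True when every read set must share at least one replica with every write set.
    static boolean overlaps(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3; // assumed replication factor in the local datacenter
        int q = quorum(rf);
        System.out.println("quorum(" + rf + ") = " + q);
        System.out.println("QUORUM write + QUORUM read overlap: " + overlaps(q, q, rf));
        System.out.println("ONE write + ONE read overlap: " + overlaps(1, 1, rf));
    }
}
```

With these assumed numbers, quorum writes and quorum reads always overlap, while ONE/ONE does not, which is the case where a replica that has not yet received the data can answer the read alone.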
