Hello Chahat,
First, can you please make sure the Cassandra user used by the
application is not "cassandra"? The "cassandra" user reads the auth
tables at QUORUM consistency level, which requires a majority of all
replicas to respond.
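To illustrate why this matters, here is a rough Python sketch of the replica arithmetic. It is not Cassandra code: the function names and DC names are made up, and it assumes (as far as I know from the 3.0 source) that roles other than "cassandra" are read at LOCAL_ONE.

```python
def quorum(total_replicas: int) -> int:
    """Replicas that must respond for QUORUM: a majority of all replicas."""
    return total_replicas // 2 + 1


def replicas_required(role: str, rf_per_dc: dict) -> int:
    """Replicas an auth-table read waits for, under the assumptions above."""
    if role == "cassandra":
        # QUORUM counts the replicas of every DC listed in the keyspace.
        return quorum(sum(rf_per_dc.values()))
    # Assumed LOCAL_ONE for all other roles: one replica in the local DC.
    return 1


# Hypothetical system_auth replication that still lists a removed DC.
stale = {"old_dc": 3, "aws_dc": 3}
print(replicas_required("cassandra", stale))  # 4, more than one DC of 3 can supply
print(replicas_required("app_user", stale))   # 1
```

With a stale entry like this, a QUORUM read can never be satisfied by the 3 replicas that remain, while a LOCAL_ONE read still can.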
Then, can you please make sure the replication strategy is set correctly
for the system_auth keyspace? That is, ensure the old DC is no longer
listed and the new DC has a sufficient number of replicas for fault
tolerance.
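As a sketch of that check: the replication settings can be read from the system_schema.keyspaces table (or with DESCRIBE KEYSPACE system_auth in cqlsh) and compared against the DCs that actually exist. The helper below is only an illustration; its name, the DC names, and the minimum RF of 3 are my assumptions, not something taken from your cluster.

```python
def check_auth_replication(replication: dict, live_dcs: set, min_rf: int = 3) -> list:
    """Return a list of problems found; an empty list means it looks fine.

    `replication` is the map stored for the keyspace, e.g.
    {"class": "...NetworkTopologyStrategy", "<dc name>": "<rf>"}.
    """
    problems = []
    strategy = replication.get("class", "").rsplit(".", 1)[-1]
    if strategy != "NetworkTopologyStrategy":
        problems.append(f"strategy is {strategy}, expected NetworkTopologyStrategy")
        return problems
    # Everything except "class" is a per-DC replication factor (stored as text).
    rf_per_dc = {dc: int(rf) for dc, rf in replication.items() if dc != "class"}
    for dc in sorted(rf_per_dc.keys() - live_dcs):
        problems.append(f"{dc} is still listed but has no live nodes")
    for dc in sorted(live_dcs):
        rf = rf_per_dc.get(dc, 0)
        if rf < min_rf:
            problems.append(f"{dc} has RF={rf}, below {min_rf}")
    return problems


# Hypothetical values, for illustration only.
replication = {
    "class": "org.apache.cassandra.locator.NetworkTopologyStrategy",
    "old_dc": "3",
    "aws_dc": "1",
}
for problem in check_auth_replication(replication, live_dcs={"aws_dc"}):
    print(problem)
```

If something like this reports a problem, the usual remedy is an ALTER KEYSPACE system_auth with the corrected replication map, followed by a repair of that keyspace on the nodes.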
Finally, can you please check the GC logs and make sure there are no JVM
GC issues, especially long stop-the-world (STW) pauses?
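One way to scan for long pauses, assuming the Java 8 GC log was written with -XX:+PrintGCApplicationStoppedTime so that it contains "Total time for which application threads were stopped" lines. This is a minimal Python sketch; the 1-second threshold and the sample lines are made up for illustration.

```python
import re

# Matches the Java 8 safepoint line enabled by -XX:+PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_s=1.0):
    """Yield (pause_seconds, line) for every pause at or above the threshold."""
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pause = float(match.group(1))
            if pause >= threshold_s:
                yield pause, line.rstrip("\n")


# Made-up sample lines in the Java 8 format; in practice, pass an open
# GC log file object instead of this list.
sample = [
    "2021-01-01T00:00:01.000+0000: 10.100: Total time for which application "
    "threads were stopped: 0.0123456 seconds, Stopping threads took: 0.0000456 seconds",
    "2021-01-01T00:00:09.000+0000: 18.100: Total time for which application "
    "threads were stopped: 6.2345678 seconds, Stopping threads took: 0.0001234 seconds",
]
for pause, line in long_pauses(sample):
    print(f"{pause:.2f}s pause: {line}")
```

Pauses that approach or exceed the read timeout around the time of the errors would be a strong hint that GC is involved.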
Regards,
Bowen
On 27/07/2021 08:34, Chahat Bhatia wrote:
Hi Community,
Context: We are running a 6-node cluster in production on AWS with
RF=3.
We recently moved from physical servers to cloud by adding a new DC
and then removing the old one. Everything is working fine in all the
other applications except this one.
We recently started experiencing read timeouts in one of our
production applications, where the client threw:
Error An unexpected error occurred server side on ip-IP.ec2.internal: com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
com.datastax.driver.core.exceptions.ServerError: An unexpected error occurred server side: com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:63) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
    at com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:25) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
    ............ cntd
com.datastax.driver.core.exceptions.ReadTimeoutException:
Cassandra timeout during read query at consistency
LOCAL_QUORUM (2 responses were required but only 1 replica
responded)
And around the same time, these were the errors on the server side
(from the Cassandra logs):
ERROR [RolesCacheRefresh:1] 2021-07-26 06:32:43,094 CassandraDaemon.java:207 - Exception in thread Thread[RolesCacheRefresh:1,5,main]
java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.CassandraRoleManager.getRoles(CassandraRoleManager.java:280) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:135) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:130) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.0.13.jar:3.0.13]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502) ~[apache-cassandra-3.0.13.jar:3.0.13]
ERROR [PermissionsCacheRefresh:1] 2021-07-26 07:11:25,804 CassandraDaemon.java:207 - Exception in thread Thread[PermissionsCacheRefresh:1,5,main]
java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.CassandraRoleManager.isSuper(CassandraRoleManager.java:304) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:52) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:136) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:131) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.0.13.jar:3.0.13]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
    at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520) ~[apache-cassandra-3.0.13.jar:3.0.13]
    at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502) ~[apache-cassandra-3.0.13.jar:3.0.13]
These are the values of these parameters in my configuration file:
permissions_validity_in_ms: 300000
permissions_update_interval_in_ms: 20000
roles_validity_in_ms: 300000
roles_update_interval_in_ms: 20000
This was not happening earlier, and since the errors come from a single
application, we are not sure whether these settings are actually the
cause. Can anyone please point out whether these values are
misconfigured and causing the issue, or whether we should be looking
somewhere else?
Any help would be appreciated.
Thanks & Regards,
Chahat.