Hello Chahat,

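First, can you please make sure the Cassandra user used by the application is not "cassandra"? The default "cassandra" user is read from the auth tables at QUORUM consistency level, whereas other users are read at LOCAL_ONE, so it is much more exposed to slow or unavailable replicas.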
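To illustrate why this matters, here is a rough sketch of how many replicas have to answer an auth lookup in each case. It assumes QUORUM is computed over the total replication factor of the system_auth keyspace across all DCs; the DC name "new_dc" and the user name "app_user" are made up for the example.

```python
def quorum(replication_factors):
    """Replicas that must respond for QUORUM, given the RF of each DC."""
    total = sum(replication_factors.values())
    return total // 2 + 1


def required_auth_responses(user, replication_factors):
    """Replicas that must respond when the auth tables are read for `user`.

    Assumption: the default "cassandra" user is looked up at QUORUM,
    every other user at LOCAL_ONE (a single local replica).
    """
    if user == "cassandra":
        return quorum(replication_factors)
    return 1


# Hypothetical layout: one DC named "new_dc" with RF=3 for system_auth.
rf = {"new_dc": 3}
print(required_auth_responses("cassandra", rf))  # 2
print(required_auth_responses("app_user", rf))   # 1
```

If the old DC were still listed in the replication settings, its replicas would count towards the total as well, which makes QUORUM harder to reach once those nodes are gone.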

Then, can you please make sure the replication strategy is set correctly for the system_auth keyspace? That is, ensure the old DC is not present and the new DC has a sufficient number of replicas for fault tolerance.
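As a sketch only, the small helper below builds the two CQL statements you would typically run in cqlsh for this check: one to inspect the current replication settings and one to change them. The DC name "new_dc" and the replica count 3 are placeholders; the DC name has to match what "nodetool status" reports for your cluster.

```python
def check_replication_cql(keyspace="system_auth"):
    """CQL that shows the current replication settings (Cassandra 3.0+)."""
    return (
        "SELECT keyspace_name, replication FROM system_schema.keyspaces "
        f"WHERE keyspace_name = '{keyspace}';"
    )


def alter_replication_cql(dc_replicas, keyspace="system_auth"):
    """CQL that sets NetworkTopologyStrategy with the given replicas per DC.

    Any DC that is not listed in `dc_replicas` (for example the removed
    old DC) is dropped from the replication settings.
    """
    if not dc_replicas:
        raise ValueError("at least one datacenter is required")
    options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(dc_replicas.items())
    )
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


print(check_replication_cql())
# "new_dc" is a placeholder DC name, 3 a placeholder replica count.
print(alter_replication_cql({"new_dc": 3}))
```

If the replication settings do need changing, a repair of system_auth afterwards is usually needed so that the new replicas actually hold the auth data.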

Finally, can you please check the GC logs and make sure there aren't any JVM GC issues, especially long STW (stop-the-world) pauses?
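One possible way to scan for long pauses is sketched below. It assumes a Java 8 GC log written with -XX:+PrintGCApplicationStoppedTime, which produces "Total time for which application threads were stopped" lines. The 1 second threshold is an arbitrary starting point, and the sample lines are invented for illustration, not taken from your cluster.

```python
import re

# Line format written by Java 8 with -XX:+PrintGCApplicationStoppedTime.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9.]+) seconds"
)


def long_pauses(lines, threshold_seconds=1.0):
    """Return (pause_seconds, line) for every pause at or above the threshold."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match and float(match.group(1)) >= threshold_seconds:
            pauses.append((float(match.group(1)), line.strip()))
    return pauses


# Invented sample lines; in practice, pass an open GC log file instead.
sample = [
    "2021-07-26T06:32:40.100+0000: 100.100: Total time for which "
    "application threads were stopped: 0.0123456 seconds, "
    "Stopping threads took: 0.0001234 seconds",
    "2021-07-26T06:32:43.000+0000: 103.000: Total time for which "
    "application threads were stopped: 2.5000000 seconds, "
    "Stopping threads took: 0.0002000 seconds",
]

for seconds, line in long_pauses(sample):
    print(f"{seconds:.3f}s pause: {line}")
```

Pauses that line up with the timestamps of the ReadTimeoutException entries would be the ones worth looking at first.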


Regards,

Bowen


On 27/07/2021 08:34, Chahat Bhatia wrote:
Hi Community,

Context: We are running a 6-node cluster in production in AWS with RF=3. We recently moved from physical servers to the cloud by adding a new DC and then removing the old one. Everything is working fine in all the other applications except this one.

We recently started experiencing read timeouts in one of our production applications, where the client threw:

        Error An unexpected error occurred server side on ip-IP.ec2.internal: com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
        com.datastax.driver.core.exceptions.ServerError: An unexpected error occurred server side: com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
                at com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:63) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
                at com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:25) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
                at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
                at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
                at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:68) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
         ............ cntd

        com.datastax.driver.core.exceptions.ReadTimeoutException:
        Cassandra timeout during read query at consistency
        LOCAL_QUORUM (2 responses were required but only 1 replica
        responded)



Around the same time, these errors appeared on the server side (from the Cassandra logs):

        ERROR [RolesCacheRefresh:1] 2021-07-26 06:32:43,094 CassandraDaemon.java:207 - Exception in thread Thread[RolesCacheRefresh:1,5,main]
        java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
                at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.CassandraRoleManager.getRoles(CassandraRoleManager.java:280) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:135) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:130) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
                at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.0.13.jar:3.0.13]
                at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
        Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
                at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502) ~[apache-cassandra-3.0.13.jar:3.0.13]

        ERROR [PermissionsCacheRefresh:1] 2021-07-26 07:11:25,804 CassandraDaemon.java:207 - Exception in thread Thread[PermissionsCacheRefresh:1,5,main]
        java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
                at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.CassandraRoleManager.isSuper(CassandraRoleManager.java:304) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:52) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:136) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:131) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
                at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) [apache-cassandra-3.0.13.jar:3.0.13]
                at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
        Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
                at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520) ~[apache-cassandra-3.0.13.jar:3.0.13]
                at org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502) ~[apache-cassandra-3.0.13.jar:3.0.13]



These are the values of the relevant parameters in my configuration file:

        permissions_validity_in_ms: 300000
        permissions_update_interval_in_ms: 20000
        roles_validity_in_ms: 300000
        roles_update_interval_in_ms: 20000


This was not the case earlier, and since the errors come from a single app alone, we are not sure whether this is actually the issue. Can anyone please point out whether these values are misconfigured and causing the issue, or whether we should be looking somewhere else?

Any help would be appreciated.

Thanks & Regards,
Chahat.
