>
> Also, it's interesting that you've set validity to over 3 days but you
> update them every 6 hours. Is that intentional?

We set that earlier, when we were in the process of adding new roles
(creating roles for the new apps we set up), but we never changed it after
that, so it has stayed the same since.
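
For reference, a minimal Python sketch of how a validity period and an update interval relate. The numbers are the ones from the configuration quoted in the original post further down, which may not match what is deployed now, and the helper names `format_ms` and `refreshes_before_expiry` are made up for illustration. The assumption is that a cached entry is served until its validity period expires and becomes eligible for a background refresh once the update interval has elapsed.

```python
def format_ms(ms: int) -> str:
    """Render a millisecond duration in the largest unit that fits."""
    seconds = ms / 1000
    for unit, size in (("d", 86400), ("h", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:g} {unit}"
    return f"{seconds:g} s"


def refreshes_before_expiry(validity_ms: int, update_interval_ms: int) -> bool:
    """True if a background refresh can happen before the entry expires."""
    return 0 < update_interval_ms < validity_ms


# Values copied from the configuration quoted in the original post.
settings = {
    "permissions": {"validity_ms": 300000, "update_interval_ms": 20000},
    "roles": {"validity_ms": 300000, "update_interval_ms": 20000},
}

for name, s in settings.items():
    print(
        f"{name}: valid for {format_ms(s['validity_ms'])}, "
        f"refreshed every {format_ms(s['update_interval_ms'])}, "
        f"refresh before expiry: "
        f"{refreshes_before_expiry(s['validity_ms'], s['update_interval_ms'])}"
    )
```

The same helpers work for larger values, for example `format_ms(259200000)` for 3 days and `format_ms(21600000)` for 6 hours.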

On Tue, 27 Jul 2021 at 13:53, Chahat Bhatia <chahat.bha...@goevive.com>
wrote:

> Thanks for the prompt response.
>
> *Here is the system_schema.keyspaces entry:*
>
>> system_auth | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '6', 'us-east-backup': '1'}
>> census      | True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3', 'us-east-backup': '1'}
>
>
>
> So, system_auth spans 2 DCs: *us-east with 6 nodes (and RF=3) and
> us-east-backup with a single node and 100% of the data.*
> *Keyspace census* shows RF=3 for the main DC and RF=1 for the backup
> site.
>
>
> To run the application, we have a user created specifically for
>> that application, and that user has the appropriate permissions
>> in Cassandra to select, modify and delete from the relevant tables. It
>> uses LOCAL_QUORUM for querying the data, and the local DC is set to
>> 'us-east'.
>
>
>
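
As a worked check of the replica counts involved, here is a minimal Python sketch of the quorum arithmetic, assuming the usual rule of floor(RF / 2) + 1 replicas. The replication maps are copied from the `system_schema.keyspaces` output quoted above; the function names are illustrative only.

```python
def quorum(replicas: int) -> int:
    """Number of replicas that must respond for a quorum."""
    return replicas // 2 + 1


def local_quorum(replication: dict, dc: str) -> int:
    """Quorum counted over the replicas of a single datacenter."""
    return quorum(replication[dc])


def global_quorum(replication: dict) -> int:
    """Quorum counted over the replicas of all datacenters."""
    return quorum(sum(replication.values()))


# Replication factors per datacenter, as shown in the keyspace entries.
system_auth = {"us-east": 6, "us-east-backup": 1}
census = {"us-east": 3, "us-east-backup": 1}

print("census LOCAL_QUORUM in us-east:", local_quorum(census, "us-east"))
print("system_auth LOCAL_QUORUM in us-east:", local_quorum(system_auth, "us-east"))
print("system_auth QUORUM:", global_quorum(system_auth))
```

For census this gives 2 in us-east, which agrees with the "2 responses were required" part of the client error. For system_auth, a QUORUM read (the level mentioned in this thread for the "cassandra" user) would need 4 of the 7 replicas.
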
> Also, there is no excessive GC on any of the nodes. We run a custom
> script to track the GC stats (from the Cassandra log itself) and output them.
> Below is the output from the currently running script; it's similar for all
> the servers:
>
> 2021-07-27 03:04:45,072 INFO gcstats:58 - Application Thread stop time
>> 0.001656 seconds.
>> 2021-07-27 03:04:45,080 INFO gcstats:58 - Application Thread stop time
>> 0.001669 seconds.
>> 2021-07-27 03:04:45,087 INFO gcstats:58 - Application Thread stop time
>> 0.001601 seconds.
>> 2021-07-27 03:04:45,095 INFO gcstats:58 - Application Thread stop time
>> 0.001713 seconds.
>> 2021-07-27 03:04:45,103 INFO gcstats:58 - Application Thread stop time
>> 0.001586 seconds.
>> 2021-07-27 03:04:45,110 INFO gcstats:58 - Application Thread stop time
>> 0.001671 seconds.
>> 2021-07-27 03:04:45,118 INFO gcstats:58 - Application Thread stop time
>> 0.001691 seconds.
>> 2021-07-27 03:04:45,127 INFO gcstats:58 - Application Thread stop time
>> 0.001860 seconds.
>> 2021-07-27 03:04:45,134 INFO gcstats:58 - Application Thread stop time
>> 0.001630 seconds.
>> 2021-07-27 03:04:45,141 INFO gcstats:58 - Application Thread stop time
>> 0.001515 seconds.
>> 2021-07-27 03:04:45,148 INFO gcstats:58 - Application Thread stop time
>> 0.001533 seconds.
>> 2021-07-27 03:04:45,156 INFO gcstats:58 - Application Thread stop time
>> 0.001630 seconds.
>> 2021-07-27 03:04:45,163 INFO gcstats:58 - Application Thread stop time
>> 0.001577 seconds.
>> 2021-07-27 03:04:45,170 INFO gcstats:58 - Application Thread stop time
>> 0.001538 seconds.
>> 2021-07-27 03:04:45,177 INFO gcstats:58 - Application Thread stop time
>> 0.001615 seconds.
>> 2021-07-27 03:04:45,186 INFO gcstats:58 - Application Thread stop time
>> 0.001584 seconds.
>> 2021-07-27 03:04:45,193 INFO gcstats:58 - Application Thread stop time
>> 0.001558 seconds.
>> 2021-07-27 03:04:45,200 INFO gcstats:58 - Application Thread stop time
>> 0.001696 seconds.
>> 2021-07-27 03:04:45,208 INFO gcstats:58 - Application Thread stop time
>> 0.001658 seconds.
>> 2021-07-27 03:04:45,215 INFO gcstats:58 - Application Thread stop time
>> 0.001592 seconds.
>> 2021-07-27 03:04:45,222 INFO gcstats:58 - Application Thread stop time
>> 0.001618 seconds.
>> 2021-07-27 03:05:08,907 INFO gcstats:58 - Application Thread stop time
>> 0.001624 seconds.
>> 2021-07-27 03:06:34,436 INFO gcstats:58 - Application Thread stop time
>> 0.297773 seconds.
>>
>
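
Output in this format can be scanned for long pauses with a few lines of Python. This is only a sketch: the 0.2 second threshold and the function name `long_pauses` are assumptions, and the sample text is two entries from the output above with the email line wrapping left in.

```python
import re

# Matches the pause duration in lines such as
# "... Application Thread stop time 0.001656 seconds."
PAUSE_RE = re.compile(r"Application Thread stop time\s+(\d+\.\d+) seconds")


def long_pauses(log_text: str, threshold_s: float = 0.2) -> list:
    """Return all pause durations (in seconds) at or above the threshold."""
    # Collapse line breaks so entries wrapped across two lines still match.
    normalised = " ".join(log_text.split())
    pauses = [float(value) for value in PAUSE_RE.findall(normalised)]
    return [p for p in pauses if p >= threshold_s]


sample = """
2021-07-27 03:05:08,907 INFO gcstats:58 - Application Thread stop time
0.001624 seconds.
2021-07-27 03:06:34,436 INFO gcstats:58 - Application Thread stop time
0.297773 seconds.
"""

print(long_pauses(sample))
```

With the threshold assumed here, only the last entry in the output above (0.297773 seconds) is reported.
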
>
>
>
>
> On Tue, 27 Jul 2021 at 13:23, 'Bowen Song' via Infra Updates <
> infra-upda...@goevive.com> wrote:
>
>> Hello Chahat,
>>
>>
>> First, can you please make sure the Cassandra user used by the
>> application is not "cassandra"? Because the "cassandra" user uses QUORUM
>> consistency level to read the auth tables.
>>
>> Then, can you please make sure the replication strategy is set correctly
>> for the system_auth keyspace? I.e.: ensure the old DC is not present, and
>> the new DC has a sufficient number of replicas for fault tolerance.
>>
>> Finally, can you please check the GC logs, and make sure there aren't any JVM
>> GC issues, especially long STW pauses?
>>
>>
>> Regards,
>>
>> Bowen
>>
>>
>> On 27/07/2021 08:34, Chahat Bhatia wrote:
>>
>> Hi Community,
>>
>> Context: We are running a cluster of 6 nodes in production with RF=3 in
>> AWS.
>> We recently moved from physical servers to cloud by adding a new DC and
>> then removing the old one. Everything is working fine in all the other
>> applications except this one.
>>
>>
>> *We recently started experiencing read timeouts in one of our
>> production applications, where the client threw:*
>>
>> Error An unexpected error occurred server side on ip-IP.ec2.internal:
>>> com.google.common.util.concurrent.*UncheckedExecutionException*:
>>> *com.google.common.util.concurrent.UncheckedExecutionException:**
>>> java.lang.RuntimeException:
>>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
>>> received only 0 responses.*
>>> com.datastax.driver.core.exceptions.ServerError: An unexpected error
>>> occurred server side  :
>>> com.google.common.util.concurrent.UncheckedExecutionException:
>>> com.google.common.util.concurrent.UncheckedExecutionException:
>>> java.lang.RuntimeException:
>>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed
>>> out - received only 0 responses.
>>
>> at com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:
>>> 63) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?] at
>>> com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:25
>>> ) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?] at
>>> com.datastax.driver.core.DriverThrowables.propagateCause(
>>> DriverThrowables.java:37) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?]
>>> at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(
>>> DefaultResultSetFuture.java:245) ~[cassandra-driver-core-3
>>> .3.0-shaded.jar!/:?] at com.datastax.driver.core.AbstractSession.execute
>>> (AbstractSession.java:68) ~[cassandra-driver-core-3.3.0-shaded.jar!/:? ]
>>>
>>
>>  ............ cntd
>>
>>
>>
>> com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra
>>> timeout during read query at consistency LOCAL_QUORUM (2 responses were
>>> required but only 1 replica responded)
>>
>>
>>
>>
>>
>> *And around the same time these were the errors on the server side (from
>> cassandra logs): *
>>
>>
>>>
>>> *ERROR [RolesCacheRefresh:1] 2021-07-26 06:32:43,094
>>> CassandraDaemon.java:207 - Exception in thread
>>> Thread[RolesCacheRefresh:1,5,main] java.lang.RuntimeException:
>>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
>>> received only 0 responses. *        at
>>> org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.CassandraRoleManager.getRoles(CassandraRoleManager.java:280)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:135)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:130)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> ~[na:1.8.0_131]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> ~[na:1.8.0_131]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> [na:1.8.0_131]
>>>         at
>>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>>> [apache-cassandra-3.0.13.jar:3.0.13]
>>>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
>>> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException:
>>> Operation timed out - received only 0 responses.
>>>         at
>>> org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>
>>>
>>>
>>> *ERROR [PermissionsCacheRefresh:1] 2021-07-26 07:11:25,804
>>> CassandraDaemon.java:207 - Exception in thread
>>> Thread[PermissionsCacheRefresh:1,5,main] java.lang.RuntimeException:
>>> org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
>>> received only 0 responses. *        at
>>> org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.CassandraRoleManager.isSuper(CassandraRoleManager.java:304)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:52)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:136)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:131)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> ~[na:1.8.0_131]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> ~[na:1.8.0_131]
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> [na:1.8.0_131]
>>>         at
>>> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>>> [apache-cassandra-3.0.13.jar:3.0.13]
>>>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
>>> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException:
>>> Operation timed out - received only 0 responses.
>>>         at
>>> org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>         at
>>> org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502)
>>> ~[apache-cassandra-3.0.13.jar:3.0.13]
>>>
>>
>>
>>
>> *These are the values of these params in my configuration file  *
>>
>> permissions_validity_in_ms: 300000
>>> permissions_update_interval_in_ms: 20000
>>> roles_validity_in_ms: 300000
>>> roles_update_interval_in_ms: 20000
>>>
>>
>> This was not the case earlier, and since this comes from a single app
>> alone, we are not sure if this is actually the issue. Can anyone please
>> point out whether these values are misconfigured and hence causing the issue,
>> or whether we should be looking somewhere else?
>>
>> Any help would be appreciated.
>>
>> Thanks & Regards,
>> Chahat.
>>
>>
>
> --
> Thanks & Regards,
> Chahat Bhatia
> Systems Engineer
> *Evive*
> +91 7087629779
>


-- 
Thanks & Regards,
Chahat Bhatia
Systems Engineer
*Evive*
+91 7087629779
