Look for errors on your network interface. I think you have periodic errors in your network connectivity.
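One way to check, as a minimal sketch that assumes the nodes run Linux: the kernel exposes per-interface counters in /proc/net/dev (the same numbers `ip -s link` prints). The helper names below (`parse_proc_net_dev`, `interfaces_with_errors`) are made up for illustration. Run it on each node a few times a day and compare; counters that keep growing on the node that goes down would support the network theory.

```python
import os
from typing import Dict

# Column order used by /proc/net/dev on Linux: 8 receive fields, then 8 transmit fields.
RX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast"]
TX_FIELDS = ["bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed"]


def parse_proc_net_dev(text: str) -> Dict[str, Dict[str, int]]:
    """Parse the contents of /proc/net/dev into {interface: {counter: value}}."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        values = [int(v) for v in data.split()]
        if len(values) < 16:
            continue  # unexpected layout, skip rather than guess
        counters = {f"rx_{k}": v for k, v in zip(RX_FIELDS, values[:8])}
        counters.update({f"tx_{k}": v for k, v in zip(TX_FIELDS, values[8:16])})
        stats[name.strip()] = counters
    return stats


def interfaces_with_errors(stats: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Return only the interfaces whose error or drop counters are non-zero."""
    watched = ("rx_errs", "rx_drop", "tx_errs", "tx_drop")
    flagged = {}
    for iface, counters in stats.items():
        nonzero = {k: counters[k] for k in watched if counters[k] > 0}
        if nonzero:
            flagged[iface] = nonzero
    return flagged


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    with open("/proc/net/dev") as f:
        for iface, counters in interfaces_with_errors(parse_proc_net_dev(f.read())).items():
            print(iface, counters)
```

The counters are cumulative since boot, so a single non-zero value is not conclusive; what matters is whether they increase around the time the node drops out.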
<======>
"Who do you think made the first stone spear? The Asperger guy. If you get rid of the autism genetics, there would be no Silicon Valley" Temple Grandin

Daemeon C.M. Reiydelle
San Francisco 1.415.501.0198
London 44 020 8144 9872

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

> Hi Jeff,
>
> I need to restart the node manually every time; only one node has this problem.
>
> I have attached the nodetool output, thanks.
>
> Best Regards,
>
> 倪项菲 / David Ni
> 中移德电网络科技有限公司
> Virtue Intelligent Network Ltd, co.
> Add: 2003, 20F No.35 Luojia creative city, Luoyu Road, Wuhan, HuBei
> Mob: +86 13797007811 | Tel: +86 27 5024 2516
>
> From: Jeff Jirsa <jji...@gmail.com>
> Sent: March 27, 2018 11:03
> To: user@cassandra.apache.org
> Subject: Re: A node down every day in a 6 nodes cluster
>
> That warning isn’t sufficient to understand why the node is going down
>
> Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is likely a good idea
>
> Are the nodes coming up on their own? Or are you restarting them?
>
> Paste the output of nodetool tpstats and nodetool cfstats
>
> --
> Jeff Jirsa
>
> On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:
>
> Hi Cassandra experts,
>
> I am facing an issue: a node goes down every day in a 6-node cluster. The cluster is in a single DC.
>
> Every node has 4 cores and 16 GB of RAM, and the heap configuration is MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200 GB of data, and the RF for the business CF is 3. A node goes down once a day, and system.log shows the following:
>
> WARN [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.latest_rt_alarm>
> ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 QueryMessage.java:128 - Unexpected error during query
> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3937) ~[guava-18.0.jar:na]
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) ~[guava-18.0.jar:na]
>     at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) ~[guava-18.0.jar:na]
>     at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.ClientState.authorize(ClientState.java:419) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513) [apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407) [apache-cassandra-3.9.jar:3.9]
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.39.Final.jar:4.0.39.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:366) [netty-all-4.0.39.Final.jar:4.0.39.Final]
>     at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) [netty-all-4.0.39.Final.jar:4.0.39.Final]
>     at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:357) [netty-all-4.0.39.Final.jar:4.0.39.Final]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.9.jar:3.9]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:102) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.auth.PermissionsCache.lambda$new$0(PermissionsCache.java:37) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.auth.AuthCache$1.load(AuthCache.java:183) ~[apache-cassandra-3.9.jar:3.9]
>     at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) ~[guava-18.0.jar:na]
>     at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) ~[guava-18.0.jar:na]
>     at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) ~[guava-18.0.jar:na]
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2197) ~[guava-18.0.jar:na]
>     ... 26 common frames omitted
> Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1718) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1667) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1608) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1527) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:975) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:271) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:232) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.auth.CassandraAuthorizer.addPermissionsForRole(CassandraAuthorizer.java:227) ~[apache-cassandra-3.9.jar:3.9]
>     at org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:93) ~[apache-cassandra-3.9.jar:3.9]
>     ... 32 common frames omitted
> WARN [Native-Transport-Requests-23] 2018-03-26 18:53:17,131 CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize #<User nev_tsp_sa> for <table nev_prod_tsp.rt_alarm_unite>
> ERROR [Native-Transport-Requests-64] 2018-03-26 18:53:17,135 QueryMessage.java:128 - Unexpected error during query
> com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received only 0 responses.
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[guava-18.0.jar:na]
>
> I have confirmed that nev_tsp_sa has all rights on the nev_prod_tsp keyspace:
>
> cassandra@cqlsh:system_auth> select * from role_permissions where role = 'nev_tsp_sa';
>
>  role       | resource          | permissions
> ------------+-------------------+--------------------------------------------------------------
>  nev_tsp_sa | data/nev_prod_tsp | {'ALTER', 'AUTHORIZE', 'CREATE', 'DROP', 'MODIFY', 'SELECT'}
>
> The cache disk can be read and written normally.
>
> I would highly appreciate any help, thanks very much!
>
> Best Regards,
>
> 倪项菲 / David Ni