[image: Screenshot from 2023-01-18 19-06-33.png]

The restart pattern is shown in the screenshot above.

On Wed, Jan 18, 2023 at 2:43 PM Rohit Walecha <rohi...@fnp.com> wrote:

> Hi,
>
> We have a 3-node *Solr (8.8.0)* cluster, deployed in multiple environments
> and connected to a 3-node *ZooKeeper (3.6.2)* cluster. For the last few
> months we have been facing frequent restarts of the SolrCloud nodes. We
> tried to debug this, and while looking into the logs and other stats we saw
> that the node that restarted logs the following:
>
> *1. *
> 2023-01-04 21:50:09.186 WARN (zkConnectionManagerCallback-15-thread-1) [ ]
> o.a.s.c.c.ConnectionManager Watcher
> org.apache.solr.common.cloud.ConnectionManager@731cf36d name:
> ZooKeeperConnection
> Watcher:apache-solrcloud-zookeeper-0.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-1.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-2.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181/
> got event WatchedEvent state:Disconnected type:None path:null path: null
> type: None
> which suggests that the *event state is either disconnected or expired*.
> It is followed by this warning:
> WARN (zkConnectionManagerCallback-13-thread-1) [ ]
> o.a.s.c.c.ConnectionManager zkClient has disconnected
>
>
>
> *2*.
> Client session timed out, have not heard from server in 30018ms for
> sessionid 0x1000091fcbe0001
> This is a session timeout from the ZkClient inside Solr.
>
> *3.*
> 2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ] o.a.s.c.ZkController
> Publish this node as DOWN...
> 2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ] o.a.s.c.ZkController
> Publish
> node=apache-solrcloud-0.apache-solrcloud-headless.production:8983_solr as
> DOWN
>
> Attached: *050120223-solr-cloud-0.log*
>
>
>
> *Meanwhile, the ZooKeeper node logs the following at the time the Solr node
> gets restarted:*
>
> 2023-01-15 07:11:44,349 [myid:2] - WARN  
> [NIOWorkerThread-2:ZooKeeperServer@1384] - Connection request from old client 
> /10.70.26.0:54584; will be dropped if server is in r-o mode
> 2023-01-15 07:11:44,350 [myid:2] - INFO  
> [CommitProcessor:2:LearnerSessionTracker@116] - Committing global session 
> 0x200042f19cf130f
> 2023-01-15 07:11:44,352 [myid:2] - INFO  
> [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession 
> request for session 0x200042f19cf130f
>
>
> At this point *we know what pushes the node down when a Solr node gets
> restarted: as the log at [#2] shows*, the client session times out, and that
> session is the one established between the Solr node and ZooKeeper.
>
> While debugging this, we also went through a series of issues reported
> against the *ZooKeeper* version we are using. In short, they describe slow
> leader election, ZooKeeper nodes restarting, and the whole ZooKeeper cluster
> going down when the leader becomes unhealthy or is stopped or restarted. The
> new leader election then takes a long time, and client sessions time out
> during that period.
>
> We tried to reproduce this in a local environment by setting up a Solr and
> ZooKeeper cluster and forcefully restarting/stopping the ZooKeeper leader
> node. We got output like that in *have-not-heard-back-local-cluster.log*,
> so we could reproduce [#2].
>
> We are seeking help to find out what could be causing these frequent
> restarts of the SolrCloud nodes.
>
> *Regards.*
>
>
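
The session-timeout message quoted in [#2] can be pulled out of a Solr log mechanically, which may help when lining up timeouts against the restart pattern. The following is only a minimal sketch, assuming the message text has the shape shown in [#2]; the names `SessionTimeout`, `TIMEOUT_RE`, and `find_session_timeouts` are hypothetical helpers for illustration and are not part of Solr or ZooKeeper.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class SessionTimeout(NamedTuple):
    """One 'Client session timed out' event found in a log (hypothetical type)."""
    idle_ms: int      # how long the client reports not hearing from the server
    session_id: str   # ZooKeeper session id as printed, e.g. "0x1000091fcbe0001"


# Assumption: the message looks like the line quoted in [#2]. Optional
# whitespace inside "session id" is tolerated in case the wording differs.
TIMEOUT_RE = re.compile(
    r"Client session timed out, have not heard from server in (\d+)ms "
    r"for session\s?id (0x[0-9a-fA-F]+)"
)


def find_session_timeouts(lines: Iterable[str]) -> Iterator[SessionTimeout]:
    """Yield one SessionTimeout per log line that contains the timeout message."""
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            yield SessionTimeout(int(match.group(1)), match.group(2))


if __name__ == "__main__":
    sample = [
        "Client session timed out, have not heard from server in 30018ms "
        "for sessionid 0x1000091fcbe0001",
        "o.a.s.c.c.ConnectionManager zkClient has disconnected",
    ]
    for event in find_session_timeouts(sample):
        print(event.session_id, event.idle_ms)
```

To scan a real log, the same function can be fed an open file object, for example `find_session_timeouts(open(path))`, where `path` is whichever Solr log file is being inspected.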
