We are also using the solr-operator in Kubernetes.

On Mon, Jan 23, 2023 at 12:28 PM Rohit Walecha <rohi...@fnp.com> wrote:

> Yes, Vincenzo, it is deployed in Kubernetes. Any suggestions?
>
> On Mon, Jan 23, 2023 at 12:22 PM Rohit Walecha <rohi...@fnp.com> wrote:
>
>> Hi Shawn,
>>
>> I will try applying the changes you have suggested and get back to you on this.
>>
>> On Thu, Jan 19, 2023 at 4:08 PM Rohit Walecha <rohi...@fnp.com> wrote:
>>
>>> We have multiple collections in our 3-node cluster. Some collections have
>>> a replication factor of 1 and others have a replication factor of 2. Could
>>> this be affecting our nodes, sending them into the recovery state and
>>> causing restarts?
>>>
>>> On Wed, Jan 18, 2023 at 7:07 PM Rohit Walecha <rohi...@fnp.com> wrote:
>>>
>>>> [image: Screenshot from 2023-01-18 19-06-33.png]
>>>>
>>>> The restart pattern is shown above.
>>>>
>>>> On Wed, Jan 18, 2023 at 2:43 PM Rohit Walecha <rohi...@fnp.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have a 3-node *Solr (8.8.0)* cluster deployed in multiple
>>>>> environments, connected to a 3-node *ZooKeeper (3.6.2)* cluster. For the
>>>>> last few months we have been facing frequent restarts of the SolrCloud
>>>>> nodes. While debugging this and looking into the logs and other stats, we
>>>>> see that the node which restarted logs the following:
>>>>>
>>>>> *1.*
>>>>> 2023-01-04 21:50:09.186 WARN (zkConnectionManagerCallback-15-thread-1)
>>>>> [ ] o.a.s.c.c.ConnectionManager Watcher
>>>>> org.apache.solr.common.cloud.ConnectionManager@731cf36d name:
>>>>> ZooKeeperConnection
>>>>> Watcher:apache-solrcloud-zookeeper-0.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-1.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181,apache-solrcloud-zookeeper-2.apache-solrcloud-zookeeper-headless.production.svc.cluster.local:2181/
>>>>> got event WatchedEvent state:Disconnected type:None path:null path: null
>>>>> type: None
>>>>> which probably means that the *event state is either disconnected or
>>>>> expired*, and is followed by this warning:
>>>>> WARN (zkConnectionManagerCallback-13-thread-1) [ ]
>>>>> o.a.s.c.c.ConnectionManager zkClient has disconnected
>>>>>
>>>>>
>>>>>
>>>>> *2.*
>>>>> Client session timed out, have not heard from server in 30018ms for
>>>>> sessionid 0x1000091fcbe0001
>>>>> This is a session timeout from the ZkClient inside Solr.
>>>>>
>>>>> *3.*
>>>>> 2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ]
>>>>> o.a.s.c.ZkController Publish this node as DOWN...
>>>>> 2023-01-04 21:50:10.685 INFO (ShutdownMonitor) [ ]
>>>>> o.a.s.c.ZkController Publish
>>>>> node=apache-solrcloud-0.apache-solrcloud-headless.production:8983_solr as
>>>>> DOWN
>>>>>
>>>>> Attached: *050120223-solr-cloud-0.log*
>>>>>
>>>>>
>>>>>
>>>>> *Meanwhile, the ZooKeeper node logs the following around the time at
>>>>> which the Solr node gets restarted:*
>>>>>
>>>>> 2023-01-15 07:11:44,349 [myid:2] - WARN  
>>>>> [NIOWorkerThread-2:ZooKeeperServer@1384] - Connection request from old 
>>>>> client /10.70.26.0:54584; will be dropped if server is in r-o mode
>>>>> 2023-01-15 07:11:44,350 [myid:2] - INFO  
>>>>> [CommitProcessor:2:LearnerSessionTracker@116] - Committing global session 
>>>>> 0x200042f19cf130f
>>>>> 2023-01-15 07:11:44,352 [myid:2] - INFO  
>>>>> [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global 
>>>>> closeSession request for session 0x200042f19cf130f
>>>>>
>>>>>
>>>>> We are now at a point where *we know what pushes the node down when the
>>>>> Solr node restarts, as seen in the logs at [#2]*: the client session
>>>>> timed out, and that session is the one established between the Solr node
>>>>> and ZooKeeper.
>>>>>
>>>>> While debugging this, we also went through a series of issues reported
>>>>> against the *ZooKeeper* version we are using. In short, they describe
>>>>> slow leader election, ZooKeeper nodes restarting, and the whole ZooKeeper
>>>>> cluster going down when a leader becomes unhealthy, stops, or restarts.
>>>>> Leader election then happens again and takes a long time, which causes
>>>>> client sessions to time out during that period.
>>>>>
>>>>> We tried to replicate this in a local environment by setting up a Solr
>>>>> and ZooKeeper cluster and forcefully restarting/stopping the leader
>>>>> ZooKeeper nodes. We got output like
>>>>> *have-not-heard-back-local-cluster.log*, and we could replicate [#2].
>>>>>
>>>>> We are seeking help to find out what could be the possible reason for
>>>>> these frequent restarts of the SolrCloud nodes.
>>>>>
>>>>> *Regards.*
>>>>>
>>>>>
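
For anyone trying to line up the Solr-side events above with the ZooKeeper
logs, here is a minimal sketch in Python that pulls the three kinds of
messages quoted in this thread (disconnect, client session timeout, node
published as DOWN) out of a Solr log together with their timestamps. The
names scan_log and EVENT_PATTERNS are hypothetical, the patterns are copied
only from the excerpts in this thread, and it assumes each log entry starts
with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp. Lines without a timestamp reuse
the most recent one seen.

```python
import re
from datetime import datetime

# Message fragments copied from the log excerpts quoted in this thread.
# These are illustrative patterns, not an exhaustive list of Solr messages.
EVENT_PATTERNS = {
    "zk_disconnected": re.compile(
        r"state:Disconnected|zkClient has disconnected"
    ),
    "zk_session_timeout": re.compile(
        r"Client session timed out, have not heard from server in (\d+)ms"
    ),
    "node_published_down": re.compile(r"Publish this node as DOWN"),
}

# Assumed timestamp layout: "2023-01-04 21:50:09.186" (a comma is also accepted).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[.,](\d{3})")


def scan_log(lines):
    """Return (timestamp, event_name, detail) tuples for matching lines.

    detail is the captured silence in milliseconds for session timeouts,
    and None for the other event types.
    """
    events = []
    last_ts = None  # most recent timestamp seen, reused for untimed lines
    for line in lines:
        ts_match = TIMESTAMP.match(line)
        if ts_match:
            last_ts = datetime.strptime(
                ts_match.group(1) + "." + ts_match.group(2),
                "%Y-%m-%d %H:%M:%S.%f",
            )
        for name, pattern in EVENT_PATTERNS.items():
            hit = pattern.search(line)
            if hit:
                detail = hit.group(1) if hit.groups() else None
                events.append((last_ts, name, detail))
    return events
```

Passing an open log file to scan_log (for example the attached Solr log) gives
a list of events whose timestamps can then be compared by hand against the
ZooKeeper server log for the same window.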
