Hi everyone,

We have a Windows 2022 system running Solr 9.1.1. Every day the application is stopped at a fixed time, 19:00, for a backup, which normally takes about 10 minutes. At 20:00 the application, including Solr, is started again. Solr is installed as a Windows service with nssm.exe, and the start batch file uses the command "net start solr-svc". Solr runs on a single machine in cloud mode with an embedded ZooKeeper ("EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983").

Every 3-4 days, Solr fails to start with the error "o.a.s.c.SolrCore null => org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:9983 within 30000 ms". When this happens, if the system administrator runs the same command "net start solr-svc" about 10 minutes later, Solr starts properly most of the time. (If it still fails, we wait another 10 minutes and try to start the service again.)
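Until the root cause is found, the manual workaround above (run "net start solr-svc" again after about 10 minutes) could be automated in the start job. This works around the symptom only. Below is a minimal Python sketch, assuming Python 3 is installed on the server. The health URL (default port 8983 and the `/solr/admin/info/health` endpoint), the retry counts, and the step that stops the service before retrying are assumptions to adjust; all function names are hypothetical.

```python
import subprocess
import time
import urllib.request
from typing import Callable

SERVICE = "solr-svc"  # service name used in the post
# Assumption: default Solr port 8983 and the health-check endpoint; adjust if different.
HEALTH_URL = "http://localhost:8983/solr/admin/info/health"


def run_net(action: str) -> bool:
    """Run 'net <action> solr-svc' (Windows only); True if it exited with code 0."""
    result = subprocess.run(["net", action, SERVICE], capture_output=True, text=True)
    return result.returncode == 0


def solr_is_healthy(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """True if Solr answers the health URL with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # refused connections, timeouts and HTTP errors all derive from OSError
        return False


def wait_healthy(check: Callable[[], bool], deadline_s: float, poll_s: float,
                 sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll check() until it returns True or deadline_s seconds have passed."""
    end = clock() + deadline_s
    while True:
        if check():
            return True
        if clock() >= end:
            return False
        sleep(poll_s)


def start_with_retry(start: Callable[[], object], stop: Callable[[], object],
                     check: Callable[[], bool], attempts: int = 3,
                     wait_between_s: float = 600, startup_deadline_s: float = 120,
                     poll_s: float = 5, sleep=time.sleep, clock=time.monotonic) -> int:
    """Start the service, retrying after a pause if Solr never becomes healthy.

    Returns the 1-based number of the attempt that succeeded, or 0 if all failed.
    """
    for attempt in range(1, attempts + 1):
        start()
        if wait_healthy(check, startup_deadline_s, poll_s, sleep, clock):
            return attempt
        stop()  # assumption: the service must be stopped before it can be started again
        if attempt < attempts:
            sleep(wait_between_s)  # the post reports that waiting ~10 minutes usually helps
    return 0
```

On the server it would be called as `start_with_retry(lambda: run_net("start"), lambda: run_net("stop"), solr_is_healthy)`. The injectable `sleep` and `clock` parameters exist only so the retry logic can be tested without actually waiting.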
We checked:
[1] The system has 16 GB of physical memory, and Solr holds around 10,000 objects. When the issue happens, the system still has about 8 GB of free memory, so Solr has enough memory.
[2] Free disk space is 850 GB.

What are the possible causes and countermeasures? I'd appreciate any thoughts or suggestions.

Log from a failed start:

2024-11-17 20:00:07.691 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle STARTED @6108ms ScheduledExecutorScheduler@15f35bc3{STARTED}
2024-11-17 20:00:07.691 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle starting ClientSelectorManager@16a5eb6d{STOPPED}
2024-11-17 20:00:07.696 DEBUG (main) [] o.e.j.u.c.ContainerLifeCycle EatWhatYouKill@31120021/SelectorProducer@2740e316/IDLE/p=false/NoTryExecutor@5b5a4aed[org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@138aa3cc[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]][pc=0,pic=0,pec=0,epc=0]@2024-11-17T20:00:07.6961328Z added {SelectorProducer@2740e316,POJO}
2024-11-17 20:00:07.847 INFO (main) [] o.a.s.c.SolrZkServerProps Reading configuration from: D:\MyApplication\Solr\server\solr\zoo.cfg
2024-11-17 20:00:07.850 INFO (main) [] o.a.s.c.SolrZkServer STARTING EMBEDDED STANDALONE ZOOKEEPER SERVER at port 9983
2024-11-17 20:00:07.850 WARN (main) [] o.a.s.c.SolrZkServer Embedded Zookeeper is not recommended in production environments. See Reference Guide for details.
2024-11-17 20:00:08.350 INFO (main) [] o.a.s.c.ZkContainer Zookeeper client=localhost:9983
2024-11-17 20:00:08.372 INFO (main) [] o.a.s.c.DistributedClusterStateUpdater Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false. Solr will be using Overseer based cluster state updates.
2024-11-17 20:00:08.379 DEBUG (main) [] o.a.s.c.c.ZkClientConnectionStrategy Attempting to load zk connection strategy 'null'
2024-11-17 20:00:08.381 DEBUG (main) [] o.a.s.c.ZkController Added new OnReconnect listener org.apache.solr.cloud.ZkController$$Lambda$321/0x00000001004f2c40@5d512ddb
2024-11-17 20:00:25.983 WARN (embeddedZkServer) [] o.a.z.s.ServerCnxnFactory maxCnxns is not configured, using default value 0.
2024-11-17 20:00:26.481 INFO (main) [] o.a.s.c.c.ConnectionManager Waiting up to 30000ms for client to connect to ZooKeeper
2024-11-17 20:00:37.521 WARN (main-SendThread(localhost:9983)) [] o.a.z.ClientCnxn Session 0x0 for server localhost/0:0:0:0:0:0:0:1:9983, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. => java.net.ConnectException: Connection refused: no further information
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
java.net.ConnectException: Connection refused: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) ~[?:?]
2024-11-17 20:00:59.084 DEBUG (main-EventThread) [] o.a.s.c.c.SolrZkClient Submitting job to respond to event WatchedEvent state:Closed type:None path:null
2024-11-17 20:00:59.085 ERROR (main-EventThread) [] o.a.z.ClientCnxn Error while calling watcher. => java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$299/0x000000010044c440@12196d68 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@571be14f[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
java.util.concurrent.RejectedExecutionException: Task org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$299/0x000000010044c440@12196d68 rejected from org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@571be14f[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355) ~[?:?]
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:252) ~[?:?]
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118) ~[?:?]
    at org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.process(SolrZkClient.java:1019) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:578) ~[?:?]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) ~[?:?]
2024-11-17 20:00:59.087 ERROR (main) [] o.a.s.s.CoreContainerProvider Could not start Solr. Check solr/home property and the logs
2024-11-17 20:00:59.100 ERROR (main) [] o.a.s.c.SolrCore null => org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:9983 within 30000 ms
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225)
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:9983 within 30000 ms
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225) ~[?:?]
    ...
    at org.eclipse.jetty.start.Main.main(Main.java:77) ~[start.jar:9.4.48.v20220622]
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:9983 within 30000 ms
    at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:297) ~[?:?]
    at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:216) ~[?:?]
    ... 54 more
2024-11-17 20:00:59.102 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle starting SolrRequestFilter==org.apache.solr.servlet.SolrDispatchFilter@162c1dfb{inst=false,async=false,src=DESCRIPTOR:file:///D:/MyApplication/Solr/server/solr-webapp/webapp/WEB-INF/web.xml}
2024-11-17 20:00:59.106 ERROR (main) [] o.a.s.c.SolrCore null => javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
    at org.apache.solr.servlet.CoreContainerProvider.waitForCoreContainer(CoreContainerProvider.java:150)
javax.servlet.UnavailableException: Error processing the request. CoreContainer is either not initialized or shutting down.
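In the log, the ZooKeeper client connects to `localhost/0:0:0:0:0:0:0:1:9983`, i.e. the IPv6 loopback `::1`, and gets "Connection refused". To collect more data the next time a start fails, a small probe could record which addresses `localhost` resolves to and whether anything accepts TCP connections on port 9983 over IPv4 (`127.0.0.1`) and IPv6 (`::1`). This is a minimal Python sketch with hypothetical function names; it only checks TCP reachability and does not speak the ZooKeeper protocol. Python's resolver order may also differ from the order the JVM uses.

```python
import socket


def port_open(host: str, port: int, family: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to (host, port) succeeds using the given address family."""
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            return s.connect_ex((host, port)) == 0
    except OSError:  # e.g. IPv6 unavailable, or a timeout raised as an exception
        return False


def localhost_addresses(port: int = 9983) -> list:
    """Addresses that 'localhost' resolves to, in the order Python's resolver returns them."""
    infos = socket.getaddrinfo("localhost", port, type=socket.SOCK_STREAM)
    return [sockaddr[0] for _family, _type, _proto, _canon, sockaddr in infos]


def probe_zk(port: int = 9983) -> dict:
    """Check the embedded ZooKeeper port on both loopback addresses."""
    return {
        "localhost resolves to": localhost_addresses(port),
        "127.0.0.1 (IPv4) open": port_open("127.0.0.1", port, socket.AF_INET),
        "::1 (IPv6) open": port_open("::1", port, socket.AF_INET6),
    }
```

For example, the start script could call `print(probe_zk())` a few times while Solr is starting, or right after a failed start, and log the result next to the Solr log.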