Hi everyone,

We have a Windows 2022 system running Solr 9.1.1. Every day the application is 
stopped at a fixed time, 19:00, for a backup that normally takes about 10 
minutes. At 20:00 the application is started again together with Solr.
Solr is installed as a Windows service using nssm.exe. The batch file that 
starts Solr uses the command "net start solr-svc".
Solr runs on a single machine in cloud mode and has an "EMBEDDED STANDALONE 
ZOOKEEPER SERVER at port 9983". Every 3-4 days, Solr fails to start with the 
error "o.a.s.c.SolrCore null => org.apache.solr.common.SolrException: 
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
localhost:9983 within 30000 ms". When this error appears, if the system admin 
runs the same command "net start solr-svc" about 10 minutes later, Solr starts 
properly most of the time. (If it still fails, we wait another 10 minutes and 
try to start the Solr service again.)
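
The manual retry described above could be automated along these lines. This is 
only a sketch under stated assumptions: the function name start_with_retry and 
its parameters are hypothetical, and it assumes the exit code of the start 
command reflects whether the start worked. That may not hold here, because the 
service can report as started while Solr fails internally, so a real script 
would probably need its own health check.

```python
import subprocess
import time


def start_with_retry(command, attempts=3, wait_seconds=600,
                     run=subprocess.run, sleep=time.sleep):
    """Run `command` until it exits with code 0 or `attempts` is used up.

    Returns the 1-based number of the attempt that succeeded, or 0 if
    every attempt failed. `run` and `sleep` are injectable so the logic
    can be exercised without touching a real service.
    """
    for attempt in range(1, attempts + 1):
        result = run(command, check=False)
        if result.returncode == 0:
            return attempt
        # Wait before the next try, but not after the final failure.
        if attempt < attempts:
            sleep(wait_seconds)
    return 0


# Hypothetical usage, mirroring the manual procedure (10 minutes = 600 s):
#   start_with_retry(["net", "start", "solr-svc"], attempts=3, wait_seconds=600)
```

The 600-second wait mirrors the 10 minutes we currently wait by hand; it is not 
a value recommended anywhere.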

We checked:
[1] The system has 16 GB of physical memory and Solr holds around 10,000 
objects. When the issue occurs, the system has about 8 GB of free memory, so 
Solr has enough memory.
[2] Free disk space is 850 GB.

What are the possible causes and countermeasures?
I would appreciate any thoughts or suggestions you might have about this.

2024-11-17 20:00:07.691 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle STARTED 
@6108ms ScheduledExecutorScheduler@15f35bc3{STARTED}
2024-11-17 20:00:07.691 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle starting 
ClientSelectorManager@16a5eb6d{STOPPED}
2024-11-17 20:00:07.696 DEBUG (main) [] o.e.j.u.c.ContainerLifeCycle 
EatWhatYouKill@31120021/SelectorProducer@2740e316/IDLE/p=false/NoTryExecutor@5b5a4aed[org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@138aa3cc[Running,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 
0]][pc=0,pic=0,pec=0,epc=0]@2024-11-17T20:00:07.6961328Z added 
{SelectorProducer@2740e316,POJO}
2024-11-17 20:00:07.847 INFO  (main) [] o.a.s.c.SolrZkServerProps Reading 
configuration from: D:\MyApplication\Solr\server\solr\zoo.cfg
2024-11-17 20:00:07.850 INFO  (main) [] o.a.s.c.SolrZkServer STARTING EMBEDDED 
STANDALONE ZOOKEEPER SERVER at port 9983
2024-11-17 20:00:07.850 WARN  (main) [] o.a.s.c.SolrZkServer Embedded Zookeeper 
is not recommended in production environments. See Reference Guide for details.
2024-11-17 20:00:08.350 INFO  (main) [] o.a.s.c.ZkContainer Zookeeper 
client=localhost:9983
2024-11-17 20:00:08.372 INFO  (main) [] o.a.s.c.DistributedClusterStateUpdater 
Creating DistributedClusterStateUpdater with useDistributedStateUpdate=false. 
Solr will be using Overseer based cluster state updates.
2024-11-17 20:00:08.379 DEBUG (main) [] o.a.s.c.c.ZkClientConnectionStrategy 
Attempting to load zk connection strategy 'null'
2024-11-17 20:00:08.381 DEBUG (main) [] o.a.s.c.ZkController Added new 
OnReconnect listener 
org.apache.solr.cloud.ZkController$$Lambda$321/0x00000001004f2c40@5d512ddb
2024-11-17 20:00:25.983 WARN  (embeddedZkServer) [] o.a.z.s.ServerCnxnFactory 
maxCnxns is not configured, using default value 0.
2024-11-17 20:00:26.481 INFO  (main) [] o.a.s.c.c.ConnectionManager Waiting up 
to 30000ms for client to connect to ZooKeeper
2024-11-17 20:00:37.521 WARN  (main-SendThread(localhost:9983)) [] 
o.a.z.ClientCnxn Session 0x0 for server localhost/0:0:0:0:0:0:0:1:9983, Closing 
socket connection. Attempting reconnect except it is a SessionExpiredException. 
=> java.net.ConnectException: Connection refused: no further information
        at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:779) ~[?:?]
        at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344)
 ~[?:?]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1282) 
~[?:?]
2024-11-17 20:00:59.084 DEBUG (main-EventThread) [] o.a.s.c.c.SolrZkClient 
Submitting job to respond to event WatchedEvent state:Closed type:None path:null
2024-11-17 20:00:59.085 ERROR (main-EventThread) [] o.a.z.ClientCnxn Error 
while calling watcher. => java.util.concurrent.RejectedExecutionException: Task 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$299/0x000000010044c440@12196d68
 rejected from 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@571be14f[Terminated,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
java.util.concurrent.RejectedExecutionException: Task 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$299/0x000000010044c440@12196d68
 rejected from 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor@571be14f[Terminated,
 pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
        at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
 ~[?:?]
        at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825) 
~[?:?]
        at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355) 
~[?:?]
        at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.execute(ExecutorUtil.java:252)
 ~[?:?]
        at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
 ~[?:?]
        at 
org.apache.solr.common.cloud.SolrZkClient$ProcessWatchWithExecutor.process(SolrZkClient.java:1019)
 ~[?:?]
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:578) 
~[?:?]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:553) 
~[?:?]
2024-11-17 20:00:59.087 ERROR (main) [] o.a.s.s.CoreContainerProvider Could not 
start Solr. Check solr/home property and the logs
2024-11-17 20:00:59.100 ERROR (main) [] o.a.s.c.SolrCore null => 
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: 
Could not connect to ZooKeeper localhost:9983 within 30000 ms
        at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225)
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: 
Could not connect to ZooKeeper localhost:9983 within 30000 ms
        at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:225) ~[?:?]
...
        at org.eclipse.jetty.start.Main.main(Main.java:77) 
~[start.jar:9.4.48.v20220622]
Caused by: java.util.concurrent.TimeoutException: 
Could not connect to ZooKeeper localhost:9983 within 30000 ms
        at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:297)
 ~[?:?]
        at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:216) ~[?:?]
        ... 54 more
2024-11-17 20:00:59.102 DEBUG (main) [] o.e.j.u.c.AbstractLifeCycle starting 
SolrRequestFilter==org.apache.solr.servlet.SolrDispatchFilter@162c1dfb{inst=false,async=false,src=DESCRIPTOR:file:///D:/MyApplication/Solr/server/solr-webapp/webapp/WEB-INF/web.xml}
2024-11-17 20:00:59.106 ERROR (main) [] o.a.s.c.SolrCore null => 
javax.servlet.UnavailableException: Error processing the request. CoreContainer 
is either not initialized or shutting down.
        at 
org.apache.solr.servlet.CoreContainerProvider.waitForCoreContainer(CoreContainerProvider.java:150)
javax.servlet.UnavailableException: Error processing the request. CoreContainer 
is either not initialized or shutting down.
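
In the log above, nothing is written between 20:00:08.381 and 20:00:25.983. To 
spot pauses like this in other startup logs, something like the following 
could be used. It is a rough sketch, not a tested tool: the function name 
find_gaps and the 5-second threshold are hypothetical, and it assumes every 
log entry begins with a timestamp in the "yyyy-MM-dd HH:mm:ss.SSS" form shown 
above. Wrapped continuation lines and stack-trace lines are skipped.

```python
import re
from datetime import datetime

# Matches the timestamp at the start of a log entry, e.g.
# "2024-11-17 20:00:08.381 DEBUG (main) [] ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def find_gaps(lines, min_gap_seconds=5.0):
    """Return (gap_seconds, previous_entry, next_entry) tuples for every
    pair of consecutive timestamped entries at least `min_gap_seconds` apart.
    Lines without a leading timestamp are ignored."""
    gaps = []
    previous = None  # (timestamp, text) of the last timestamped entry
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        text = line.rstrip()
        if previous is not None:
            delta = (current - previous[0]).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((delta, previous[1], text))
        previous = (current, text)
    return gaps


# Hypothetical usage (the file name is an assumption):
#   with open("solr.log", encoding="utf-8") as handle:
#       for seconds, before, after in find_gaps(handle):
#           print(f"{seconds:.3f}s gap\n  {before}\n  {after}")
```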
