Raintung Li created SOLR-6056:
---------------------------------
Summary: Zookeeper crash JVM stack OOM because of recover strategy
Key: SOLR-6056
URL: https://issues.apache.org/jira/browse/SOLR-6056
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.6
Environment: Two Linux servers, 65G, 16-core CPU
20 collections, each with one shard and two replicas
One ZooKeeper instance
Reporter: Raintung Li
Errors such as "org.apache.solr.common.SolrException: Error opening new searcher.
exceeded limit of maxWarmingSearchers=2, try again later" that occur in
DistributedUpdateProcessor trigger the core admin recovery process.
That means every update request that hits such an error sends a core admin
recovery request (see DistributedUpdateProcessor.java, doFinish()).
The terrible thing is that CoreAdminHandler starts a new thread for each request to
publish the recovery status and start recovery. Threads increase very quickly,
causing a stack OOM, and the Overseer can't handle that many status updates: queue
nodes such as /overseer/queue/qn-0000125553 grew by more than 40 thousand in two
minutes. In the end, ZooKeeper crashed.
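To make the amplification concrete, here is a minimal Java sketch of the pattern described above. It is a hypothetical model, not Solr's code: the class RecoveryStormSketch, the method requestRecovery, and the core name are invented for illustration, and a local in-memory queue stands in for /overseer/queue. With coalesce=false, every recovery request starts a new thread that enqueues one state message, so N failed updates produce N threads and N queue entries. With coalesce=true, an assumed per-core "recovery already in progress" guard drops duplicate requests; whether Solr should handle it this way is an assumption, not something the report states.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the recovery storm; NOT Solr's actual classes or API.
class RecoveryStormSketch {
    // Stands in for /overseer/queue: one entry per published state change.
    final Queue<String> overseerQueue = new ConcurrentLinkedQueue<>();
    final AtomicInteger threadsStarted = new AtomicInteger();

    private final Queue<Thread> started = new ConcurrentLinkedQueue<>();
    // Assumed guard: cores with a recovery already in flight.
    private final Set<String> recovering = ConcurrentHashMap.newKeySet();
    private final boolean coalesce;

    RecoveryStormSketch(boolean coalesce) {
        this.coalesce = coalesce;
    }

    // Called once per failed update that asks the replica to recover.
    void requestRecovery(String core) {
        if (coalesce && !recovering.add(core)) {
            return; // duplicate request: a recovery for this core is already running
        }
        threadsStarted.incrementAndGet();
        // Uncoalesced path mirrors the report: one new thread per request,
        // each publishing a "recovering" state message to the queue.
        Thread t = new Thread(() -> overseerQueue.add(core + ":recovering"));
        started.add(t);
        t.start();
    }

    // Wait for all spawned threads so the counts are final.
    void awaitAll() {
        for (Thread t : started) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        for (boolean coalesce : new boolean[] {false, true}) {
            RecoveryStormSketch s = new RecoveryStormSketch(coalesce);
            // Simulate 1000 failed updates hitting the same (hypothetical) replica.
            for (int i = 0; i < 1000; i++) {
                s.requestRecovery("collection1_shard1_replica2");
            }
            s.awaitAll();
            System.out.printf("coalesce=%b threads=%d queued=%d%n",
                    coalesce, s.threadsStarted.get(), s.overseerQueue.size());
        }
    }
}
```

Running main prints `coalesce=false threads=1000 queued=1000` and then `coalesce=true threads=1 queued=1`, showing how one thread and one queue node per failed update scales linearly with the error rate, while a per-core guard keeps both at one.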
Worse, with so many queue nodes in ZooKeeper and only one Overseer working
through them, the cluster can't publish the correct status. I had to start three
threads to clear the queue nodes, and the cluster did not work normally for nearly
30 minutes.
--
This message was sent by Atlassian JIRA
(v6.2#6252)