[ 
https://issues.apache.org/jira/browse/SOLR-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046412#comment-18046412
 ] 

Kevin Risden commented on SOLR-17972:
-------------------------------------

Houston or I had removed some of the retryOnConnLoss with Curator, but it was 
pretty invasive. I tried to move it to a separate commit so the work would not 
be lost: 
https://github.com/apache/solr/pull/2004/changes/80595c2c78c081295187f7f2226d91396a094fd3

It's definitely out of date and probably not super useful now, but the idea was 
to follow up and remove it.

> DistributedMultiLock can fail to release some locks if ZK connection loss 
> occurs
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-17972
>                 URL: https://issues.apache.org/jira/browse/SOLR-17972
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Pierre Salagnac
>            Priority: Minor
>
> This bug occurs only when running the cluster with distributed cluster 
> processing (no overseer).
> If a ZooKeeper connection loss occurs while creating one of the locks of a 
> {{DistributedMultiLock}}, any other locks of the same multi-lock that were 
> already created will not be released. This prevents the unreleased locks 
> from being acquired again by other operations until the session is lost, 
> which causes removal of the ephemeral nodes.
> Additionally, cluster maintenance operations will wait forever to acquire the 
> required locks, consuming threads from the node pool for distributed cluster 
> operations. Eventually, all threads will be in use and future operations will 
> be rejected because the queue is full.
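
The failure mode described above can be illustrated with a minimal sketch of the 
usual remedy: track which locks were actually acquired and release them if a 
later acquisition throws. This is not Solr's implementation. SketchLock and 
SketchMultiLock are hypothetical names invented for illustration, and the sketch 
does not cover the case where a creation that failed with connection loss 
actually succeeded on the ZooKeeper server.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one ZooKeeper-backed lock; not Solr's actual interface. */
interface SketchLock {
  /** Creates the lock node; assumed to throw if the ZK connection is lost. */
  void acquire() throws Exception;

  /** Deletes the lock node. */
  void release();
}

/** Hypothetical multi-lock that rolls back partial acquisition on failure. */
final class SketchMultiLock {
  private final List<SketchLock> locks;
  // Only locks whose acquire() returned normally are tracked here.
  private final List<SketchLock> held = new ArrayList<>();

  SketchMultiLock(List<SketchLock> locks) {
    this.locks = locks;
  }

  /** Acquires every lock in order, or none: a failure releases what was already taken. */
  void acquireAll() throws Exception {
    try {
      for (SketchLock lock : locks) {
        lock.acquire();
        held.add(lock);
      }
    } catch (Exception e) {
      // Without this rollback, the earlier locks would stay held until the
      // ZK session expires and the ephemeral nodes are removed.
      releaseAll();
      throw e;
    }
  }

  /** Releases held locks in reverse acquisition order, continuing past individual failures. */
  void releaseAll() {
    for (int i = held.size() - 1; i >= 0; i--) {
      try {
        held.get(i).release();
      } catch (RuntimeException ignored) {
        // Keep going so one failed release does not strand the remaining locks.
      }
    }
    held.clear();
  }
}
```

With this shape, a caller that sees acquireAll() throw can assume no lock from 
the multi-lock is still tracked as held, so the operation can fail fast instead 
of leaving other operations waiting on locks that nobody will release.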



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
