[jira] [Commented] (SOLR-17972) DistributedMultiLock can fail to release some locks if ZK connection loss occurs

Pierre Salagnac (Jira) Thu, 18 Dec 2025 10:12:19 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046410#comment-18046410
 ]


Pierre Salagnac commented on SOLR-17972:
----------------------------------------

I was looking to contribute a fix for this, and I realized that the parameter 
{{retryOnConnLoss}} is now ignored in Solr 10, since connections to ZooKeeper 
are managed with Curator (SOLR-16116). My understanding is that it now behaves 
as if {{retryOnConnLoss}} were always {{true}}.

I can't reproduce it locally, so I can't double-check, but this bug probably 
cannot be hit in Solr 10. I will still open a PR for 9x.

Side note: this parameter should be cleaned up from the code base.

> DistributedMultiLock can fail to release some locks if ZK connection loss 
> occurs
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-17972
>                 URL: https://issues.apache.org/jira/browse/SOLR-17972
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Pierre Salagnac
>            Priority: Minor
>
> This bug occurs only when running the cluster with distributed cluster 
> processing (no overseer).
> If a ZooKeeper connection loss occurs when creating one of the locks of a 
> {{DistributedMultiLock}}, any other locks of the same multi-lock that were 
> already created will not be released. This prevents the unreleased locks 
> from being acquired again by other operations until the session is lost, 
> which causes removal of the ephemeral nodes.
> Additionally, cluster maintenance operations will wait forever to acquire the 
> required locks, and consume threads from the node's pool for distributed 
> cluster operations. Eventually, all threads will be in use and future 
> operations will be rejected because the queue is full.
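
For illustration only, the failure mode described above and one possible way to avoid it can be sketched as follows. This is a minimal sketch, not Solr's actual code: {{SketchLock}}, {{InMemoryLock}} and {{MultiLockSketch}} are hypothetical names, and the simulated failure stands in for a ZooKeeper connection loss. The idea is that when creating one lock fails, the locks already created are released before the exception is propagated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one ZooKeeper-backed lock; not Solr's API. */
interface SketchLock {
  void create() throws Exception; // may fail, e.g. on connection loss
  void release();
}

/** Hypothetical in-memory lock used only to simulate a failure. */
final class InMemoryLock implements SketchLock {
  private final boolean failOnCreate;
  boolean held = false;

  InMemoryLock(boolean failOnCreate) {
    this.failOnCreate = failOnCreate;
  }

  @Override
  public void create() throws Exception {
    if (failOnCreate) {
      throw new Exception("simulated connection loss");
    }
    held = true;
  }

  @Override
  public void release() {
    held = false;
  }
}

/** Hypothetical multi-lock that rolls back partially created locks. */
final class MultiLockSketch {
  private final List<SketchLock> created = new ArrayList<>();

  /** Creates all locks in order; on failure, releases those already created. */
  void createAll(List<? extends SketchLock> locks) throws Exception {
    try {
      for (SketchLock lock : locks) {
        lock.create();
        created.add(lock);
      }
    } catch (Exception e) {
      // Without this rollback, the earlier locks would stay held
      // until the session is lost.
      releaseAll();
      throw e;
    }
  }

  /** Releases created locks in reverse creation order. */
  void releaseAll() {
    for (int i = created.size() - 1; i >= 0; i--) {
      created.get(i).release();
    }
    created.clear();
  }

  int createdCount() {
    return created.size();
  }
}
```

In this sketch, calling {{createAll}} with a first lock that succeeds and a second lock that fails leaves neither lock held, so other operations are not blocked waiting for the first one.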



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
