[jira] [Commented] (SOLR-17972) DistributedMultiLock can fail to release some locks if ZK connection loss occurs

ASF subversion and git services (Jira) Mon, 29 Dec 2025 01:55:09 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18048015#comment-18048015
 ]


ASF subversion and git services commented on SOLR-17972:
--------------------------------------------------------

Commit 900e0724c6fe6467490d3c1aae150bdd309f200c in solr's branch 
refs/heads/branch_9x from Pierre Salagnac
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=900e0724c6f ]

SOLR-17972: Retry creation of ZK lock on connection loss. (#3968)

This makes sure we don't skip creation of the ZK distributed lock in case of a 
transient connection loss. This fix applies only when Solr is running with no 
overseer (distributed updates).
This change is for the 9x branch only, as this parameter is now ignored in 10 with 
the move to Curator.
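
For readers unfamiliar with the pattern, here is a minimal sketch of retrying an operation on a transient connection loss. It is not the code from this commit: `ConnectionLossException`, `retry`, the attempt count, the delay and the lock path are all hypothetical stand-ins, chosen so the example runs without a ZooKeeper dependency.

```java
import java.util.concurrent.Callable;

class RetryOnConnectionLoss {
    /** Hypothetical stand-in for a transient ZooKeeper connection loss. */
    static class ConnectionLossException extends Exception {
        ConnectionLossException(String msg) { super(msg); }
    }

    /**
     * Runs op, retrying only when it fails with ConnectionLossException.
     * Any other exception propagates immediately.
     */
    static <T> T retry(Callable<T> op, int maxAttempts, long delayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLossException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs); // fixed delay between attempts
                }
            }
        }
        throw last; // every attempt hit a connection loss
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated lock creation: the first two attempts lose the connection.
        String path = retry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException("transient");
            }
            return "/locks/collection1/lock-0000000001"; // hypothetical path
        }, 5, 10L);
        System.out.println(path + " created after " + calls[0] + " attempts");
    }
}
```

The point of the sketch is only that a transient failure leads to another attempt instead of the lock creation being skipped.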

> DistributedMultiLock can fail to release some locks if ZK connection loss 
> occurs
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-17972
>                 URL: https://issues.apache.org/jira/browse/SOLR-17972
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Pierre Salagnac
>            Assignee: Pierre Salagnac
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This bug occurs only when running the cluster with distributed cluster 
> processing (no overseer).
> If a ZooKeeper connection loss occurs while creating one of the locks of a 
> {{DistributedMultiLock}}, any other locks of the same multi-lock that were 
> already created will not be released. This prevents the unreleased locks 
> from being acquired again by other operations until the session is lost, 
> which causes removal of the ephemeral nodes.
> Additionally, cluster maintenance operations will wait forever to acquire the 
> required locks, consuming threads from the node's pool for distributed cluster 
> operations. Eventually, all threads will be in use and future operations will 
> be rejected because the queue is full.
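
To illustrate the failure mode described above, here is a minimal sketch of an all-or-nothing multi-lock acquisition that releases the locks it already created when a later creation fails. It is not Solr's `DistributedMultiLock` and not the committed fix (which retries the creation instead): `LockStore`, `acquireAll` and the lock paths are hypothetical, and an in-memory set stands in for ephemeral nodes in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MultiLockSketch {
    /** Hypothetical in-memory stand-in for ephemeral lock nodes. */
    static class LockStore {
        final Set<String> held = new HashSet<>();
        String failOn; // creating this path simulates a connection loss

        void create(String path) {
            if (path.equals(failOn)) {
                throw new IllegalStateException("connection loss on " + path);
            }
            if (!held.add(path)) {
                throw new IllegalStateException("already locked: " + path);
            }
        }

        void release(String path) {
            held.remove(path);
        }
    }

    /** Acquires every lock, or none: on failure, releases what was already created. */
    static List<String> acquireAll(LockStore store, List<String> paths) {
        List<String> acquired = new ArrayList<>();
        try {
            for (String p : paths) {
                store.create(p);
                acquired.add(p);
            }
            return acquired;
        } catch (RuntimeException e) {
            // Without this cleanup, the earlier locks would stay held.
            for (int i = acquired.size() - 1; i >= 0; i--) {
                store.release(acquired.get(i));
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        LockStore store = new LockStore();
        store.failOn = "/locks/shard2";
        try {
            acquireAll(store, List.of("/locks/collection1", "/locks/shard1", "/locks/shard2"));
        } catch (IllegalStateException e) {
            System.out.println("acquire failed: " + e.getMessage());
        }
        System.out.println("locks still held: " + store.held.size());
    }
}
```

If the cleanup in the `catch` block is removed, the first two locks remain held after the failure, which is the situation the description reports.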



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
