[ 
https://issues.apache.org/jira/browse/SOLR-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18050057#comment-18050057
 ] 

ASF subversion and git services commented on SOLR-17972:
--------------------------------------------------------

Commit 0a0490092097c761ad5cb1931b484a09dbd22a8b in solr's branch 
refs/heads/branch_9_10 from Pierre Salagnac
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=0a049009209 ]

SOLR-17972: Retry creation of ZK lock on connection loss. (#3968)

This makes sure we don't skip creation of the ZK distributed lock in case of a 
transient connection loss. This fix applies only when Solr is running with no 
overseer (distributed updates).
This change is for the 9x branch only, as this parameter is now ignored in 10 with 
the move to Curator.

(cherry picked from commit 900e0724c6fe6467490d3c1aae150bdd309f200c)
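
The idea of retrying an operation after a transient connection loss can be sketched as below. This is a minimal illustration and not the code from the commit: `ConnectionLoss`, `retry_on_connection_loss`, `max_attempts` and `delay_seconds` are hypothetical names, and Python is used only for brevity.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a transient ZooKeeper connection-loss error."""


def retry_on_connection_loss(operation, max_attempts=3, delay_seconds=0.0):
    """Call operation() and retry it when it raises ConnectionLoss.

    Returns the operation's result on the first success. If every attempt
    fails, the last ConnectionLoss is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionLoss:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure
            time.sleep(delay_seconds)  # wait before the next attempt
```

A real implementation would also need to handle the case where the first attempt actually succeeded on the server before the connection dropped, which this sketch ignores.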


> DistributedMultiLock can fail to release some locks if ZK connection loss 
> occurs
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-17972
>                 URL: https://issues.apache.org/jira/browse/SOLR-17972
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 9.0
>            Reporter: Pierre Salagnac
>            Assignee: Pierre Salagnac
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 9.11
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This bug occurs only when running the cluster with distributed cluster processing 
> (no overseer).
> If a ZooKeeper connection loss occurs while creating one of the locks of a 
> {{DistributedMultiLock}}, any other locks of the same multi-lock that were 
> already created will not be released. This prevents the unreleased locks 
> from being acquired again by other operations until the session is lost, 
> which removes the ephemeral nodes.
> Additionally, cluster maintenance operations will wait forever to acquire the 
> required locks, and consume threads from the node's pool for distributed cluster 
> operations. Eventually, all threads will be in use and future operations will be 
> rejected because the queue is full.
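
The failure mode described above, and the general remedy of releasing already-created locks when a later one fails, can be illustrated with a small in-memory sketch. It does not reproduce Solr's `DistributedMultiLock`; `FakeLock` and `acquire_all` are hypothetical names, and the built-in `ConnectionError` stands in for a ZooKeeper connection loss.

```python
class FakeLock:
    """Hypothetical in-memory lock; create() can be made to fail."""

    def __init__(self, name, fail_on_create=False):
        self.name = name
        self.fail_on_create = fail_on_create
        self.held = False

    def create(self):
        if self.fail_on_create:
            # Simulates a connection loss while creating the lock node.
            raise ConnectionError(f"connection lost while creating {self.name}")
        self.held = True

    def release(self):
        self.held = False


def acquire_all(locks):
    """Create every lock, or none: on failure, release those already created."""
    acquired = []
    try:
        for lock in locks:
            lock.create()
            acquired.append(lock)
    except Exception:
        # Without this cleanup, earlier locks would stay held, which is
        # the kind of leak the issue describes.
        for lock in reversed(acquired):
            lock.release()
        raise
    return acquired
```

With locks `a`, `b` and `c` where `b` fails on creation, `acquire_all([a, b, c])` re-raises the error and leaves `a` released, so other operations are not blocked on it.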



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
