[
https://issues.apache.org/jira/browse/SOLR-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031902#comment-18031902
]
Pierre Salagnac commented on SOLR-17972:
----------------------------------------
Ephemeral nodes are removed automatically after session expiration. I'm not sure how
ZooKeeper handles this under the hood, but my understanding is that we may have a
connection loss and a reconnection without losing the session.
We have a one-line change that retries lock creation on connection loss, which should
solve this issue 99% of the time. I will try to open a PR this week.
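
To illustrate the retry idea, here is a minimal sketch and not the actual patch. The class name, the `withRetry` helper, and the `ConnectionLossException` type are hypothetical stand-ins (the latter for ZooKeeper's connection-loss error), so the example runs with no ZooKeeper dependency. It assumes the session is still alive when the retry happens.

```java
import java.util.concurrent.Callable;

final class RetryOnConnectionLoss {
  /** Hypothetical stand-in for a ZooKeeper connection-loss error. */
  static class ConnectionLossException extends Exception {}

  /**
   * Runs the action and retries it when the connection is lost.
   * maxRetries is the number of extra attempts after the first one.
   * Any other exception is propagated immediately.
   */
  static <T> T withRetry(Callable<T> action, int maxRetries) throws Exception {
    for (int attempt = 0; ; attempt++) {
      try {
        return action.call();
      } catch (ConnectionLossException e) {
        if (attempt >= maxRetries) {
          throw e; // give up: the caller must still clean up other locks
        }
        // Assumption: the session may have survived, so trying again is reasonable.
      }
    }
  }
}
```

A caller would wrap the lock creation call, for example `withRetry(() -> createLock(path), 3)`, where `createLock` is also hypothetical. Note that a create request that failed with a connection loss may still have succeeded on the server, so a real implementation would likely need to account for that case.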
> DistributedMultiLock can fail to release some locks if ZK connection loss
> occurs
> --------------------------------------------------------------------------------
>
> Key: SOLR-17972
> URL: https://issues.apache.org/jira/browse/SOLR-17972
> Project: Solr
> Issue Type: Bug
> Reporter: Pierre Salagnac
> Priority: Minor
>
> This bug occurs only when running the cluster with distributed cluster processing
> (no overseer).
> If a ZooKeeper connection loss occurs while creating one of the locks of a
> {{DistributedMultiLock}}, any other locks of the same multi-lock that were
> already created will not be released. This prevents the unreleased
> locks from being acquired again by other operations until the session is lost,
> which causes removal of the ephemeral nodes.
> Additionally, cluster maintenance operations will wait forever to acquire the
> required locks and consume threads from the node's pool for distributed cluster
> operations. Eventually, all threads will be in use and future operations will be
> rejected because the queue is full.
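
The failure mode described above can be sketched as follows. This is only an illustration under assumptions, not Solr's code and not the planned fix: the `Lock` interface and the `acquireAll` helper are hypothetical and do not mirror the real `DistributedMultiLock` API. It shows one possible guard, where the locks already acquired are released if acquiring a later one fails.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiLockSketch {
  /** Hypothetical minimal lock interface, not Solr's actual lock API. */
  interface Lock {
    void acquire() throws Exception;

    void release();
  }

  /**
   * Acquires all locks in order. If one acquisition fails (for example
   * because of a connection loss), the locks acquired so far are released
   * in reverse order before the exception is rethrown.
   */
  static List<Lock> acquireAll(List<Lock> locks) throws Exception {
    List<Lock> held = new ArrayList<>();
    try {
      for (Lock lock : locks) {
        lock.acquire();
        held.add(lock);
      }
      return held;
    } catch (Exception e) {
      for (int i = held.size() - 1; i >= 0; i--) {
        held.get(i).release();
      }
      throw e;
    }
  }
}
```

Without the `catch` block, the locks in `held` would stay in place until the session expires, which matches the behavior reported in this issue.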
--
This message was sent by Atlassian Jira
(v8.20.10#820010)