[
https://issues.apache.org/jira/browse/SOLR-17972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046410#comment-18046410
]
Pierre Salagnac commented on SOLR-17972:
----------------------------------------
I was looking to contribute a fix for this, and I realized that the parameter
{{retryOnConnLoss}} is now ignored in Solr 10, since connections to ZooKeeper
are managed with Curator (SOLR-16116). My understanding is that it now behaves
as if {{retryOnConnLoss}} were always {{true}}.
I can't reproduce it locally, so I can't double check, but this bug probably
cannot be hit in 10. I will still open a PR in 9x.
Side note: this parameter should be cleaned up from the code base.
> DistributedMultiLock can fail to release some locks if ZK connection loss
> occurs
> --------------------------------------------------------------------------------
>
> Key: SOLR-17972
> URL: https://issues.apache.org/jira/browse/SOLR-17972
> Project: Solr
> Issue Type: Bug
> Reporter: Pierre Salagnac
> Priority: Minor
>
> This bug occurs only when running the cluster with distributed cluster
> processing (no overseer).
> If a ZooKeeper connection loss occurs when creating one of the locks of a
> {{DistributedMultiLock}}, any other locks of the same multi-lock that were
> already created will not be released. This prevents the unreleased locks
> from being acquired again by other operations until the session is lost,
> which causes removal of the ephemeral nodes.
> Additionally, cluster maintenance operations will wait forever to acquire the
> required locks, and consume threads from the node's pool for distributed
> cluster operations. Eventually, all threads will be in use and future
> operations will be rejected because the queue is full.
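
To illustrate the failure mode described above, here is a minimal sketch of an "acquire all or release all" pattern. It is not Solr's code and not the proposed patch: {{SimpleLock}}, {{MultiLockSketch}}, {{acquireAll}} and the in-memory {{HELD}} set are hypothetical names that stand in for the ZooKeeper-backed locks and their ephemeral nodes, and the connection loss is simulated with a plain exception. A real fix would also have to cope with the release call itself failing while the connection is down.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiLockSketch {

  /** Hypothetical stand-in for a ZooKeeper-backed lock; not Solr's API. */
  interface SimpleLock {
    void acquire() throws Exception;

    void release();
  }

  /** In-memory registry standing in for the ephemeral lock nodes. */
  static final Set<String> HELD = new HashSet<>();

  /** Builds a lock that can simulate a connection loss when it is created. */
  static SimpleLock lock(String path, boolean failOnAcquire) {
    return new SimpleLock() {
      @Override
      public void acquire() throws Exception {
        if (failOnAcquire) {
          throw new Exception("simulated connection loss on " + path);
        }
        HELD.add(path);
      }

      @Override
      public void release() {
        HELD.remove(path);
      }
    };
  }

  /** Acquires every lock, or releases the ones already taken and rethrows. */
  static void acquireAll(List<SimpleLock> locks) throws Exception {
    Deque<SimpleLock> acquired = new ArrayDeque<>();
    try {
      for (SimpleLock l : locks) {
        l.acquire();
        acquired.push(l);
      }
    } catch (Exception e) {
      // Roll back in reverse order so no partial set of locks is leaked.
      while (!acquired.isEmpty()) {
        try {
          acquired.pop().release();
        } catch (RuntimeException re) {
          e.addSuppressed(re);
        }
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    List<SimpleLock> locks =
        List.of(lock("/locks/a", false), lock("/locks/b", false), lock("/locks/c", true));
    try {
      acquireAll(locks);
    } catch (Exception e) {
      System.out.println("acquire failed: " + e.getMessage());
    }
    System.out.println("locks still held: " + HELD);
  }
}
```

Without the rollback in the {{catch}} block, {{/locks/a}} and {{/locks/b}} would stay in {{HELD}} after the third acquisition fails, which mirrors the leaked locks described in this issue.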