[ https://issues.apache.org/jira/browse/SOLR-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17946571#comment-17946571 ]
ASF subversion and git services commented on SOLR-17720:
--------------------------------------------------------

Commit eb3bafab83b4f3c4801835f8d97740992f6ca93d in solr's branch refs/heads/branch_9x from aparnasuresh85
[ https://gitbox.apache.org/repos/asf?p=solr.git;h=eb3bafab83b ]

SOLR-17720: Fix rare deadlock in CollectionProperties (#3304)

The problem pre-dated CollectionPropertiesZkStateReader's existence.

(cherry picked from commit 67a642fe0263588155627c0429ea5cf39f519c8e)


> Deadlock in CollectionPropertiesZkStateReader
> ---------------------------------------------
>
>                 Key: SOLR-17720
>                 URL: https://issues.apache.org/jira/browse/SOLR-17720
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrJ
>    Affects Versions: 9.7
>            Reporter: Houston Putman
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 9.9
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> {{CollectionPropertiesZkStateReader}} has multiple different mechanisms for
> synchronizing when modifying its concurrent data structures.
> # {{synchronized (getCollectionLock(collection))}}
> # {{collectionPropsObservers}} is a ConcurrentHashMap, and therefore locks
> on updating a single key within the map.
> Unfortunately this can cause a deadlock.
> In {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}},
> {{collectionPropsObservers.compute(collection, <function>)}} is used which
> will create a lock in {{collectionPropsObservers}} on the {{collection}} key.
> Within this locked {{<function>}} command, {{synchronized
> (getCollectionLock(collection))}} is called.
> In {{CollectionPropertiesZkStateReader.refreshAndWatch()}}, {{synchronized
> (getCollectionLock(coll))}} is used for the whole method. And within this
> synchronized block, {{collectionPropsObservers.remove(coll)}} is called
> (which will obviously get a lock on the {{coll}} key for
> {{collectionPropsObservers}}).
> So {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}} has
> the lock for {{collectionPropsObservers}} but is waiting on the lock for
> {{getCollectionLock(coll)}}. And
> {{CollectionPropertiesZkStateReader.refreshAndWatch()}} has the lock for
> {{getCollectionLock(coll)}} and is waiting on the lock for
> {{collectionPropsObservers}}. Hence deadlock.
> This code is quite complex, and I think it can really be simplified, but
> that's just a gut reaction. I think moving the {{synchronized
> (getCollectionLock(collection))}} block in {{removeCollectionPropsWatcher()}}
> outside of the {{compute()}} call would solve this one deadlock though.
> Hopefully we can really simplify this with Curator though.
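
The lock-ordering problem described in the issue can be illustrated with a small stand-in class. This is only a sketch of the reporter's suggestion (take the per-collection monitor before calling compute()), not the code from the commit above. The class name CollectionPropsLockingSketch and the helpers addWatcher, removeWatcher and watcherCount are hypothetical; only getCollectionLock, refreshAndWatch and the observers map loosely mirror names mentioned in the issue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the locking pattern in the issue; not Solr's actual code. */
final class CollectionPropsLockingSketch {
  // One monitor object per collection name.
  private final ConcurrentHashMap<String, Object> collectionLocks = new ConcurrentHashMap<>();
  // Watcher ids per collection; compute()/remove() lock the entry for the key.
  private final ConcurrentHashMap<String, Set<String>> observers = new ConcurrentHashMap<>();

  private Object getCollectionLock(String collection) {
    return collectionLocks.computeIfAbsent(collection, k -> new Object());
  }

  void addWatcher(String collection, String watcherId) {
    observers.computeIfAbsent(collection, k -> ConcurrentHashMap.newKeySet()).add(watcherId);
  }

  // The deadlock-prone shape would call compute() first and synchronize inside
  // the lambda. Here the order is: collection monitor, then the map entry.
  void removeWatcher(String collection, String watcherId) {
    synchronized (getCollectionLock(collection)) {
      observers.compute(collection, (k, v) -> {
        if (v == null) {
          return null;
        }
        v.remove(watcherId);
        // Returning null removes the mapping once no watchers are left.
        return v.isEmpty() ? null : v;
      });
    }
  }

  // Same order as removeWatcher: collection monitor, then the map entry.
  void refreshAndWatch(String collection) {
    synchronized (getCollectionLock(collection)) {
      observers.remove(collection);
    }
  }

  int watcherCount(String collection) {
    Set<String> watchers = observers.get(collection);
    return watchers == null ? 0 : watchers.size();
  }
}
```

Because both methods acquire the two locks in the same order, neither thread can hold the map entry while waiting on the monitor held by the other, which is the cycle the issue describes.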