[ https://issues.apache.org/jira/browse/SOLR-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17938704#comment-17938704 ]
David Smiley commented on SOLR-17720:
-------------------------------------

Fascinating. Did a test (which?) reveal this?

> Deadlock in CollectionPropertiesZkStateReader
> ---------------------------------------------
>
>                 Key: SOLR-17720
>                 URL: https://issues.apache.org/jira/browse/SOLR-17720
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrJ
>    Affects Versions: 9.7
>            Reporter: Houston Putman
>            Priority: Blocker
>             Fix For: 9.9
>
>
> {{CollectionPropertiesZkStateReader}} has multiple different mechanisms for
> synchronizing when modifying its concurrent data structures.
> # {{synchronized (getCollectionLock(collection))}}
> # {{collectionPropsObservers}} is a ConcurrentHashMap, and therefore locks
> on updating a single key within the map.
> Unfortunately this can cause a deadlock.
> In {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}},
> {{collectionPropsObservers.compute(collection, <function>)}} is used which
> will create a lock in {{collectionPropsObservers}} on the {{collection}} key.
> Within this locked {{<function>}} command, {{synchronized
> (getCollectionLock(collection))}} is called.
> In {{CollectionPropertiesZkStateReader.refreshAndWatch()}}, {{synchronized
> (getCollectionLock(coll))}} is used for the whole method. And within this
> synchronized block, {{collectionPropsObservers.remove(coll)}} is called
> (which will obviously get a lock on the {{coll}} key for
> {{collectionPropsObservers}}).
> So {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}} has
> the lock for {{collectionPropsObservers}} but is waiting on the lock for
> {{getCollectionLock(coll)}}. And
> {{CollectionPropertiesZkStateReader.refreshAndWatch()}} has the lock for
> {{getCollectionLock(coll)}} and is waiting on the lock for
> {{collectionPropsObservers}}. Hence deadlock.
> This code is quite complex, and I think it can really be simplified, but
> that's just a gut reaction.
> I think moving the {{synchronized
> (getCollectionLock(collection))}} block in {{removeCollectionPropsWatcher()}}
> outside of the {{compute()}} call would solve this one deadlock though.
> Hopefully we can really simplify this with Curator though.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
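
For illustration only, the lock-ordering change suggested in the quoted description (taking {{synchronized (getCollectionLock(collection))}} before calling {{compute()}}, rather than inside it) might look roughly like the sketch below. This is not the actual Solr code: the class {{LockOrderSketch}}, the integer watcher count, and the methods {{registerWatcher}}, {{removeWatcher}}, {{watcherCount}} and {{runConcurrently}} are hypothetical stand-ins, and the real {{removeCollectionPropsWatcher()}} and {{refreshAndWatch()}} do considerably more. The only point shown is that every path acquires the per-collection monitor first and the map's per-key lock second.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified stand-in for the locking pattern described in the issue.
class LockOrderSketch {
    // One monitor object per collection (assumed shape of getCollectionLock()).
    private final ConcurrentHashMap<String, Object> collectionLocks = new ConcurrentHashMap<>();
    // Stand-in for collectionPropsObservers: here just a watcher count per collection.
    private final ConcurrentHashMap<String, Integer> collectionPropsObservers = new ConcurrentHashMap<>();

    private Object getCollectionLock(String collection) {
        return collectionLocks.computeIfAbsent(collection, k -> new Object());
    }

    void registerWatcher(String collection) {
        synchronized (getCollectionLock(collection)) {
            collectionPropsObservers.merge(collection, 1, Integer::sum);
        }
    }

    // Monitor first, then compute(): the synchronized block wraps the compute()
    // call instead of being nested inside the remapping function.
    void removeWatcher(String collection) {
        synchronized (getCollectionLock(collection)) {
            collectionPropsObservers.compute(collection,
                (k, count) -> (count == null || count <= 1) ? null : count - 1);
        }
    }

    // Same order as above: monitor first, then the map's per-key lock via remove().
    void refreshAndWatch(String collection) {
        synchronized (getCollectionLock(collection)) {
            collectionPropsObservers.remove(collection);
        }
    }

    int watcherCount(String collection) {
        return collectionPropsObservers.getOrDefault(collection, 0);
    }

    // Runs both paths concurrently; returns true if both threads finish within the timeout.
    static boolean runConcurrently(int iterations) {
        LockOrderSketch sketch = new LockOrderSketch();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    sketch.registerWatcher("c1");
                    sketch.removeWatcher("c1");
                }
            });
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    sketch.refreshAndWatch("c1");
                }
            });
            pool.shutdown();
            return pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("finished without deadlock: " + runConcurrently(10_000));
    }
}
```

Because both code paths take the two locks in the same order, neither thread can hold one lock while waiting for the other in the reverse order, which is the cycle the description identifies. A stress run like {{runConcurrently}} cannot prove the absence of a deadlock; it only fails to reproduce one.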