Houston Putman created SOLR-17720:
-------------------------------------

             Summary: Deadlock in CollectionPropertiesZkStateReader
                 Key: SOLR-17720
                 URL: https://issues.apache.org/jira/browse/SOLR-17720
             Project: Solr
          Issue Type: Bug
          Components: SolrJ
    Affects Versions: 9.7
            Reporter: Houston Putman


{{CollectionPropertiesZkStateReader}} uses two different mechanisms to 
synchronize modifications to its concurrent data structures:
 # {{synchronized (getCollectionLock(collection))}} 
 # {{collectionPropsObservers}} is a {{ConcurrentHashMap}}, and therefore holds 
an internal lock while a single key within the map is being updated.

Unfortunately, these two locks can be acquired in opposite orders, which can 
cause a deadlock.

In {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}}, 
{{collectionPropsObservers.compute(collection, <function>)}} is used, which 
holds a lock in {{collectionPropsObservers}} on the {{collection}} key while 
{{<function>}} runs. Within this locked {{<function>}}, {{synchronized 
(getCollectionLock(collection))}} is called.

In {{CollectionPropertiesZkStateReader.refreshAndWatch()}}, {{synchronized 
(getCollectionLock(coll))}} is used for the whole method. Within this 
synchronized block, {{collectionPropsObservers.remove(coll)}} is called, which 
needs the lock on the {{coll}} key in {{collectionPropsObservers}}.

So {{CollectionPropertiesZkStateReader.removeCollectionPropsWatcher()}} has the 
lock for {{collectionPropsObservers}} but is waiting on the lock for 
{{getCollectionLock(coll)}}. And 
{{CollectionPropertiesZkStateReader.refreshAndWatch()}} has the lock for 
{{getCollectionLock(coll)}} and is waiting on the lock for 
{{collectionPropsObservers}}. Hence deadlock.
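
The two acquisition orders described above can be sketched in isolation. This is a minimal illustration, not Solr's actual code: the class {{LockOrderSketch}}, the {{Integer}} watcher count, and the {{addWatcher}}/{{watcherCount}} helpers are hypothetical stand-ins, and only the order in which the two locks are taken mirrors the report.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the two code paths; NOT the real Solr class. */
public class LockOrderSketch {
    // Stand-in for collectionPropsObservers (value type simplified to a count).
    private final ConcurrentHashMap<String, Integer> observers = new ConcurrentHashMap<>();
    // Assumed backing store for the per-collection lock objects.
    private final ConcurrentHashMap<String, Object> collectionLocks = new ConcurrentHashMap<>();

    private Object getCollectionLock(String collection) {
        return collectionLocks.computeIfAbsent(collection, k -> new Object());
    }

    public void addWatcher(String collection) {
        observers.merge(collection, 1, Integer::sum);
    }

    public int watcherCount(String collection) {
        return observers.getOrDefault(collection, 0);
    }

    // Order A: the map's internal lock for the key first, then the collection lock.
    public void removeWatcher(String collection) {
        observers.compute(collection, (key, count) -> {
            // The map's internal lock for this key is held while this lambda runs.
            synchronized (getCollectionLock(collection)) {
                return (count == null || count <= 1) ? null : count - 1;
            }
        });
    }

    // Order B: the collection lock first, then the map's internal lock for the key.
    public void refreshAndWatch(String collection) {
        synchronized (getCollectionLock(collection)) {
            // remove() must acquire the same internal lock that compute() holds.
            observers.remove(collection);
        }
    }
}
```

Called from a single thread, both methods complete normally. The hazard only appears when one thread is inside the {{compute()}} lambda of {{removeWatcher}} while another is inside the synchronized block of {{refreshAndWatch}} for the same collection, since each then waits on the lock the other holds.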

This code is quite complex, and I think it can be simplified, but that's 
just a gut reaction. I think moving the {{synchronized 
(getCollectionLock(collection))}} block in {{removeCollectionPropsWatcher()}} 
outside of the {{compute()}} call would solve this one deadlock though, since 
both methods would then take the collection lock before touching 
{{collectionPropsObservers}}.
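
A minimal sketch of that suggestion, under the same assumptions as any isolated illustration: {{ConsistentLockOrderSketch}}, its simplified {{Integer}} watcher count, and the {{stress}} helper are hypothetical and are not the actual patch. The point is only that both paths take the collection lock before the map's internal lock.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration of the proposed lock ordering; NOT the real Solr patch. */
public class ConsistentLockOrderSketch {
    private final ConcurrentHashMap<String, Integer> observers = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Object> collectionLocks = new ConcurrentHashMap<>();

    private Object getCollectionLock(String collection) {
        return collectionLocks.computeIfAbsent(collection, k -> new Object());
    }

    public void addWatcher(String collection) {
        observers.merge(collection, 1, Integer::sum);
    }

    public int watcherCount(String collection) {
        return observers.getOrDefault(collection, 0);
    }

    // The synchronized block now wraps compute() instead of sitting inside it.
    public void removeWatcher(String collection) {
        synchronized (getCollectionLock(collection)) {
            observers.compute(collection,
                (key, count) -> (count == null || count <= 1) ? null : count - 1);
        }
    }

    // Unchanged ordering: collection lock first, then the map operation.
    public void refreshAndWatch(String collection) {
        synchronized (getCollectionLock(collection)) {
            observers.remove(collection);
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish in time. */
    public static boolean stress(int iterations) {
        ConsistentLockOrderSketch sketch = new ConsistentLockOrderSketch();
        Thread remover = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                sketch.addWatcher("c1");
                sketch.removeWatcher("c1");
            }
        });
        Thread refresher = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                sketch.refreshAndWatch("c1");
            }
        });
        remover.setDaemon(true);
        refresher.setDaemon(true);
        remover.start();
        refresher.start();
        try {
            remover.join(10_000);
            refresher.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !remover.isAlive() && !refresher.isAlive();
    }
}
```

With a single acquisition order there is no cycle between the two locks, so a call such as {{ConsistentLockOrderSketch.stress(10000)}} should complete. Whether the real method can be restructured this way depends on what else happens inside its {{compute()}} function, which this sketch does not model.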

Hopefully we can really simplify this with Curator though.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
