[ https://issues.apache.org/jira/browse/SOLR-11535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770574#comment-17770574 ]
Michael Gibney edited comment on SOLR-11535 at 9/29/23 8:29 PM:
----------------------------------------------------------------

This is still a problem! This issue was filed a long time ago, so the exact mechanism may have been different when it was first reported.

As of now, a new {{StateWatcher}} instance is created whenever registering a {{DocCollectionWatcher}} causes a new entry to be [added to collectionWatches|https://github.com/apache/solr/blob/240ae14962a62192fedaea48d07590dd15ff1891/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1755-L1769]. When an entry is [removed from collectionWatches|https://github.com/apache/solr/blob/240ae14962a62192fedaea48d07590dd15ff1891/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1987-L1992], its associated {{StateWatcher}} is left in place. The {{StateWatcher}} is then supposed to find that its collection is [no longer represented in collectionWatches|https://github.com/apache/solr/blob/240ae14962a62192fedaea48d07590dd15ff1891/solr/solrj-zookeeper/src/java/org/apache/solr/common/cloud/ZkStateReader.java#L1332-L1336] and exit without resetting itself. But if another entry for the same collection has been added to {{collectionWatches}} in the meantime, the old {{StateWatcher}} will continue resetting itself indefinitely, alongside the new {{StateWatcher}} that was added when the new entry was created.

The essence of the fix is described [here|https://github.com/apache/solr/pull/1964#discussion_r1341703136].

> Weird behavior of CollectionStateWatcher
> ----------------------------------------
>
>                 Key: SOLR-11535
>                 URL: https://issues.apache.org/jira/browse/SOLR-11535
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 7.2, 8.0
>            Reporter: Andrzej Bialecki
>            Assignee: Michael Gibney
>            Priority: Major
>         Attachments: test.log
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> While working on SOLR-11320 I noticed a strange behavior in
> {{ActiveReplicaWatcher}}, which is a subclass of {{CollectionStateWatcher}} -
> it appears that its {{onStateChanged}} method can be called from multiple
> threads with exactly the same {{DocCollection}} state, i.e. unchanged between
> the calls.
> This seems to run contrary to the javadoc, which implies that this method is
> called only when the state actually changes, and it also doesn't mention
> anything about the need for thread-safety in the method implementation.
> I attached the log, which has a lot of additional debugging - but the most
> pertinent part is where a Watcher's hashCode is printed together with the
> {{DocCollection}} - notice that these overlapping calls both submit an
> instance of {{DocCollection}} with the same zkVersion.
> [~dragonsinth], [~romseygeek] - could you please take a look at this? If this
> behavior is expected then the javadoc should be updated to state clearly that
> multiple calls can be made concurrently, with exactly the same state (which
> is kind of a weak guarantee for a method called {{onStateChanged}} ;) ).
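
For illustration, the watcher leak described in the comment above can be reproduced in a toy, single-threaded model. This is a minimal sketch and not Solr code: {{WatchEntry}}, {{ToyStateWatcher}}, {{pending}} and the {{checkIdentity}} flag are hypothetical names invented here, and {{pending}} only stands in for ZooKeeper's one-shot watches. The identity check is one possible guard and is an assumption; it is not necessarily what the linked pull request does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the watcher leak; none of these types are Solr's real classes. */
public class WatcherLeakSketch {

  /** Hypothetical stand-in for one entry in collectionWatches. */
  static final class WatchEntry {}

  /** Hypothetical stand-in for a one-shot watcher that may re-arm itself. */
  static final class ToyStateWatcher {
    final String collection;
    final WatchEntry owner;       // the entry this watcher was created for
    final boolean checkIdentity;  // assumed guard, off in the naive variant

    ToyStateWatcher(String collection, WatchEntry owner, boolean checkIdentity) {
      this.collection = collection;
      this.owner = owner;
      this.checkIdentity = checkIdentity;
    }

    /** Decide, after firing, whether this watcher sets itself again. */
    boolean shouldReset() {
      WatchEntry current = collectionWatches.get(collection);
      if (current == null) {
        return false; // collection no longer watched: exit
      }
      // Naive variant: any entry for the collection keeps the watcher alive.
      // Guarded variant: only the entry this watcher was created for does.
      return !checkIdentity || current == owner;
    }
  }

  static final Map<String, WatchEntry> collectionWatches = new HashMap<>();
  /** Watchers currently set; stands in for ZooKeeper's one-shot watches. */
  static final List<ToyStateWatcher> pending = new ArrayList<>();

  static void register(String collection, boolean checkIdentity) {
    if (!collectionWatches.containsKey(collection)) {
      WatchEntry entry = new WatchEntry();
      collectionWatches.put(collection, entry);
      pending.add(new ToyStateWatcher(collection, entry, checkIdentity));
    }
  }

  /** Removes the entry but, as in the comment, leaves its watcher in place. */
  static void unregister(String collection) {
    collectionWatches.remove(collection);
  }

  /** Simulates one state change: every set watcher fires once. */
  static void fireChange() {
    List<ToyStateWatcher> fired = new ArrayList<>(pending);
    pending.clear();
    for (ToyStateWatcher w : fired) {
      if (w.shouldReset()) {
        pending.add(w);
      }
    }
  }

  /** Runs remove-then-re-add and returns how many watchers are still set. */
  static int simulate(boolean checkIdentity) {
    collectionWatches.clear();
    pending.clear();
    register("c1", checkIdentity);
    unregister("c1");              // old watcher has not fired yet
    register("c1", checkIdentity); // new entry, new watcher
    fireChange();
    return pending.size();
  }

  public static void main(String[] args) {
    System.out.println("naive:   " + simulate(false)); // 2 watchers survive
    System.out.println("guarded: " + simulate(true));  // 1 watcher survives
  }
}
```

In the naive variant both watchers stay set after the change, so each later state change would be delivered twice, which is consistent with the duplicate {{onStateChanged}} calls reported in the issue. With the assumed identity check, the stale watcher sees that the current entry is not the one it was created for and exits.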