[ https://issues.apache.org/jira/browse/SOLR-15032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926972#comment-17926972 ]
David Smiley commented on SOLR-15032:
-------------------------------------

Looking at TestPullReplicaErrorHandling.testCloseHooksDeletedOnReconnect, as in [develocity here|https://develocity.apache.org/scans/tests?search.relativeStartTime=P90D&search.rootProjectNames=solr-root&search.timeZoneId=America%2FNew_York&tests.container=org.apache.solr.cloud.TestPullReplicaErrorHandling&tests.test=testCloseHooksDeletedOnReconnect], it's rather flaky, and it has become more so lately. Basically every failure is {{expected:<5> but was:<3>}}.

In the same failing run on my machine today, I also saw testCantConnectToPullReplica in this suite fail: {{Underlying core creation failed while creating collection: pull_replica_error_handling_test_close_hooks_deleted_on_reconnec}}

> Race condition in TestPullReplicaErrorHandling
> ----------------------------------------------
>
>                 Key: SOLR-15032
>                 URL: https://issues.apache.org/jira/browse/SOLR-15032
>             Project: Solr
>          Issue Type: Test
>          Components: Tests
>            Reporter: Mike Drob
>            Priority: Major
>              Labels: race-condition
>
> See discussion at
> https://github.com/apache/lucene-solr/pull/2115#discussion_r534445545
> There is a race condition in two tests in TestPullReplicaErrorHandling where
> we expire a ZK session, then wait for a node down and a node up. It's
> possible that the node recovers before we even start waiting for the first
> down.
> Better would be to set a watch on the live-node that we're about to expire,
> and wait to see the delete before checking for the node to come back up.
> cc: [~tflobbe]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
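The watch-then-wait fix proposed in the issue description (register a watch on the live node before expiring its ZK session, wait for the delete, and only then wait for the node to come back) can be sketched in plain Java. This is a minimal, self-contained illustration under stated assumptions, not Solr test code: `LiveNodes` is a hypothetical in-memory stand-in for ZooKeeper's live-nodes directory, and `pollUntilDown`, `expireAndAwaitReconnect`, and the node name are made up for the example. A real test would register the watch through the ZooKeeper or Solr client instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

public class LiveNodeWatchSketch {
  enum EventType { CREATED, DELETED }

  /** Hypothetical in-memory stand-in for ZooKeeper's live-nodes directory (not a Solr or ZooKeeper API). */
  static final class LiveNodes {
    private final Map<String, Boolean> nodes = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, EventType>> listeners = new CopyOnWriteArrayList<>();

    void addListener(BiConsumer<String, EventType> listener) { listeners.add(listener); }

    void create(String node) {
      nodes.put(node, Boolean.TRUE);
      notifyListeners(node, EventType.CREATED);
    }

    void delete(String node) {
      if (nodes.remove(node) != null) {
        notifyListeners(node, EventType.DELETED);
      }
    }

    boolean isLive(String node) { return nodes.containsKey(node); }

    // Listeners run synchronously here to keep the demo deterministic.
    private void notifyListeners(String node, EventType type) {
      for (BiConsumer<String, EventType> l : listeners) {
        l.accept(node, type);
      }
    }
  }

  /** Racy pattern: only samples the current state, so a quick delete + re-create goes unnoticed. */
  static boolean pollUntilDown(LiveNodes liveNodes, String node, long timeoutMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (System.nanoTime() < deadline) {
      if (!liveNodes.isLive(node)) {
        return true;
      }
      Thread.sleep(10);
    }
    return false;
  }

  /** Watch-first pattern: register the listener BEFORE expiring, then wait for delete, then re-create. */
  static boolean expireAndAwaitReconnect(
      LiveNodes liveNodes, String node, Runnable expireSession, long timeoutMs)
      throws InterruptedException {
    CountDownLatch deleted = new CountDownLatch(1);
    CountDownLatch recreated = new CountDownLatch(1);
    liveNodes.addListener((n, type) -> {
      if (!n.equals(node)) {
        return;
      }
      if (type == EventType.DELETED) {
        deleted.countDown();
      } else if (type == EventType.CREATED && deleted.getCount() == 0) {
        // Only count a re-create that follows the delete we are waiting for.
        recreated.countDown();
      }
    });
    expireSession.run(); // simulated session expiry
    return deleted.await(timeoutMs, TimeUnit.MILLISECONDS)
        && recreated.await(timeoutMs, TimeUnit.MILLISECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    LiveNodes liveNodes = new LiveNodes();
    String node = "127.0.0.1:8983_solr"; // hypothetical node name
    liveNodes.create(node);

    // Worst case from the issue: the node goes down and comes back before the test starts waiting.
    Runnable fastExpireAndReconnect = () -> {
      liveNodes.delete(node);
      liveNodes.create(node);
    };

    fastExpireAndReconnect.run();
    // prints "polling saw the node go down: false"
    System.out.println("polling saw the node go down: " + pollUntilDown(liveNodes, node, 100));

    boolean observed = expireAndAwaitReconnect(liveNodes, node, fastExpireAndReconnect, 1000);
    // prints "watch saw delete then re-create: true"
    System.out.println("watch saw delete then re-create: " + observed);
  }
}
```

Because the listener is in place before the session is expired, both latches are counted down even when the node disappears and reappears before the test thread starts waiting. The polling helper only samples the current state and misses the short-lived deletion, which mirrors the race described in the issue. The hypothetical registry notifies listeners synchronously to keep the demo deterministic; real ZooKeeper watches are delivered asynchronously on the client's event thread, which is why the latches are still worth keeping.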