[ https://issues.apache.org/jira/browse/SOLR-16086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806672#comment-17806672 ]
Andreas Hubold commented on SOLR-16086:
---------------------------------------

Could this be the same issue as SOLR-17118?

> Issues with TestReplicationHandler.doTestIndexFetchOnLeaderRestart
> ------------------------------------------------------------------
>
>                 Key: SOLR-16086
>                 URL: https://issues.apache.org/jira/browse/SOLR-16086
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java), Tests
>    Affects Versions: 9.0
>            Reporter: Houston Putman
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ever since early December 2021 the {{doTestIndexFetchOnLeaderRestart}} test
> has been failing around 3% of the time. It looks like this was introduced by
> SOLR-15590.
>
> When drilling into why the test fails, it looks like the replication never
> happens in the follower (no logging whatsoever from the replication handler
> or the index fetcher). This indicates that something is hanging in the first
> replication request.
>
> The indexFetcher starts the fetching thread at a random interval between 1 ms
> and 1000 ms. After the follower is started, the leader is restarted. It
> generally (from my observation) takes around 30 ms for this to happen, meaning
> that 3% of the tests will have the first indexFetcher request sent while the
> leader is restarting, which is in line with the failure rate we are seeing.
>
> Mike Drob and I could not reproduce the hanging indexFetcher request locally,
> so this is still conjecture, and we are unsure as to how SOLR-15590 would be
> affecting it.
>
> Side note: When looking at the history of the test, it looks like the
> original purpose of the test is also no longer tested for. Originally the
> last part of the test was to make sure that there was only 1 successful index
> replication. That check has now been moved to before the leader is started up
> again, so the test no longer checks that a full replication happens after the
> leader starts. So we just need to add that check in at the back of the test.
> (This was changed in SOLR-13577)
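
The 3% figure in the description is plain arithmetic: a restart window of about 30 ms out of a start delay that can be anywhere from 1 ms to 1000 ms. Below is a minimal sketch of that estimate, not Solr code. The class and method names are hypothetical, and it assumes the delay is uniformly distributed and that the restart window covers the first 30 ms after the follower starts, neither of which the issue confirms.

```java
import java.util.Random;

/** Hypothetical illustration of the timing estimate; not part of Solr. */
public class RestartWindowEstimate {

    /**
     * Exact chance that a delay drawn uniformly from [1, maxDelayMs]
     * falls within the first windowMs milliseconds.
     */
    static double analyticOverlap(int windowMs, int maxDelayMs) {
        int clamped = Math.max(0, Math.min(windowMs, maxDelayMs));
        return (double) clamped / maxDelayMs;
    }

    /** Monte Carlo estimate of the same chance, using a seeded generator. */
    static double simulatedOverlap(int windowMs, int maxDelayMs, int trials, long seed) {
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            // Assumed model: first fetch fires after a uniform delay in [1, maxDelayMs].
            int delayMs = 1 + random.nextInt(maxDelayMs);
            if (delayMs <= windowMs) {
                hits++; // the request would land while the leader is restarting
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        System.out.printf("analytic:  %.4f%n", analyticOverlap(30, 1000));
        System.out.printf("simulated: %.4f%n", simulatedOverlap(30, 1000, 1_000_000, 42L));
    }
}
```

Under those assumptions both numbers come out close to 0.03, which matches the observed failure rate but does not by itself show that the hang is the cause.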