[ 
https://issues.apache.org/jira/browse/SOLR-16086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806672#comment-17806672
 ] 

Andreas Hubold commented on SOLR-16086:
---------------------------------------

Could this be the same issue as SOLR-17118?

> Issues with TestReplicationHandler.doTestIndexFetchOnLeaderRestart
> ------------------------------------------------------------------
>
>                 Key: SOLR-16086
>                 URL: https://issues.apache.org/jira/browse/SOLR-16086
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java), Tests
>    Affects Versions: 9.0
>            Reporter: Houston Putman
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Ever since early December 2021 the {{doTestIndexFetchOnLeaderRestart}} test 
> has been failing around 3% of the time. It looks like this was introduced by 
> SOLR-15590. When drilling into why the test fails, it looks like the 
> replication never happens in the follower (no logging whatsoever of the 
> replication handler or the index fetcher). This indicates that there is 
> something hanging in the first replication request. The 
> indexFetcher starts the fetching thread after a random delay between 1 ms and 
> 1000 ms. After the follower is started, the leader is restarted, which 
> generally (from my observation) takes around 30 ms. This means that about 3% 
> of test runs will send the first indexFetcher request while the leader 
> is restarting, which is in line with the failure rate we are seeing.
> Mike Drob and I could not reproduce the hanging indexFetcher request 
> locally, so this is still conjecture, and we are unsure how SOLR-15590 
> would be affecting it.
> Side note: Looking at the history of the test, it appears that the original 
> purpose of the test is no longer being verified either. Originally, the 
> last part of the test made sure that there was only 1 successful index 
> replication; that check has now been moved to before the leader is started up 
> again, so the test no longer checks that a full replication happens after the 
> leader starts. We just need to add that check back at the end of the test. 
> (This was changed in SOLR-13577.)
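
For reference, the 3% estimate in the quoted description can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: the start delay is taken to be uniformly distributed over the whole milliseconds from 1 to 1000, and the leader is taken to be unavailable for the first 30 ms of that range. The class and method names are hypothetical and are not part of Solr or of TestReplicationHandler.

```java
public class OverlapEstimate {

    /**
     * Probability that a delay drawn uniformly from the integers
     * [minDelayMs, maxDelayMs] is at most windowMs, i.e. that the first
     * fetch request lands inside the assumed restart window.
     * All three parameters are illustrative assumptions, not Solr settings.
     */
    static double overlapProbability(int minDelayMs, int maxDelayMs, int windowMs) {
        int total = maxDelayMs - minDelayMs + 1;
        // Number of possible delays that fall inside the window, never negative.
        int hits = Math.max(0, Math.min(windowMs, maxDelayMs) - minDelayMs + 1);
        return (double) hits / total;
    }

    public static void main(String[] args) {
        // Delay between 1 ms and 1000 ms, restart window of about 30 ms.
        System.out.println(overlapProbability(1, 1000, 30));
    }
}
```

Under these assumptions, 30 of the 1000 possible delays fall inside the restart window, which matches the observed failure rate only if the restart really does take about 30 ms on the machines running the test.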



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
