HoustonPutman opened a new pull request, #3205:
URL: https://github.com/apache/solr/pull/3205

   `TestCoordinatorRole.testNRTRestart()` has been flaky for a long time, for 
various reasons. I believe this is the last remaining cause.
   
   In the last part of the test, it is supposed to alternately shut down the 
NRT replica and the PULL replica, and keep retrying requests that have 
`shards.preference=NRT` until a PULL replica is forced to serve the 
request.
   
   The issue is that the PULL node is started right before this step. If a 
very low (< 300 ms) random value is chosen for `serveTogetherTime`, the PULL 
replica fails to recover from the NRT replica leader before that leader is 
taken down. PULL replicas do not become active unless they recover on startup. 
So when the NRT replica is offline, requests fail because no replica is 
serving the requested shard.
   
   The `getHostCoreName` call can tolerate up to 500 ms of failures. If 
`downTime` (another random int) is greater than 500 ms, or even somewhat lower 
because the node also takes time to start up, the call exceeds the number of 
allowed errors and the test fails.
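
   To illustrate the timing problem, here is a minimal sketch of a call that 
tolerates failures only within a fixed time budget. It is not the actual 
`getHostCoreName` implementation; the class `ErrorBudgetRetry`, the method 
`callWithErrorBudget`, and its parameters are hypothetical names chosen for 
this example.

```java
import java.time.Duration;
import java.util.function.Supplier;

// Hypothetical helper, not part of Solr: retries a request until it succeeds
// or until failures have continued for longer than the allowed budget.
final class ErrorBudgetRetry {

  static <T> T callWithErrorBudget(Supplier<T> request, Duration budget, Duration pause) {
    long deadline = System.nanoTime() + budget.toNanos();
    while (true) {
      try {
        return request.get();
      } catch (RuntimeException e) {
        // Once the budget is used up, the most recent failure is rethrown.
        if (System.nanoTime() - deadline >= 0) {
          throw e;
        }
        try {
          Thread.sleep(pause.toMillis());
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e;
        }
      }
    }
  }
}
```

   With a budget of 500 ms, any outage that lasts longer than that (down time 
plus startup time, with no other active replica to fall back on) makes the 
call fail, which matches the failure described above.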
   
   The fix has two parts:
   - The only strictly necessary fix is waiting for the PULL replica to come 
online before starting to take down other nodes.
   - To keep the spirit of the test, I reversed the order in which the 
Jettys are brought down, because otherwise the PULL replica is chosen 
immediately and we never do any iterations of this loop. Because of 
this, we need to do the same replica state check at the end of the loop, to 
ensure the failure scenario above doesn't happen during the loop either.
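
   The "wait for the replica before taking nodes down" idea can be sketched 
as a generic polling helper. This is only an illustration under stated 
assumptions, not the code in this PR: `ReplicaWait`, `waitUntil`, and the 
condition passed to it are hypothetical, and the real test would check the 
replica state through the cluster state instead.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Solr: polls a condition until it holds
// or the timeout elapses.
final class ReplicaWait {

  /** Returns true if the condition became true in time, false otherwise. */
  static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out before the condition held
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false; // treat interruption as "not ready"
      }
    }
    return true;
  }
}
```

   In the test, a check like this would run once before the loop starts and 
again at the end of each iteration, with the condition being "the PULL 
replica is active", so that no node is stopped while the other replica is 
still recovering.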


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

