dsmiley commented on code in PR #3891: URL: https://github.com/apache/solr/pull/3891#discussion_r2659451629
########## changelog/unreleased/SOLR-18002-add-unresponsive-servers-to-zombie-list.yml: ########## @@ -0,0 +1,8 @@ +# See https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc +title: Add unresponsive servers to zombie list Review Comment: ```suggestion title: CloudSolrClient/LBSolrClient should consider a retry-able request that times out as another condition to internally mark that replica as a "zombie". ``` I wish 'title' was named 'description' or 'summary'. 'title' suggests something shorter. ########## solr/solrj/src/test/org/apache/solr/client/solrj/impl/LB2SolrClientTest.java: ########## @@ -330,4 +356,57 @@ public void close() { } } } + + private class ZombieTestContext implements AutoCloseable { + final ServerSocket blackhole; + final LBSolrClient.Endpoint nonRoutableEndpoint; + final HttpJettySolrClient delegateClient; + final LBAsyncSolrClient lbClient; + + ZombieTestContext() throws Exception { + //create a socket that allows a client to connect but causes them to hang until idleTimeout is triggered + blackhole = new ServerSocket(0); + int blackholePort = blackhole.getLocalPort(); + nonRoutableEndpoint = + new LBSolrClient.Endpoint("http://localhost:" + blackholePort + "/solr"); + + delegateClient = + new HttpJettySolrClient.Builder() + .withConnectionTimeout(1000, TimeUnit.MILLISECONDS) + .withIdleTimeout(100, TimeUnit.MILLISECONDS) Review Comment: I suppose an even smaller number would work too, like 1 millisecond? ########## solr/solrj/src/test/org/apache/solr/client/solrj/impl/LB2SolrClientTest.java: ########## @@ -202,6 +205,29 @@ public void testTwoServers() throws Exception { } } + public void testTimeoutExceptionMarksServerAsZombie() throws Exception { + try (ZombieTestContext ctx = new ZombieTestContext()) { Review Comment: it's not apparent looking at this, that ZombieTestContext will induce a timeout. That seems to be its fundamental purpose and thus is associated with just a couple tests. At a glance, it appears to be a test utility I otherwise have never heard of. A reviewer (me) has to then go look at the code for it, and infer some things to come to that conclusion. If you prefix the name with `Timeout` then it'd help. ########## changelog/unreleased/SOLR-18002-add-unresponsive-servers-to-zombie-list.yml: ########## @@ -0,0 +1,8 @@ +# See https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc +title: Add unresponsive servers to zombie list +type: fixed # added, changed, fixed, deprecated, removed, dependency_update, security, other Review Comment: I view this as an improvement, thus "changed", rather than a bug fix. WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
