[ https://issues.apache.org/jira/browse/SOLR-17744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris M. Hostetter updated SOLR-17744:
--------------------------------------
    Attachment: SOLR-17744.patch
      Assignee: Chris M. Hostetter
        Status: Open  (was: Open)

I'm attaching a patch that basically re-creates Jetty-12's {{graceful.mod}} but using {{StatisticsHandler}}.

So far in my (minimal) testing this patch does the job I hoped it would:
 * clients with non-distrib in-flight requests to a Solr node that is being shut down:
 ** no longer get errors; the requests finish successfully
 ** true for either non-distrib requests, or single-shard collections
 * clients with (multi-shard) distributed in-flight requests to a Solr node that is being shut down:
 ** _ALSO_ no longer get connection errors; the requests finish successfully
 ** I do sometimes see {{org.eclipse.jetty.io.EofException: Closed}} errors in the logs, but it seems like it only happens when the sub-shard requests are being sent to the same node that's being shut down?
 *** but the requests still seem to finish successfully, which is weird.
 ** No such errors seem to be logged when sub-shard requests are in-flight to other nodes (not being shut down)
 * clients with (multi-shard) distributed in-flight requests to a Solr node that is _NOT_ being shut down, but is sending sub-shard requests to a node being shut down:
 ** _Still_ don't get connection errors; there's no reason they ever would
 ** But they _CAN_ still get 500 errors
 *** It looks like internally {{solr.SearchHandler}} doesn't deal well with rejection when a sub-shard request gets a {{java.net.ConnectException: Connection refused}}

While there are certainly other improvements we can add to Solr to make our own internal code work better on graceful shutdown (including re-thinking how early we de-register from live nodes / cluster state), I really think this Jetty-level improvement is worth putting in.
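
To make the intended behavior concrete, here is a minimal JDK-only sketch of the "refuse new requests while letting existing ones finish" pattern. It is not the attached patch, and it is not how Jetty's {{StatisticsHandler}} or {{GracefulHandler}} are implemented; the {{GracefulGate}} class and all of its method names are hypothetical and exist only for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical illustration only: tracks in-flight requests so that a
 * shutdown can stop admitting new work and wait for existing work to drain.
 */
final class GracefulGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition drained = lock.newCondition();
    private int inFlight = 0;
    private boolean shuttingDown = false;

    /** Returns true if the request is admitted, false once shutdown has begun. */
    boolean tryEnter() {
        lock.lock();
        try {
            if (shuttingDown) {
                return false;
            }
            inFlight++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Must be called exactly once for every successful tryEnter(). */
    void exit() {
        lock.lock();
        try {
            inFlight--;
            if (inFlight == 0) {
                drained.signalAll();
            }
        } finally {
            lock.unlock();
        }
    }

    /**
     * Stops admitting requests, then waits up to timeoutMillis for in-flight
     * requests to finish. Returns true if everything drained in time.
     */
    boolean shutdown(long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            shuttingDown = true;
            while (inFlight > 0) {
                if (nanos <= 0L) {
                    return false; // timed out with requests still running
                }
                nanos = drained.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a request handler would call {{tryEnter()}} before doing any work, reject the request when it returns false, and call {{exit()}} in a {{finally}} block; the stop path would call {{shutdown(...)}} before closing connections.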
(And FWIW: the jetty-12 {{GracefulHandler}} looks like it would be pretty easy to clone/backport if folks feel like the "stats" part of {{StatisticsHandler}} is too much overhead)

> Solr shutdown does not graceful close Jetty requests/connections
> ----------------------------------------------------------------
>
>                 Key: SOLR-17744
>                 URL: https://issues.apache.org/jira/browse/SOLR-17744
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-17744.patch
>
>
> Solr does a lot of work internally (via things like SolrCore reference
> counting) to ensure that we "finish" in-flight requests on orderly shutdown
> (ie: when the user has issued a "stop" command) – but it does not appear that
> we are doing anything to ensure that *Jetty* managed resources will also wait
> for in process requests to finish.
>
> In particular, Jetty seems to abruptly close any existing & active network
> connections to clients, even as Solr continues to process those requests and
> try to write out the responses.
>
> There are Jetty features to ensure that shutdown is genuinely "graceful"
> (refusing new requests while letting existing ones finish) but Solr doesn't
> appear to use/enable these features:
> * In Jetty 10 & 11, this is apparently done using the {{StatisticsHandler}}
> (as a wrapper around the main handler collection i think?)
> ** [https://github.com/jetty/jetty.project/issues/2076#issuecomment-353578130]
> ** [https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/server/handler/StatisticsHandler.html]
> ** [https://jetty.org/docs/jetty/11/programming-guide/server/http.html#handler-use-util-stats-handler]
> * In Jetty 12+ there is a {{graceful}} module that provides a
> {{GracefulHandler}} (which seems like a slightly more robust version of what
> {{StatisticsHandler}} does in jetty-10, but with less statistics tracking
> overhead)
> ** [https://jetty.org/docs/jetty/12/operations-guide/start/index.html#stop-graceful]
> ** [https://jetty.org/docs/jetty/12/operations-guide/modules/standard.html#graceful]
>
> The net result is that even during planned shutdown (or restart) of Solr
> nodes, clients can get lots of errors.