[ 
https://issues.apache.org/jira/browse/SOLR-17744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-17744:
--------------------------------------
    Attachment: SOLR-17744.patch
      Assignee: Chris M. Hostetter
        Status: Open  (was: Open)

I'm attaching a patch that basically re-creates Jetty-12's {{graceful.mod}} but 
using {{StatisticsHandler}}.

So far in my (minimal) testing this patch does the job I hoped it would:
 * clients with non-distrib in-flight requests to a solr node that is being 
shut down:
 ** no longer get errors – the requests finish successfully
 ** true for either non-distrib requests, or single-shard collections

 * clients with (multi-shard) distributed in-flight requests to a solr node 
that is being shut down:
 ** _ALSO_ no longer get connection errors – the requests finish successfully
 ** I do sometimes see {{org.eclipse.jetty.io.EofException: Closed}} errors 
in the logs, but it seems like it only happens when the sub-shard requests are 
being sent to the same node that's being shut down?
 *** but the requests still seem to finish successfully which is weird.
 ** No such errors seem to be logged when sub-shard requests are in-flight to 
other nodes (not being shut down)

 * clients with (multi-shard) distributed in-flight requests to a solr node 
that is _NOT_ being shut down, but is sending sub-shard requests to a node being 
shut down:
 ** _Still_ don't get connection errors – there's no reason they ever would
 ** But they _CAN_ still get 500 errors
 *** It looks like internally {{solr.SearchHandler}} doesn't deal well 
w/rejection when a sub-shard request gets a {{java.net.ConnectException: 
Connection refused}}

While there are certainly other improvements we can add to Solr to make our own 
internal code work better on graceful shutdown (including re-thinking how early 
we de-register from live nodes / cluster state) I really think this jetty level 
improvement is worth putting in.

(And FWIW: the jetty-12 {{GracefulHandler}} looks like it would be pretty easy 
to clone/backport if folks feel like the "stats" part of {{StatisticsHandler}} 
is too much overhead)
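
For anyone unfamiliar with the general technique, here is a rough, self-contained sketch of the bookkeeping a "graceful" wrapper needs: count in-flight requests, refuse new ones once shutdown starts, and signal when the count drains to zero. To be clear, {{GracefulGate}} and its method names are hypothetical; this is not the attached patch and not Jetty's {{StatisticsHandler}} or {{GracefulHandler}} code, just an illustration of the idea.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration only: not Jetty code and not the attached patch. */
class GracefulGate {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicBoolean shuttingDown = new AtomicBoolean();
    private final CompletableFuture<Void> drained = new CompletableFuture<>();

    /** Admit a request; returns false once shutdown has begun. */
    boolean tryBegin() {
        if (shuttingDown.get()) {
            return false;
        }
        inFlight.incrementAndGet();
        if (shuttingDown.get()) {
            // Shutdown started between the check and the increment: undo and refuse.
            end();
            return false;
        }
        return true;
    }

    /** Mark an admitted request as finished (call from a finally block). */
    void end() {
        if (inFlight.decrementAndGet() == 0 && shuttingDown.get()) {
            drained.complete(null);
        }
    }

    /** Stop admitting requests; the returned future completes when none remain. */
    CompletableFuture<Void> shutdown() {
        shuttingDown.set(true);
        if (inFlight.get() == 0) {
            drained.complete(null);
        }
        return drained;
    }

    /** Number of requests currently admitted and not yet finished. */
    int inFlight() {
        return inFlight.get();
    }
}
```

In a real handler the wrapper would presumably call something like {{tryBegin()}} before delegating (rejecting the request if it returns false), call {{end()}} in a {{finally}} block, and have the server's stop logic wait on the {{shutdown()}} future with a timeout before closing connections.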

> Solr shutdown does not graceful close Jetty requests/connections
> ----------------------------------------------------------------
>
>                 Key: SOLR-17744
>                 URL: https://issues.apache.org/jira/browse/SOLR-17744
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-17744.patch
>
>
> Solr does a lot of work internally (via things like SolrCore reference 
> counting) to ensure that we "finish" in-flight requests on orderly shutdown 
> (ie: when the user has issued a "stop" command) – but it does not appear that 
> we are doing anything to ensure that *Jetty* managed resources will also wait 
> for in process requests to finish.
> In particular, Jetty seems to abruptly close any existing & active network 
> connections to clients, even as Solr continues to process those requests and 
> try to write out the responses.
> There are Jetty features to ensure that  shutdown is genuinely "graceful" 
> (refusing new requests while letting existing ones finish) but Solr doesn't 
> appear to use/enable these features:
>  * In Jetty 10 & 11, this is apparently done using the {{StatisticsHandler}} 
> (as a wrapper around the main handler collection I think?)
>  ** 
> [https://github.com/jetty/jetty.project/issues/2076#issuecomment-353578130]
>  ** 
> [https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/server/handler/StatisticsHandler.html]
>  ** 
> [https://jetty.org/docs/jetty/11/programming-guide/server/http.html#handler-use-util-stats-handler]
>  * In Jetty 12+ there is a {{graceful}} module that provides a 
> {{GracefulHandler}} (which seems like a slightly more robust version of what 
> {{StatisticsHandler}} does in jetty-10, but with less statistics tracking 
> overhead)
>  ** 
> [https://jetty.org/docs/jetty/12/operations-guide/start/index.html#stop-graceful]
>  ** 
> [https://jetty.org/docs/jetty/12/operations-guide/modules/standard.html#graceful]
>  
> The net result is that even during planned shutdown (or restart) of Solr 
> nodes, clients can get lots of errors.


