[ 
https://issues.apache.org/jira/browse/SOLR-17744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944850#comment-17944850
 ] 

Chris M. Hostetter commented on SOLR-17744:
-------------------------------------------

Reproducing this problem is pretty easy.
 * spin up the {{-e cloud}} example
 * index a few million docs
 ** I used a few randomly generated long fields
 * use some shell scripting so that a handful of concurrent loops issue the 
same expensive/slow query over and over to {{localhost:7574}} (exiting on 
failure)
 ** I used function queries & sorts wrapped around some nested {{scale()}} 
functions
 * run {{./bin/solr stop -p 7574}} while those curl loops are running
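
The concurrent loops in the steps above can be approximated without shell scripting. What follows is a minimal Python sketch, not the original curl loops: the port comes from the steps above, but the collection name, the query string, the worker count, and the helper names ({{fetch}}, {{hammer}}) are all assumptions, so substitute whatever query is slow against your own index.

```python
import threading
import urllib.request

# Assumed endpoint: port 7574 is from the steps above; the collection name and
# the query are placeholders, not the function query used in the original test.
URL = "http://localhost:7574/solr/gettingstarted/select?q=*:*&rows=10"


def fetch(url=URL, timeout=60):
    """Issue one request and read the whole body, raising on any failure."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def hammer(request=fetch, workers=4, max_requests=None):
    """Run `workers` concurrent loops calling `request` until one of them fails.

    Returns the first exception recorded, or None if every worker completed
    `max_requests` calls without an error.
    """
    stop = threading.Event()
    failures = []
    lock = threading.Lock()

    def loop():
        done = 0
        while not stop.is_set() and (max_requests is None or done < max_requests):
            try:
                request()
            except Exception as exc:  # any error ends every loop ("exit on failure")
                with lock:
                    failures.append(exc)
                stop.set()
                return
            done += 1

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures[0] if failures else None
```

Calling {{hammer()}} in one terminal and then stopping the node in another returns the first exception raised by any of the loops.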

Examples of the types of errors you might get from your curl loops...
{noformat}
curl: (52) Empty reply from server
curl: (18) transfer closed with outstanding read data remaining
curl: (7) Failed to connect to localhost port 7574: Connection refused
{noformat}
...that last one, "Failed to connect", is really the only valid error curl 
should report if Solr+Jetty are both genuinely doing a "graceful" shutdown.
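
That distinction can be made mechanical when checking the results of such loops. A small Python sketch, in which the function and constant names are hypothetical and the descriptions are the curl messages quoted above:

```python
# curl exit codes from the errors quoted above, with curl's own messages.
CURL_ERRORS = {
    7: "Failed to connect: Connection refused",
    18: "transfer closed with outstanding read data remaining",
    52: "Empty reply from server",
}

# Per the reasoning above: during a graceful shutdown, the only acceptable
# failure is a refused *new* connection.
ACCEPTABLE_DURING_GRACEFUL_STOP = {7}


def indicates_aborted_request(exit_code):
    """True if the exit code means an already-accepted request was cut off."""
    return exit_code in CURL_ERRORS and exit_code not in ACCEPTABLE_DURING_GRACEFUL_STOP
```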

Here's an example of what you'll see in the Solr logs...
{noformat}
2025-04-15 00:09:40.129 INFO  (ShutdownMonitor) [c: s: r: x: t:] o.e.j.s.Server Stopped Server@64337702{STOPPING}[10.0.20,sto=0]
2025-04-15 00:09:40.135 INFO  (ShutdownMonitor) [c: s: r: x: t:] o.e.j.s.AbstractConnector Stopped ServerConnector@470a696f{HTTP/1.1, (http/1.1, h2c)}{127.0.0.1:7574}
2025-04-15 00:09:40.161 INFO  (qtp1631119258-39-localhost-58) [c:gettingstarted s:shard2 r:core_node5 x:gettingstarted_shard2_replica_n2 t:localhost-58] o.a.s.c.S.Request webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=16388&start=0&fsv=true&sort=scale(product(scale(b_l,-88888,1234567),scale(a_l,-25,99999999)),-12345678,987654321)+asc,+scale(product(scale(c_l,-88888,1234567),scale(d_l,-25,99999999)),-12345678,987654321)+desc&rows=10000&rid=localhost-58&version=2&q={!func}scale(product(scale(e_l,-88888,1234567),scale(f_l,-25,99999999)),-12345678,987654321)&omitHeader=false&NOW=1744675779392&isShard=true&wt=javabin} hits=500400 status=0 QTime=767
2025-04-15 00:09:40.232 INFO  (qtp1631119258-39-localhost-58) [c:gettingstarted s:shard2 r:core_node5 x:gettingstarted_shard2_replica_n2 t:localhost-58] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException: Closed
        at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:756)
org.eclipse.jetty.io.EofException: Closed
        at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:756) ~[jetty-server-10.0.20.jar:10.0.20]
        at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:780) ~[jetty-server-10.0.20.jar:10.0.20]
        at org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:157) ~[?:?]
        at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:207) ~[?:?]
        at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:200) ~[?:?]
        at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:170) ~[?:?]
        at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:58) ~[?:?]
        at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:59) ~[?:?]
        at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:1031) ~[?:?]
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:621) ~[?:?]
...
2025-04-15 00:09:40.240 INFO  (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1562912969
2025-04-15 00:09:40.241 INFO  (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/localhost:7574_solr
2025-04-15 00:09:40.254 INFO  (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Publish this node as DOWN...
2025-04-15 00:09:40.254 INFO  (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Publish node=localhost:7574_solr as DOWN
...
{noformat}
Note that Jetty logs {{Stopped ServerConnector}} almost immediately, but Solr 
is still using Jetty request threads like {{qtp1631119258-39-localhost-58}} to 
spend time/CPU processing requests, only for Jetty's {{HttpOutput}} to be 
unable to write the response to the client because the connection has already 
been closed.

(Heck: Solr hasn't even had a chance to update the node's status in ZK when 
{{Stopped ServerConnector}} happens, so not only are in-flight connections 
being aborted, but we're still advertising this node as available to SolrJ 
clients.)

> Solr shutdown does not graceful close Jetty requests/connections
> ----------------------------------------------------------------
>
>                 Key: SOLR-17744
>                 URL: https://issues.apache.org/jira/browse/SOLR-17744
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> Solr does a lot of work internally (via things like SolrCore reference 
> counting) to ensure that we "finish" in-flight requests on orderly shutdown 
> (ie: when the user has issued a "stop" command) – but it does not appear that 
> we are doing anything to ensure that *Jetty* managed resources will also wait 
> for in process requests to finish.
> In particular, Jetty seems to abruptly close any existing & active network 
> connections to clients, even as Solr continues to process those requests and 
> try to write out the responses.
> There are Jetty features to ensure that shutdown is genuinely "graceful" 
> (refusing new requests while letting existing ones finish) but Solr doesn't 
> appear to use/enable these features:
>  * In Jetty 10 & 11, this is apparently done using the {{StatisticsHandler}} 
> (as a wrapper around the main handler collection, I think?)
>  ** 
> [https://github.com/jetty/jetty.project/issues/2076#issuecomment-353578130]
>  ** 
> [https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/server/handler/StatisticsHandler.html]
>  ** 
> [https://jetty.org/docs/jetty/11/programming-guide/server/http.html#handler-use-util-stats-handler]
>  * In Jetty 12+ there is a {{graceful}} module that provides a 
> {{GracefulHandler}} (which seems like a slightly more robust version of what 
> {{StatisticsHandler}} does in jetty-10, but with less statistics tracking 
> overhead)
>  ** 
> [https://jetty.org/docs/jetty/12/operations-guide/start/index.html#stop-graceful]
>  ** 
> [https://jetty.org/docs/jetty/12/operations-guide/modules/standard.html#graceful]
>  
> The net result is that even during planned shutdown (or restart) of Solr 
> nodes, clients can get lots of errors.
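
The behaviour the issue description asks for, refusing new requests while letting existing ones finish, can be illustrated independently of Jetty. The sketch below is a conceptual model in Python, not the code of Jetty's {{StatisticsHandler}} or {{GracefulHandler}}; the class name {{GracefulGate}} and its methods are hypothetical.

```python
import threading


class GracefulGate:
    """Counts in-flight requests; once shutdown() is called it refuses new
    requests and lets the caller wait for the existing ones to finish."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._stopping = False

    def try_enter(self):
        """Called when a request arrives. Returns False once shutdown has begun."""
        with self._cond:
            if self._stopping:
                return False
            self._in_flight += 1
            return True

    def exit(self):
        """Called when a request has fully written its response."""
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def shutdown(self, timeout=None):
        """Refuse new requests, then wait up to `timeout` seconds for the
        in-flight ones. Returns True if everything drained in time."""
        with self._cond:
            self._stopping = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

In this model the network connector would be closed only after {{shutdown()}} returns, which is the opposite of the ordering visible in the logs above.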



