[ https://issues.apache.org/jira/browse/SOLR-17744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944850#comment-17944850 ]
Chris M. Hostetter commented on SOLR-17744:
-------------------------------------------

Reproducing this problem is pretty easy.

* spin up the {{-e cloud}} example
* index a few million docs
** I used a few randomly generated long fields
* use some shell scripting so that a handful of concurrent loops issue the same expensive/slow query over and over to {{localhost:7574}} (exiting on failure)
** I used function queries & sorts wrapped around some nested {{scale()}} functions
* run {{./bin/solr stop -p 7574}} while those curl loops are running

Examples of the types of errors you might get from your curl loops...

{noformat}
curl: (52) Empty reply from server
curl: (18) transfer closed with outstanding read data remaining
curl: (7) Failed to connect to localhost port 7574: Connection refused
{noformat}

...that last one, "Failed to connect", is really the only valid error curl should report if Solr+Jetty are both genuinely doing a "graceful" shutdown.

Here's an example of what you'll see in the Solr logs...

{noformat}
2025-04-15 00:09:40.129 INFO (ShutdownMonitor) [c: s: r: x: t:] o.e.j.s.Server Stopped Server@64337702{STOPPING}[10.0.20,sto=0]
2025-04-15 00:09:40.135 INFO (ShutdownMonitor) [c: s: r: x: t:] o.e.j.s.AbstractConnector Stopped ServerConnector@470a696f{HTTP/1.1, (http/1.1, h2c)}{127.0.0.1:7574}
2025-04-15 00:09:40.161 INFO (qtp1631119258-39-localhost-58) [c:gettingstarted s:shard2 r:core_node5 x:gettingstarted_shard2_replica_n2 t:localhost-58] o.a.s.c.S.Request webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=16388&start=0&fsv=true&sort=scale(product(scale(b_l,-88888,1234567),scale(a_l,-25,99999999)),-12345678,987654321)+asc,+scale(product(scale(c_l,-88888,1234567),scale(d_l,-25,99999999)),-12345678,987654321)+desc&rows=10000&rid=localhost-58&version=2&q={!func}scale(product(scale(e_l,-88888,1234567),scale(f_l,-25,99999999)),-12345678,987654321)&omitHeader=false&NOW=1744675779392&isShard=true&wt=javabin} hits=500400 status=0 QTime=767
2025-04-15 00:09:40.232 INFO (qtp1631119258-39-localhost-58) [c:gettingstarted s:shard2 r:core_node5 x:gettingstarted_shard2_replica_n2 t:localhost-58] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down => org.eclipse.jetty.io.EofException: Closed
    at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:756)
org.eclipse.jetty.io.EofException: Closed
    at org.eclipse.jetty.server.HttpOutput.checkWritable(HttpOutput.java:756) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:780) ~[jetty-server-10.0.20.jar:10.0.20]
    at org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:157) ~[?:?]
    at org.apache.solr.common.util.FastOutputStream.flush(FastOutputStream.java:207) ~[?:?]
    at org.apache.solr.common.util.FastOutputStream.flushBuffer(FastOutputStream.java:200) ~[?:?]
    at org.apache.solr.common.util.JavaBinCodec.marshal(JavaBinCodec.java:170) ~[?:?]
    at org.apache.solr.response.BinaryResponseWriter.write(BinaryResponseWriter.java:58) ~[?:?]
    at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:59) ~[?:?]
    at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:1031) ~[?:?]
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:621) ~[?:?]
...
2025-04-15 00:09:40.240 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1562912969
2025-04-15 00:09:40.241 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Remove node as live in ZooKeeper:/live_nodes/localhost:7574_solr
2025-04-15 00:09:40.254 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Publish this node as DOWN...
2025-04-15 00:09:40.254 INFO (ShutdownMonitor) [c: s: r: x: t:] o.a.s.c.ZkController Publish node=localhost:7574_solr as DOWN
...
{noformat}

Note that Jetty logs {{Stopped ServerConnector}} almost immediately, but Solr is still using Jetty request threads like {{qtp1631119258-39-localhost-58}} to spend time/CPU processing requests, only for Jetty's {{HttpOutput}} to be unable to write the response to the client because the connection has already been closed.

(Heck: Solr hasn't even had a chance to update the node's status in ZK when {{Stopped ServerConnector}} happens – so not only are in-flight connections being aborted, but we're still advertising this node as available to SolrJ clients)

> Solr shutdown does not graceful close Jetty requests/connections
> ----------------------------------------------------------------
>
> Key: SOLR-17744
> URL: https://issues.apache.org/jira/browse/SOLR-17744
> Project: Solr
> Issue Type: New Feature
> Reporter: Chris M. Hostetter
> Priority: Major
>
> Solr does a lot of work internally (via things like SolrCore reference
> counting) to ensure that we "finish" in-flight requests on orderly shutdown
> (ie: when the user has issued a "stop" command) – but it does not appear that
> we are doing anything to ensure that *Jetty* managed resources will also wait
> for in-process requests to finish.
> In particular, Jetty seems to abruptly close any existing & active network
> connections to clients, even as Solr continues to process those requests and
> try to write out the responses.
> There are Jetty features to ensure that shutdown is genuinely "graceful"
> (refusing new requests while letting existing ones finish) but Solr doesn't
> appear to use/enable these features:
> * In Jetty 10 & 11, this is apparently done using the {{StatisticsHandler}}
> (as a wrapper around the main handler collection i think?)
> ** [https://github.com/jetty/jetty.project/issues/2076#issuecomment-353578130]
> ** [https://javadoc.jetty.org/jetty-11/org/eclipse/jetty/server/handler/StatisticsHandler.html]
> ** [https://jetty.org/docs/jetty/11/programming-guide/server/http.html#handler-use-util-stats-handler]
> * In Jetty 12+ there is a {{graceful}} module that provides a
> {{GracefulHandler}} (which seems like a slightly more robust version of what
> {{StatisticsHandler}} does in jetty-10, but with less statistics tracking
> overhead)
> ** [https://jetty.org/docs/jetty/12/operations-guide/start/index.html#stop-graceful]
> ** [https://jetty.org/docs/jetty/12/operations-guide/modules/standard.html#graceful]
>
> The net result is that even during planned shutdown (or restart) of Solr
> nodes, clients can get lots of errors.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
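
The reproduction step from the comment ("a handful of concurrent loops issue the same expensive/slow query over and over ... exiting on failure") could be sketched roughly as below. This is only an illustration, not the script used in the comment: the original used curl in shell loops, whereas this swaps in Python's standard library, and the names {{fetch}}, {{hammer}} and {{run_loops}} are hypothetical. The URL in the usage comment assumes the {{gettingstarted}} collection seen in the logs and uses a placeholder query, not the nested {{scale()}} query from the logs.

```python
import threading
import urllib.request
from typing import Callable, List, Optional, Tuple

Result = Tuple[int, Optional[BaseException]]


def fetch(url: str, timeout: float = 60.0) -> int:
    """Issue one GET and read the whole body; returns the number of bytes read."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return len(resp.read())


def hammer(url: str,
           request: Callable[[str], object] = fetch,
           max_requests: Optional[int] = None) -> Result:
    """Repeat the same request until it fails (or max_requests is reached).

    Returns (number of successful requests, the exception that ended the loop
    or None if the loop stopped because of max_requests).
    """
    ok = 0
    while max_requests is None or ok < max_requests:
        try:
            request(url)
        except Exception as exc:  # refused, reset, truncated body, HTTP error, ...
            return ok, exc
        ok += 1
    return ok, None


def run_loops(url: str,
              workers: int = 4,
              request: Callable[[str], object] = fetch,
              max_requests: Optional[int] = None) -> List[Optional[Result]]:
    """Run several hammer() loops concurrently and collect one result per loop."""
    results: List[Optional[Result]] = [None] * workers

    def work(i: int) -> None:
        results[i] = hammer(url, request, max_requests)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical usage (placeholder query; start it, then stop the node on 7574):
#   for ok, err in run_loops("http://localhost:7574/solr/gettingstarted/select?q=*:*"):
#       print(ok, repr(err))
```

With a genuinely graceful shutdown, one would expect every loop to end with a connection-refused error; a disconnect or truncated-read error would correspond to the "Empty reply" and "transfer closed" cases reported by curl above.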