Luke Kot-Zaniewski created SOLR-17711:
-----------------------------------------

             Summary: IndexFetcher Incorrectly Timing Out
                 Key: SOLR-17711
                 URL: https://issues.apache.org/jira/browse/SOLR-17711
             Project: Solr
          Issue Type: Bug
    Affects Versions: 9.8
            Reporter: Luke Kot-Zaniewski


During our rollout of 9.8 we discovered an interesting
behavior indirectly caused by the Http2SolrClient migration
of IndexFetcher:
 
[https://github.com/apache/solr/commit/25194b02caa383feda293490eed6ccbd7c3ecf05#diff-7af383a173bd8e05414b341ab08e9ca715b665077112c64150c4db00811d16a6]
 
The change itself does not appear to be the problem, but
rather the default behavior of Http2SolrClient applying
the *idle* timeout to the overall request time:
 
[https://github.com/apache/solr/blob/2b8f933529fa736fe5fd2a9b0c751bedf352f0c7/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L625-L629]
 
Apparently this choice of default has some history:
 
[https://github.com/apache/solr/commit/a80eb84d5659a06a10860ad2470e87d80b19fa5d]
+
in its current form:
[https://github.com/apache/solr/commit/d70af456058174d15a25d3c9b8cc5f7a8899b62b]
 
At any rate, in most cases this goes unnoticed because the
default idle timeout is quite long (120 seconds) but can
cause problems when applied to something like IndexFetcher
which is probably *expected* to have sometimes really long-lived,
healthy connections exceeding the 120s period. An *idle*
timeout being applied to a long-lived, non-idle connection
doesn't seem quite right...
 
We saw this during replication of a 5GB segment which, at
our bandwidth at the time, exceeded the 120 second time window
and caused the Cloud to get stuck in a replication loop.
 
The stacktrace tells the full story, namely in 120 seconds
we were only able to copy 4.5GB (implied 300 Mbps) and were
interrupted by this arguably misapplied idle timeout (on a
clearly non-idle connection!)
 
{code:java}
2025-03-17 20:47:15.144 WARN IndexFetcher [XXXX] ? [XXXX] - Error in fetching 
file: _7rxzv.cfs (downloaded 457179136 of 5529080446 bytes)
java.io.IOException: java.util.concurrent.TimeoutException: Total timeout 
120000 ms elapsed
at 
org.eclipse.jetty.client.util.InputStreamResponseListener$Input.toIOException(InputStreamResponseListener.java:343)
 ~[jetty-client-10.0.22.jar:10.0.22]
at 
org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:311)
 ~[jetty-client-10.0.22.jar:10.0.22]
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:133) ~[?:?]
at 
org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:87)
 ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:138) 
~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:176) 
~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1849)
 ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1790) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1772)
 ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1192) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:679) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:410) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:522) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:243) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:697)
 ~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:333) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:309) 
~[solr-core-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:212)
 ~[metrics-core-4.2.26.jar:4.2.26]
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:380)
 ~[solr-solrj-9.8.0 d2673aab0d696a2f7330bf3267533525dfad1200 - cloud-user - 
2025-01-29 16:21:49]
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 ~[?:?]
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 ~[?:?]
at java.base/java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.util.concurrent.TimeoutException: Total timeout 120000 ms 
elapsed
at 
org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:342)
 ~[jetty-client-10.0.22.jar:10.0.22]
at 
org.eclipse.jetty.client.HttpConnection$RequestTimeouts.onExpired(HttpConnection.java:322)
 ~[jetty-client-10.0.22.jar:10.0.22]
at 
org.eclipse.jetty.io.CyclicTimeouts.onTimeoutExpired(CyclicTimeouts.java:110) 
~[jetty-io-10.0.22.jar:10.0.22]
at 
org.eclipse.jetty.io.CyclicTimeouts$Timeouts.onTimeoutExpired(CyclicTimeouts.java:197)
 ~[jetty-io-10.0.22.jar:10.0.22]
at org.eclipse.jetty.io.CyclicTimeout$Wakeup.run(CyclicTimeout.java:294) 
~[jetty-io-10.0.22.jar:10.0.22]
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
 ~[?:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
 ~[?:?]
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to