Yep, I edited the two instances to remove those differences and re-ran a
fairly torturous test cycle at 100 requests/sec.

Interestingly, 8.9.0 significantly outperformed 8.3.1 this time, and neither
hit the thread limit nor failed to create new threads. But both scaled up to
well above 2k threads, and even after I stopped the test the threads remained
in TIMED_WAITING. So I still think threads aren't being terminated somewhere.
Test output below, in case it's of use:

8.3.1
upload: ./831_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:20:25
Requests      [total, rate, throughput]  6000, 100.02, 25.05
Duration      [total, attack, wait]      1m29.97089157s, 59.990075461s, 29.980816109s
Latencies     [mean, 50, 95, 99, max]    11.011900024s, 5.175300857s, 30.00091013s, 30.001077592s, 30.014343222s
Bytes In      [total, mean]              42385846, 7064.31
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    37.57%
Status Codes  [code:count]               0:3741  200:2254  500:5
Error Set:
500 Server Error
Wed Oct 13 12:20:27 UTC 2021
upload: ./831_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:21:57
Requests      [total, rate, throughput]  6000, 100.02, 10.98
Duration      [total, attack, wait]      1m29.931457331s, 59.990859187s, 29.940598144s
Latencies     [mean, 50, 95, 99, max]    7.74211412s, 57.37µs, 30.000885208s, 30.00105703s, 30.011268185s
Bytes In      [total, mean]              21287225, 3547.87
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    16.45%
Status Codes  [code:count]               0:5007  200:987  500:6
Error Set:
500 Server Error
Wed Oct 13 12:21:58 UTC 2021
upload: ./831_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:23:27
Requests      [total, rate, throughput]  6000, 100.02, 11.68
Duration      [total, attack, wait]      1m29.58141878s, 59.990945585s, 29.590473195s
Latencies     [mean, 50, 95, 99, max]    9.05339281s, 58.742µs, 30.000880433s, 30.001052307s, 30.004624884s
Bytes In      [total, mean]              17434533, 2905.76
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    17.43%
Status Codes  [code:count]               0:4874  200:1046  500:80
Error Set:
500 Server Error
context deadline exceeded (Client.Timeout or context cancellation while reading body)
Wed Oct 13 12:23:29 UTC 2021
upload: ./831_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:24:58
Requests      [total, rate, throughput]  6000, 100.02, 1.70
Duration      [total, attack, wait]      1m29.541198374s, 59.990833778s, 29.550364596s
Latencies     [mean, 50, 95, 99, max]    8.155874514s, 51.073µs, 30.000892751s, 30.001051441s, 30.009856373s
Bytes In      [total, mean]              2211649, 368.61
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    2.53%
Status Codes  [code:count]               0:5810  200:152  500:38
Error Set:
500 Server Error
Wed Oct 13 12:24:59 UTC 2021
upload: ./831_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:26:29
Requests      [total, rate, throughput]  6000, 100.02, 0.73
Duration      [total, attack, wait]      1m29.451307081s, 59.990601937s, 29.460705144s
Latencies     [mean, 50, 95, 99, max]    7.257427504s, 50.866µs, 30.000885386s, 30.001023192s, 30.009992128s
Bytes In      [total, mean]              837808, 139.63
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    1.08%
Status Codes  [code:count]               0:5907  200:65  500:28
Error Set:
500 Server Error
context deadline exceeded (Client.Timeout or context cancellation while reading body)
Wed Oct 13 12:26:30 UTC 2021
upload: ./831_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:28:00
Requests      [total, rate, throughput]  6000, 100.02, 1.65
Duration      [total, attack, wait]      1m29.760671622s, 59.990482467s, 29.770189155s
Latencies     [mean, 50, 95, 99, max]    8.288506559s, 50.424µs, 30.000894904s, 30.00104114s, 30.016506845s
Bytes In      [total, mean]              1973103, 328.85
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    2.47%
Status Codes  [code:count]               0:5821  200:148  500:31
Error Set:
500 Server Error


8.9.0
upload: ./890_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:20:27
Requests      [total, rate, throughput]  6000, 100.02, 28.50
Duration      [total, attack, wait]      1m29.930332927s, 59.990809535s, 29.939523392s
Latencies     [mean, 50, 95, 99, max]    10.958871049s, 5.184437078s, 30.000885573s, 30.001088926s, 30.00946393s
Bytes In      [total, mean]              51095634, 8515.94
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    42.72%
Status Codes  [code:count]               0:3432  200:2563  500:5
Error Set:
500 Server Error
Wed Oct 13 12:20:28 UTC 2021
upload: ./890_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:21:58
Requests      [total, rate, throughput]  6000, 100.02, 38.15
Duration      [total, attack, wait]      1m29.951119527s, 59.990310608s, 29.960808919s
Latencies     [mean, 50, 95, 99, max]    10.089597212s, 4.856237339s, 30.000863563s, 30.001063417s, 30.013413816s
Bytes In      [total, mean]              64455304, 10742.55
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    57.20%
Status Codes  [code:count]               0:2565  200:3432  500:3
Error Set:
500 Server Error
Wed Oct 13 12:22:00 UTC 2021
upload: ./890_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:23:30
Requests      [total, rate, throughput]  6000, 100.02, 31.12
Duration      [total, attack, wait]      1m29.971513653s, 59.990882146s, 29.980631507s
Latencies     [mean, 50, 95, 99, max]    7.982882635s, 4.086936886s, 30.000611417s, 30.001022921s, 30.00949772s
Bytes In      [total, mean]              49838902, 8306.48
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    46.67%
Status Codes  [code:count]               0:3011  200:2800  500:4  503:185
Error Set:
500 Server Error
unexpected EOF
503 Service Unavailable
Wed Oct 13 12:23:31 UTC 2021
upload: ./890_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:25:01
Requests      [total, rate, throughput]  6000, 100.02, 30.21
Duration      [total, attack, wait]      1m29.871026977s, 59.990066513s, 29.880960464s
Latencies     [mean, 50, 95, 99, max]    9.187948603s, 5.166973696s, 30.000723538s, 30.001043637s, 30.009522646s
Bytes In      [total, mean]              52508950, 8751.49
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    45.25%
Status Codes  [code:count]               0:3118  200:2715  500:4  503:163
Error Set:
500 Server Error
unexpected EOF
503 Service Unavailable
Wed Oct 13 12:25:02 UTC 2021
upload: ./890_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:26:33
Requests      [total, rate, throughput]  6000, 100.02, 28.33
Duration      [total, attack, wait]      1m29.931899376s, 59.990583341s, 29.941316035s
Latencies     [mean, 50, 95, 99, max]    10.643567102s, 4.009658048s, 30.000888301s, 30.00107338s, 30.010319307s
Bytes In      [total, mean]              51869384, 8644.90
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    42.47%
Status Codes  [code:count]               0:3448  200:2548  500:4
Error Set:
500 Server Error
context deadline exceeded (Client.Timeout or context cancellation while reading body)
Wed Oct 13 12:26:34 UTC 2021
upload: ./890_tests_results.bin to s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:28:03
Requests      [total, rate, throughput]  6000, 100.02, 40.46
Duration      [total, attack, wait]      1m29.540373624s, 59.990568262s, 29.549805362s
Latencies     [mean, 50, 95, 99, max]    9.938485443s, 5.811191827s, 30.000761953s, 30.001041833s, 30.014534484s
Bytes In      [total, mean]              64483027, 10747.17
Bytes Out     [total, mean]              0, 0.00
Success       [ratio]                    60.38%
Status Codes  [code:count]               0:2372  200:3623  500:5
Error Set:
500 Server Error
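
A note on reading these reports: the success ratio is the number of 200 responses divided by the 6000 requests, and the throughput figure matches successful responses divided by the total duration (attack plus wait). The `0` status count appears to be requests that got no HTTP response at all before the client gave up at roughly 30 s, which lines up with the ~30 s p95/p99 latencies. A minimal sketch of that arithmetic, assuming bash and awk are available; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Recompute two figures from the load-test reports above.
#   success ratio = 200 responses / total requests
#   throughput    = 200 responses / total duration in seconds (attack + wait)

success_ratio() {
  # $1 = successful responses, $2 = total requests
  awk -v ok="$1" -v total="$2" 'BEGIN { printf "%.2f%%\n", 100 * ok / total }'
}

throughput() {
  # $1 = successful responses, $2 = total duration in seconds
  awk -v ok="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", ok / secs }'
}

# First 8.3.1 run: 200:2254 out of 6000, total duration 1m29.97089157s
success_ratio 2254 6000        # prints 37.57%
throughput 2254 89.97089157    # prints 25.05
```

The same two functions reproduce the other runs, e.g. the last 8.3.1 run (148 successes, 1m29.760671622s) gives 2.47% and 1.65.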



On Wed, 13 Oct 2021 at 13:06, Deepak Goel <deic...@gmail.com> wrote:

> Hello
>
> So far I can see two differences:
>
> 1.  -Xmx
>
> 2. +ExplicitGCInvokesConcurrent
>
> Deepak
> "The greatness of a nation can be judged by the way its animals are treated
> - Mahatma Gandhi"
>
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> Make In India : http://www.makeinindia.com/home
>
>
> On Wed, Oct 13, 2021 at 5:09 PM Dominic Humphries
> <domi...@adzuna.com.invalid> wrote:
>
> > The CLI invocation for 8.3.1 is:
> > java -server -Xmx15826m -XX:+UseG1GC -XX:+PerfDisableSharedMem
> > -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages
> > -XX:+AlwaysPreTouch
> > -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > -Dcom.sun.management.jmxremote
> > -Dcom.sun.management.jmxremote.local.only=false
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false
> > -Dcom.sun.management.jmxremote.port=18983
> > -Dcom.sun.management.jmxremote.rmi.port=18983 -Dsolr.log.dir=/srv/solr/logs
> > -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
> > -Djetty.home=/usr/local/solr/server -Dsolr.solr.home=/srv/solr/data
> > -Dsolr.data.home= -Dsolr.install.dir=/usr/local/solr
> > -Dsolr.default.confdir=/usr/local/solr/server/solr/configsets/_default/conf
> > -Dlog4j.configurationFile=file:/srv/solr/log4j2.xml
> > -Dsolr.disable.shardsWhitelist=true -Xss256k -Dsolr.jetty.https.port=8983
> > -jar start.jar --module=http
> >
> > I believe the key items are:
> > -XX:+AlwaysPreTouch
> > -XX:+ParallelRefProcEnabled
> > -XX:+PerfDisableSharedMem
> > -XX:+UseG1GC
> > -XX:+UseLargePages
> > -XX:MaxGCPauseMillis=250
> > -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > -Xmx15826m
> > -Xss256k
> >
> > And for 8.9.0 it is:
> > java -server -Xmx7913m -XX:+UseG1GC -XX:+PerfDisableSharedMem
> > -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages
> > -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent
> > -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > -Dsolr.jetty.inetaccess.includes= -Dsolr.jetty.inetaccess.excludes=
> > -Dcom.sun.management.jmxremote
> > -Dcom.sun.management.jmxremote.local.only=false
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false
> > -Dcom.sun.management.jmxremote.port=18983
> > -Dcom.sun.management.jmxremote.rmi.port=18983 -Dsolr.log.dir=/srv/solr/logs
> > -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
> > -XX:-OmitStackTraceInFastThrow
> > -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 /srv/solr/logs
> > -Djetty.home=/usr/local/solr/server -Dsolr.solr.home=/srv/solr/data
> > -Dsolr.data.home= -Dsolr.install.dir=/usr/local/solr
> > -Dsolr.default.confdir=/usr/local/solr/server/solr/configsets/_default/conf
> > -Dlog4j.configurationFile=/srv/solr/log4j2.xml
> > -Dsolr.disable.shardsWhitelist=true -Xss256k -jar start.jar --module=http
> >
> > Key items:
> > -XX:+AlwaysPreTouch
> > -XX:+ExplicitGCInvokesConcurrent
> > -XX:+ParallelRefProcEnabled
> > -XX:+PerfDisableSharedMem
> > -XX:+UseG1GC
> > -XX:+UseLargePages
> > -XX:-OmitStackTraceInFastThrow
> > -XX:MaxGCPauseMillis=250
> > -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 /srv/solr/logs
> > -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > -Xmx7913m
> > -Xss256k
> >
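
Spotting every difference between two long command lines by eye is error-prone: besides the two Deepak lists, the 8.9.0 line also carries -XX:-OmitStackTraceInFastThrow and -XX:OnOutOfMemoryError. A minimal bash sketch that prints the JVM options present in only one of two command lines; it assumes options are whitespace-separated and ignores -D system properties, and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# List JVM options (-X..., -XX:...) that appear in only one of two command
# lines. Options are split on whitespace; -D system properties are ignored.

jvm_flags() {
  # One token per line, keep only -X/-XX options, sorted and de-duplicated
  tr -s '[:space:]' '\n' <<<"$1" | grep -E '^-X' | LC_ALL=C sort -u
}

diff_jvm_flags() {
  # comm -3 drops options common to both inputs;
  # options only in $1 are flush left, options only in $2 are tab-indented
  LC_ALL=C comm -3 <(jvm_flags "$1") <(jvm_flags "$2")
}
```

On a live host, each argument could be the output of `ps -o args= -p <pid>` for the corresponding Solr JVM.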
> > Xmx values are based on the instance RAM. They're currently running on two
> > different instance types, but we see the same behaviour when they're on
> > identical types too.
> >
> > Many thanks
> >
> > Dominic
> >
> > On Wed, 13 Oct 2021 at 12:07, Deepak Goel <deic...@gmail.com> wrote:
> >
> > > Hello
> > >
> > > Can you please tell us the JVM Heap Setting for both the versions:
> 8.3.1,
> > > 8.9.0?
> > >
> > > I will also have to look into the following code:
> > FileFloatSource.java:210.
> > > (will do it tonite-IST and update)
> > >
> > > Deepak
> > >
> > >
> > > On Wed, Oct 13, 2021 at 4:06 PM Dominic Humphries
> > > <domi...@adzuna.com.invalid> wrote:
> > >
> > > > Oh, that's very helpful to know about, thanks.
> > > >
> > > > The overwhelming majority appear to be threads in TIMED_WAITING, all
> > > > waiting on the same thing:
> > > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3b315cbb
> > > >
> > > > I've attached a screenshot which includes the stack trace. Stopping all
> > > > queries to the instance and waiting didn't result in any noticeable
> > > > decrease in the number of threads, so it looks like, despite being timed,
> > > > they're simply not getting terminated.
> > > >
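
With thousands of threads, counting states and wait targets is quicker than scrolling through a dump. A minimal bash sketch, assuming the dump is in HotSpot's text format as printed by `jstack <pid>` or `jcmd <pid> Thread.print` (the Solr admin UI thread dump page uses a different layout, so this would not parse that directly); the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Summarise a HotSpot thread dump (jstack / jcmd Thread.print text) read
# from stdin.

thread_states() {
  # Count threads per java.lang.Thread.State, most common first
  grep -oE 'java\.lang\.Thread\.State: [A-Z_]+' |
    awk '{ print $2 }' | sort | uniq -c | sort -rn
}

wait_targets() {
  # Count the lock/condition objects threads are parked on or blocked by;
  # optional $1 limits the number of lines printed (default 10)
  grep -oE '(parking to wait for|waiting to lock) +<0x[0-9a-f]+> \([^)]*\)' |
    sed -E 's/ +/ /g' | sort | uniq -c | sort -rn | head -n "${1:-10}"
}
```

For example, `jstack <pid> > dump.txt`, then `thread_states < dump.txt` and `wait_targets 5 < dump.txt` would show whether the TIMED_WAITING threads really all share one ConditionObject and how many are BLOCKED on the FileFloatSource placeholder.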
> > > > Restarting the service takes me back down to just 53 threads; re-running a
> > > > test results in many new threads immediately coming into being, this time
> > > > with a higher proportion of threads BLOCKED on
> > > > org.apache.solr.search.function.FileFloatSource$CreationPlaceholder@37b782de
> > > > (see the second screenshot). The stack trace for those is too big for one
> > > > screen, so here's the output:
> > > >
> > > > qtp178604517-861 (861)
> > > >
> > > > org.apache.solr.search.function.FileFloatSource$CreationPlaceholder@37b782de
> > > >
> > > >    - org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:210)
> > > >    - org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:158)
> > > >    - org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:97)
> > > >    - org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource.getValues(ValueSource.java:203)
> > > >    - org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource.getValues(FunctionScoreQuery.java:261)
> > > >    - org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight.scorer(FunctionScoreQuery.java:224)
> > > >    - org.apache.lucene.search.Weight.scorerSupplier(Weight.java:148)
> > > >    - org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:379)
> > > >    - org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:344)
> > > >    - org.apache.lucene.search.Weight.bulkScorer(Weight.java:182)
> > > >    - org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:338)
> > > >    - org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:656)
> > > >    - org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
> > > >    - org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)
> > > >    - org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1705)
> > > >    - org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1408)
> > > >    - org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> > > >    - org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1500)
> > > >    - org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:390)
> > > >    - org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:369)
> > > >    - org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
> > > >    - org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
> > > >    - org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)
> > > >    - org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
> > > >    - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> > > >    - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
> > > >    - org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
> > > >    - org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
> > > >    - org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
> > > >    - org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > > >    - org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
> > > >    - org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> > > >    - org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> > > >    - org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
> > > >    - org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> > > >    - org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
> > > >    - org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> > > >    - org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
> > > >    - org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
> > > >    - org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> > > >    - org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
> > > >    - org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> > > >    - org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
> > > >    - org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
> > > >    - org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> > > >    - org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> > > >    - org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
> > > >    - org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> > > >    - org.eclipse.jetty.server.Server.handle(Server.java:516)
> > > >    - org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
> > > >    - org.eclipse.jetty.server.HttpChannel$$Lambda$556/0x000000080067a440.dispatch(Unknown Source)
> > > >    - org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
> > > >    - org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
> > > >    - org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
> > > >    - org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> > > >    - org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
> > > >    - org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
> > > >    - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> > > >    - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> > > >    - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> > > >    - org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
> > > >    - org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
> > > >    - org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
> > > >    - org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
> > > >    - java.base@11.0.5/java.lang.Thread.run(Thread.java:834)
> > > >
> > > > [image: image.png]
> > > > [image: image.png]
> > > >
> > > > On Wed, 13 Oct 2021 at 00:03, Joel Bernstein <joels...@gmail.com> wrote:
> > > >
> > > >> There is a thread dump in the Solr admin UI. You can use that to
> > > >> determine what all those threads are doing and where they are getting
> > > >> stuck. You can post parts of the thread dump back to this email thread
> > > >> as well.
> > > >>
> > > >>
> > > >>
> > > >> Joel Bernstein
> > > >> http://joelsolr.blogspot.com/
> > > >>
> > > >>
> > > >> On Tue, Oct 12, 2021 at 11:15 AM Dominic Humphries
> > > >> <domi...@adzuna.com.invalid> wrote:
> > > >>
> > > >> > We run 8.3.1 in prod without any problems, but we're having issues
> > > >> > trying to upgrade.
> > > >> >
> > > >> > I've created an 8.9.0 leader & follower, imported our live data into
> > > >> > it, and am testing it by replaying requests made to prod. We're seeing
> > > >> > a big problem where fairly moderate request rates cause the instance
> > > >> > to become so slow that it fails its healthcheck. The logs showed a lot
> > > >> > of errors around creating threads:
> > > >> >
> > > >> > solr[4507]: [124136.511s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 256k, guardsize: 0k, detached.
> > > >> >
> > > >> > WARN  (qtp178604517-3891) [   ] o.e.j.i.ManagedSelector  => java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
> > > >> >
> > > >> > So I monitored the thread count for the process whilst running the
> > > >> > test suite and saw a persistent pattern: threads increased until they
> > > >> > maxed out, the logs flooded with errors as Solr tried to create still
> > > >> > more threads, and the instance slowed down until it was terminated as
> > > >> > unhealthy.
> > > >> >
> > > >> > DefaultTasksMax is set to 4915. I've tried raising and lowering it,
> > > >> > but regardless of the value the result is the same: it gets maxed out
> > > >> > and everything slows down.
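
The 4915 figure is what systemd's default of DefaultTasksMax=15% works out to with the common kernel.pid_max default of 32768; the systemd-system.conf man page quotes the same number. Threads count as tasks under that cap, so hitting it is consistent with pthread_create failing with EAGAIN. A minimal sketch of the arithmetic, assuming pid_max is 32768 on this host (worth checking with `sysctl kernel.pid_max`) and a unit named `solr.service`; the helper name is hypothetical:

```shell
#!/usr/bin/env bash
# systemd's default per-unit task limit is a percentage (15% by default) of
# the kernel's limit; kernel.pid_max defaults to 32768 on many systems
# (assumption: check the real value with `sysctl kernel.pid_max`).
# Live values for a unit can be read with, e.g.
#   systemctl show solr.service -p TasksMax -p TasksCurrent
# where "solr.service" is an assumed unit name.

tasks_max_for() {
  # $1 = kernel limit, $2 = percentage (default 15); integer arithmetic
  local limit=$1 percent=${2:-15}
  echo $(( limit * percent / 100 ))
}

tasks_max_for 32768    # prints 4915
```

A per-unit TasksMax= set in a drop-in under [Service] overrides the default, but if threads keep accumulating, more headroom may only delay reaching the ceiling.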
> > > >> >
> > > >> > Is there anything I can do to stop Solr spinning up so many threads
> > > >> > that it ceases to function? There have been a few test passes where it
> > > >> > spontaneously dropped from thousands of threads to hundreds and stayed
> > > >> > up longer, but there seems to be no pattern to when this happens.
> > > >> > Running the same tests on 8.3.1 results in a much slower increase in
> > > >> > threads, and it never quite maxes them out, so things continue to
> > > >> > function.
> > > >> >
> > > >> > See below for the thread count and healthcheck times seen on a (fairly
> > > >> > harsh) test run of 100 requests/sec.
> > > >> >
> > > >> > Thanks
> > > >> >
> > > >> > Dominic
> > > >> >
> > > >> >
> > > >> > Threadcount:
> > > >> >
> > > >> > ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; ps -eLF | grep 'start.jar' | wc -l; sleep 10s; done
> > > >> > Tue Oct 12 14:27:33 UTC 2021
> > > >> > 52
> > > >> > Tue Oct 12 14:27:43 UTC 2021
> > > >> > 52
> > > >> > Tue Oct 12 14:27:54 UTC 2021
> > > >> > 52
> > > >> > Tue Oct 12 14:28:04 UTC 2021
> > > >> > 52
> > > >> > Tue Oct 12 14:28:14 UTC 2021
> > > >> > 569
> > > >> > Tue Oct 12 14:28:24 UTC 2021
> > > >> > 899
> > > >> > Tue Oct 12 14:28:34 UTC 2021
> > > >> > 1198
> > > >> > Tue Oct 12 14:28:44 UTC 2021
> > > >> > 1589
> > > >> > Tue Oct 12 14:28:54 UTC 2021
> > > >> > 2016
> > > >> > Tue Oct 12 14:29:05 UTC 2021
> > > >> > 2451
> > > >> > Tue Oct 12 14:29:15 UTC 2021
> > > >> > 2851
> > > >> > Tue Oct 12 14:29:26 UTC 2021
> > > >> > 2934
> > > >> > Tue Oct 12 14:29:36 UTC 2021
> > > >> > 3249
> > > >> > Tue Oct 12 14:29:46 UTC 2021
> > > >> > 3501
> > > >> > Tue Oct 12 14:29:57 UTC 2021
> > > >> > 3734
> > > >> > Tue Oct 12 14:30:07 UTC 2021
> > > >> > 4128
> > > >> > Tue Oct 12 14:30:18 UTC 2021
> > > >> > 4374
> > > >> > Tue Oct 12 14:30:29 UTC 2021
> > > >> > 4637
> > > >> > Tue Oct 12 14:30:39 UTC 2021
> > > >> > 4693
> > > >> > Tue Oct 12 14:30:50 UTC 2021
> > > >> > 4807
> > > >> > Tue Oct 12 14:31:01 UTC 2021
> > > >> > 4916
> > > >> > Tue Oct 12 14:31:11 UTC 2021
> > > >> > 4916
> > > >> > Tue Oct 12 14:31:22 UTC 2021
> > > >> > Connection to 10.40.22.166 closed by remote host.
> > > >> >
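
One detail in the counting loop above: `ps -eLF | grep 'start.jar'` also matches the grep process itself, so each reading is one higher than the JVM's real thread count, which would make the 4916 plateau exactly the 4915 task limit. (`grep '[s]tart.jar'` or `ps -o nlwp= -p <pid>` avoid that.) A minimal bash sketch that reads the count from /proc instead, assuming Linux and a single `start.jar` process on the host; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Count the threads of one process via /proc/<pid>/task (Linux only).
# Each thread has its own directory there, so unlike `ps -eLF | grep`
# nothing else (such as the grep process) is counted.

thread_count() {
  local pid=$1
  [ -d "/proc/$pid/task" ] || return 1   # no such process
  local tasks=("/proc/$pid/task"/*)
  echo "${#tasks[@]}"
}

watch_threads() {
  # Print a UTC timestamp and the JVM's thread count every 10 seconds.
  # Assumes exactly one process whose command line contains start.jar.
  local pid
  pid=$(pgrep -f 'start\.jar' | head -n1)
  while sleep 10; do
    printf '%s %s\n' "$(date -u '+%F %T')" "$(thread_count "$pid")"
  done
}
```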
> > > >> >
> > > >> > Healthcheck:
> > > >> >
> > > >> > ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; curl -v localhost:8983/solr/ 2>&1 | grep HTTP; date; echo '----'; sleep 10s; done
> > > >> > Tue Oct 12 14:27:34 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:27:34 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:27:44 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:27:44 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:27:54 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:27:54 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:28:04 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:28:04 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:28:14 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--   0< HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:28:16 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:28:26 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:12 --:--:--   0< HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:28:39 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:28:49 UTC 2021
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--   0> GET /solr/ HTTP/1.1
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:23 --:--:--   0< HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:29:13 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:29:23 UTC 2021
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--   0> GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:29:25 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:29:35 UTC 2021
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--   0> GET /solr/ HTTP/1.1
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:09 --:--:--   0< HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:29:44 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:29:54 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:11 --:--:--   0< HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:30:06 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:30:16 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--   0< HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:30:20 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:30:30 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> >   0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--   0< HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:30:33 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:30:43 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:30:43 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:30:53 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > Tue Oct 12 14:30:55 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:31:05 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:31:05 UTC 2021
> > > >> > ----
> > > >> > Tue Oct 12 14:31:15 UTC 2021
> > > >> > > GET /solr/ HTTP/1.1
> > > >> > < HTTP/1.1 200 OK
> > > >> > Tue Oct 12 14:31:15 UTC 2021
> > > >> > ----
> > > >> > Connection to 10.40.22.166 closed by remote host.
> > > >> >
> > > >>
> > > >
> > >
> >
>
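
The healthcheck loop prints only the response line, so the stalls show up indirectly as gaps between its two `date` calls (for example 14:28:49 to 14:29:13), and at 14:30:53 no response line appears at all. A minimal bash sketch that records the status code and elapsed time directly and gives up after a fixed timeout, assuming curl is installed; the function name and the 10-second default are hypothetical choices. Using `-s` instead of `-v` also avoids the progress meter interleaving seen above:

```shell
#!/usr/bin/env bash
# Probe a URL once and print "<UTC time> <HTTP status> <seconds taken>".
# curl reports status 000 when no HTTP response arrived (refused or timed out).

probe() {
  local url=$1 timeout=${2:-10}   # default 10 s timeout is an arbitrary choice
  local result
  # -s: no progress meter, -o /dev/null: discard the body,
  # -w: print only the status code and total time once curl finishes
  result=$(curl -s -o /dev/null --max-time "$timeout" \
    -w '%{http_code} %{time_total}' "$url") || true
  printf '%s %s\n' "$(date -u +%T)" "$result"
}

# Example loop against the local Solr node (not run here):
#   while sleep 10; do probe http://localhost:8983/solr/; done
```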
