Keeping threads around for re-use would certainly make sense to me, but
when the code starts erroring because it's trying to create ever-more new
ones rather than re-use old ones, it seems there must be a bug

On Wed, 13 Oct 2021 at 15:03, Deepak Goel <deic...@gmail.com> wrote:

> We will have to check the solr logic. Threads waiting might get reused in
> the future, so might be good for all you know.
>
> Deepak
> "The greatness of a nation can be judged by the way its animals are treated
> - Mahatma Gandhi"
>
> +91 73500 12833
> deic...@gmail.com
>
> Facebook: https://www.facebook.com/deicool
> LinkedIn: www.linkedin.com/in/deicool
>
> "Plant a Tree, Go Green"
>
> Make In India : http://www.makeinindia.com/home
>
>
> On Wed, Oct 13, 2021 at 6:05 PM Dominic Humphries
> <domi...@adzuna.com.invalid> wrote:
>
> > Yep - I edited the two instances to remove those differences and re-ran a
> > fairly tortuous testing cycle - 100 requests/sec.
> >
> > Interestingly, 8.9.0 outperformed 8.3.1 significantly at this point and
> > neither hit the threads limit or failed to create new threads. But both
> > scaled up to well above 2k threads and even after stopping the test, the
> > threads remained in a timed wait condition. So I'm still thinking
> there's a
> > problem with threads not getting terminated somewhere. Test output below
> if
> > it's of use:
> >
> > 8.3.1
> > upload: ./831_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:20:25
> > Requests      [total, rate, throughput]  6000, 100.02, 25.05
> > Duration      [total, attack, wait]      1m29.97089157s, 59.990075461s,
> > 29.980816109s
> > Latencies     [mean, 50, 95, 99, max]    11.011900024s, 5.175300857s,
> > 30.00091013s, 30.001077592s, 30.014343222s
> > Bytes In      [total, mean]              42385846, 7064.31
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    37.57%
> > Status Codes  [code:count]               0:3741  200:2254  500:5
> > Error Set:
> > 500 Server Error
> > Wed Oct 13 12:20:27 UTC 2021
> > upload: ./831_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:21:57
> > Requests      [total, rate, throughput]  6000, 100.02, 10.98
> > Duration      [total, attack, wait]      1m29.931457331s, 59.990859187s,
> > 29.940598144s
> > Latencies     [mean, 50, 95, 99, max]    7.74211412s, 57.37µs,
> > 30.000885208s, 30.00105703s, 30.011268185s
> > Bytes In      [total, mean]              21287225, 3547.87
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    16.45%
> > Status Codes  [code:count]               0:5007  200:987  500:6
> > Error Set:
> > 500 Server Error
> > Wed Oct 13 12:21:58 UTC 2021
> > upload: ./831_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:23:27
> > Requests      [total, rate, throughput]  6000, 100.02, 11.68
> > Duration      [total, attack, wait]      1m29.58141878s, 59.990945585s,
> > 29.590473195s
> > Latencies     [mean, 50, 95, 99, max]    9.05339281s, 58.742µs,
> > 30.000880433s, 30.001052307s, 30.004624884s
> > Bytes In      [total, mean]              17434533, 2905.76
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    17.43%
> > Status Codes  [code:count]               0:4874  200:1046  500:80
> > Error Set:
> > 500 Server Error
> > context deadline exceeded (Client.Timeout or context cancellation while
> > reading body)
> > Wed Oct 13 12:23:29 UTC 2021
> > upload: ./831_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:24:58
> > Requests      [total, rate, throughput]  6000, 100.02, 1.70
> > Duration      [total, attack, wait]      1m29.541198374s, 59.990833778s,
> > 29.550364596s
> > Latencies     [mean, 50, 95, 99, max]    8.155874514s, 51.073µs,
> > 30.000892751s, 30.001051441s, 30.009856373s
> > Bytes In      [total, mean]              2211649, 368.61
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    2.53%
> > Status Codes  [code:count]               0:5810  200:152  500:38
> > Error Set:
> > 500 Server Error
> > Wed Oct 13 12:24:59 UTC 2021
> > upload: ./831_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:26:29
> > Requests      [total, rate, throughput]  6000, 100.02, 0.73
> > Duration      [total, attack, wait]      1m29.451307081s, 59.990601937s,
> > 29.460705144s
> > Latencies     [mean, 50, 95, 99, max]    7.257427504s, 50.866µs,
> > 30.000885386s, 30.001023192s, 30.009992128s
> > Bytes In      [total, mean]              837808, 139.63
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    1.08%
> > Status Codes  [code:count]               0:5907  200:65  500:28
> > Error Set:
> > 500 Server Error
> > context deadline exceeded (Client.Timeout or context cancellation while
> > reading body)
> > Wed Oct 13 12:26:30 UTC 2021
> > upload: ./831_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:28:00
> > Requests      [total, rate, throughput]  6000, 100.02, 1.65
> > Duration      [total, attack, wait]      1m29.760671622s, 59.990482467s,
> > 29.770189155s
> > Latencies     [mean, 50, 95, 99, max]    8.288506559s, 50.424µs,
> > 30.000894904s, 30.00104114s, 30.016506845s
> > Bytes In      [total, mean]              1973103, 328.85
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    2.47%
> > Status Codes  [code:count]               0:5821  200:148  500:31
> > Error Set:
> > 500 Server Error
> >
> >
> > 8.9.0
> > upload: ./890_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:20:27
> > Requests      [total, rate, throughput]  6000, 100.02, 28.50
> > Duration      [total, attack, wait]      1m29.930332927s, 59.990809535s,
> > 29.939523392s
> > Latencies     [mean, 50, 95, 99, max]    10.958871049s, 5.184437078s,
> > 30.000885573s, 30.001088926s, 30.00946393s
> > Bytes In      [total, mean]              51095634, 8515.94
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    42.72%
> > Status Codes  [code:count]               0:3432  200:2563  500:5
> > Error Set:
> > 500 Server Error
> > Wed Oct 13 12:20:28 UTC 2021
> > upload: ./890_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:21:58
> > Requests      [total, rate, throughput]  6000, 100.02, 38.15
> > Duration      [total, attack, wait]      1m29.951119527s, 59.990310608s,
> > 29.960808919s
> > Latencies     [mean, 50, 95, 99, max]    10.089597212s, 4.856237339s,
> > 30.000863563s, 30.001063417s, 30.013413816s
> > Bytes In      [total, mean]              64455304, 10742.55
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    57.20%
> > Status Codes  [code:count]               0:2565  200:3432  500:3
> > Error Set:
> > 500 Server Error
> > Wed Oct 13 12:22:00 UTC 2021
> > upload: ./890_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:23:30
> > Requests      [total, rate, throughput]  6000, 100.02, 31.12
> > Duration      [total, attack, wait]      1m29.971513653s, 59.990882146s,
> > 29.980631507s
> > Latencies     [mean, 50, 95, 99, max]    7.982882635s, 4.086936886s,
> > 30.000611417s, 30.001022921s, 30.00949772s
> > Bytes In      [total, mean]              49838902, 8306.48
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    46.67%
> > Status Codes  [code:count]               0:3011  200:2800  500:4  503:185
> > Error Set:
> > 500 Server Error
> > unexpected EOF
> > 503 Service Unavailable
> > Wed Oct 13 12:23:31 UTC 2021
> > upload: ./890_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:25:01
> > Requests      [total, rate, throughput]  6000, 100.02, 30.21
> > Duration      [total, attack, wait]      1m29.871026977s, 59.990066513s,
> > 29.880960464s
> > Latencies     [mean, 50, 95, 99, max]    9.187948603s, 5.166973696s,
> > 30.000723538s, 30.001043637s, 30.009522646s
> > Bytes In      [total, mean]              52508950, 8751.49
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    45.25%
> > Status Codes  [code:count]               0:3118  200:2715  500:4  503:163
> > Error Set:
> > 500 Server Error
> > unexpected EOF
> > 503 Service Unavailable
> > Wed Oct 13 12:25:02 UTC 2021
> > upload: ./890_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:26:33
> > Requests      [total, rate, throughput]  6000, 100.02, 28.33
> > Duration      [total, attack, wait]      1m29.931899376s, 59.990583341s,
> > 29.941316035s
> > Latencies     [mean, 50, 95, 99, max]    10.643567102s, 4.009658048s,
> > 30.000888301s, 30.00107338s, 30.010319307s
> > Bytes In      [total, mean]              51869384, 8644.90
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    42.47%
> > Status Codes  [code:count]               0:3448  200:2548  500:4
> > Error Set:
> > 500 Server Error
> > context deadline exceeded (Client.Timeout or context cancellation while
> > reading body)
> > Wed Oct 13 12:26:34 UTC 2021
> > upload: ./890_tests_results.bin to
> > s3://adzuna-files-stage/test_framework/solr/tests/2021-10-13:12:28:03
> > Requests      [total, rate, throughput]  6000, 100.02, 40.46
> > Duration      [total, attack, wait]      1m29.540373624s, 59.990568262s,
> > 29.549805362s
> > Latencies     [mean, 50, 95, 99, max]    9.938485443s, 5.811191827s,
> > 30.000761953s, 30.001041833s, 30.014534484s
> > Bytes In      [total, mean]              64483027, 10747.17
> > Bytes Out     [total, mean]              0, 0.00
> > Success       [ratio]                    60.38%
> > Status Codes  [code:count]               0:2372  200:3623  500:5
> > Error Set:
> > 500 Server Error
> >
> >
> >
> > On Wed, 13 Oct 2021 at 13:06, Deepak Goel <deic...@gmail.com> wrote:
> >
> > > Hello
> > >
> > > I can as of now see two changes:
> > >
> > > 1.  -Xmx
> > >
> > > 2. +ExplicitGCInvokesConcurrent
> > >
> > > Deepak
> > > "The greatness of a nation can be judged by the way its animals are
> > treated
> > > - Mahatma Gandhi"
> > >
> > > +91 73500 12833
> > > deic...@gmail.com
> > >
> > > Facebook: https://www.facebook.com/deicool
> > > LinkedIn: www.linkedin.com/in/deicool
> > >
> > > "Plant a Tree, Go Green"
> > >
> > > Make In India : http://www.makeinindia.com/home
> > >
> > >
> > > On Wed, Oct 13, 2021 at 5:09 PM Dominic Humphries
> > > <domi...@adzuna.com.invalid> wrote:
> > >
> > > > CLI invocation for 8.3.1 is
> > > > java -server -Xmx15826m -XX:+UseG1GC -XX:+PerfDisableSharedMem
> > > > -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250
> -XX:+UseLargePages
> > > > -XX:+AlwaysPreTouch
> > > >
> > > >
> > >
> >
> -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > > > -Dcom.sun.management.jmxremote
> > > > -Dcom.sun.management.jmxremote.local.only=false
> > > > -Dcom.sun.management.jmxremote.ssl=false
> > > > -Dcom.sun.management.jmxremote.authenticate=false
> > > > -Dcom.sun.management.jmxremote.port=18983
> > > > -Dcom.sun.management.jmxremote.rmi.port=18983
> > > -Dsolr.log.dir=/srv/solr/logs
> > > > -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks
> > > -Duser.timezone=UTC
> > > > -Djetty.home=/usr/local/solr/server -Dsolr.solr.home=/srv/solr/data
> > > > -Dsolr.data.home= -Dsolr.install.dir=/usr/local/solr
> > > >
> > >
> >
> -Dsolr.default.confdir=/usr/local/solr/server/solr/configsets/_default/conf
> > > > -Dlog4j.configurationFile=file:/srv/solr/log4j2.xml
> > > > -Dsolr.disable.shardsWhitelist=true -Xss256k
> > -Dsolr.jetty.https.port=8983
> > > > -jar start.jar --module=http
> > > > I believe the key items are:
> > > > -XX:+AlwaysPreTouch
> > > > -XX:+ParallelRefProcEnabled
> > > > -XX:+PerfDisableSharedMem
> > > > -XX:+UseG1GC
> > > > -XX:+UseLargePages
> > > > -XX:MaxGCPauseMillis=250
> > > >
> > > >
> > >
> >
> -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > > > -Xmx15826m
> > > > -Xss256k
> > > >
> > > > And for 8.9.0 is
> > > > java -server -Xmx7913m -XX:+UseG1GC -XX:+PerfDisableSharedMem
> > > > -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250
> -XX:+UseLargePages
> > > > -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent
> > > >
> > > >
> > >
> >
> -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > > > -Dsolr.jetty.inetaccess.includes= -Dsolr.jetty.inetaccess.excludes=
> > > > -Dcom.sun.management.jmxremote
> > > > -Dcom.sun.management.jmxremote.local.only=false
> > > > -Dcom.sun.management.jmxremote.ssl=false
> > > > -Dcom.sun.management.jmxremote.authenticate=false
> > > > -Dcom.sun.management.jmxremote.port=18983
> > > > -Dcom.sun.management.jmxremote.rmi.port=18983
> > > -Dsolr.log.dir=/srv/solr/logs
> > > > -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks
> > > -Duser.timezone=UTC
> > > > -XX:-OmitStackTraceInFastThrow
> > > > -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983
> > > /srv/solr/logs
> > > > -Djetty.home=/usr/local/solr/server -Dsolr.solr.home=/srv/solr/data
> > > > -Dsolr.data.home= -Dsolr.install.dir=/usr/local/solr
> > > >
> > >
> >
> -Dsolr.default.confdir=/usr/local/solr/server/solr/configsets/_default/conf
> > > > -Dlog4j.configurationFile=/srv/solr/log4j2.xml
> > > > -Dsolr.disable.shardsWhitelist=true -Xss256k -jar start.jar
> > --module=http
> > > > Key:
> > > > -XX:+AlwaysPreTouch
> > > > -XX:+ExplicitGCInvokesConcurrent
> > > > -XX:+ParallelRefProcEnabled
> > > > -XX:+PerfDisableSharedMem
> > > > -XX:+UseG1GC
> > > > -XX:+UseLargePages
> > > > -XX:-OmitStackTraceInFastThrow
> > > > -XX:MaxGCPauseMillis=250
> > > > -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983
> > > /srv/solr/logs
> > > >
> > > >
> > >
> >
> -Xlog:gc*:file=/srv/solr/logs/solr_gc.log:time,uptime:filecount=9,filesize=20M
> > > > -Xmx7913m
> > > > -Xss256k
> > > >
> > > > Xmx values are based on the instance RAM, currently they're running
> on
> > > two
> > > > different instance types but we see the same behaviour when they're
> on
> > > > identical types too.
> > > >
> > > > Many thanks
> > > >
> > > > Dominic
> > > >
> > > > On Wed, 13 Oct 2021 at 12:07, Deepak Goel <deic...@gmail.com> wrote:
> > > >
> > > > > Hello
> > > > >
> > > > > Can you please tell us the JVM Heap Setting for both the versions:
> > > 8.3.1,
> > > > > 8.9.0?
> > > > >
> > > > > I will also have to look into the following code:
> > > > FileFloatSource.java:210.
> > > > > (will do it tonite-IST and update)
> > > > >
> > > > > Deepak
> > > > > "The greatness of a nation can be judged by the way its animals are
> > > > treated
> > > > > - Mahatma Gandhi"
> > > > >
> > > > > +91 73500 12833
> > > > > deic...@gmail.com
> > > > >
> > > > > Facebook: https://www.facebook.com/deicool
> > > > > LinkedIn: www.linkedin.com/in/deicool
> > > > >
> > > > > "Plant a Tree, Go Green"
> > > > >
> > > > > Make In India : http://www.makeinindia.com/home
> > > > >
> > > > >
> > > > > On Wed, Oct 13, 2021 at 4:06 PM Dominic Humphries
> > > > > <domi...@adzuna.com.invalid> wrote:
> > > > >
> > > > > > Oh, that's very helpful to know about, ty
> > > > > >
> > > > > > The overwhelming majority appear to be threads in TIMED_WAITING,
> > all
> > > > > > waiting on the same
> > > > > > thing:
> > > > >
> > > >
> > >
> >
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3b315cbb
> > > > > >
> > > > > > I've attached a screenshot which includes the stack trace.
> Stopping
> > > all
> > > > > > queries to the instance and waiting didn't result in any
> noticeable
> > > > > > decrease in the number of threads so it looks like despite being
> > > timed,
> > > > > > they're simply not getting terminated.
> > > > > >
> > > > > > Restarting the service takes me back down to just 53 threads;
> > > > re-running
> > > > > a
> > > > > > test results in many new threads immediately coming into being,
> > this
> > > > time
> > > > > > with a higher proportion of threads BLOCKED on
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.function.FileFloatSource$CreationPlaceholder@37b782de
> > > > > > - See second screenshot. The stack trace for those is too big for
> > one
> > > > > > screen so here's the output:
> > > > > >
> > > > > > qtp178604517-861 (861)
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.function.FileFloatSource$CreationPlaceholder@37b782de
> > > > > >
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.function.FileFloatSource$Cache.get(FileFloatSource.java:210)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.function.FileFloatSource.getCachedFloats(FileFloatSource.java:158)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.function.FileFloatSource.getValues(FileFloatSource.java:97)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.queries.function.ValueSource$WrappedDoubleValuesSource.getValues(ValueSource.java:203)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.queries.function.FunctionScoreQuery$MultiplicativeBoostValuesSource.getValues(FunctionScoreQuery.java:261)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.queries.function.FunctionScoreQuery$FunctionScoreWeight.scorer(FunctionScoreQuery.java:224)
> > > > > >    -
> > org.apache.lucene.search.Weight.scorerSupplier(Weight.java:148)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:379)
> > > > > >    -
> > > > >
> org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:344)
> > > > > >    - org.apache.lucene.search.Weight.bulkScorer(Weight.java:182)
> > > > > >    -
> > > > > >
> > > > >
> > >
> org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:338)
> > > > > >    -
> > > > >
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:656)
> > > > > >    -
> > > > >
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:211)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1705)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1408)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:596)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.doProcessUngroupedSearch(QueryComponent.java:1500)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:390)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:369)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
> > > > > >    - org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
> > > > > >    -
> > > > org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794)
> > > > > >    -
> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)
> > > > > >    -
> > > > > >
> > > >
> org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> > > > > >    -
> > > > > >
> > > > >
> > >
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> > > > > >    - org.eclipse.jetty.server.Server.handle(Server.java:516)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
> > > > > >    -
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.HttpChannel$$Lambda$556/0x000000080067a440.dispatch(Unknown
> > > > > >    Source)
> > > > > >    -
> > > > org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
> > > > > >    -
> > > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
> > > > > >    -
> > > > > >    org.eclipse.jetty.io
> > > > >
> > .AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
> > > > > >    - org.eclipse.jetty.io
> > > .FillInterest.fillable(FillInterest.java:105)
> > > > > >    - org.eclipse.jetty.io
> > > > > .ChannelEndPoint$1.run(ChannelEndPoint.java:104)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
> > > > > >    -
> > > > > >
> > > > >
> > > >
> > >
> >
> org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
> > > > > >    - java.base@11.0.5/java.lang.Thread.run(Thread.java:834)
> > > > > >
> > > > > > [image: image.png]
> > > > > > [image: image.png]
> > > > > >
> > > > > > On Wed, 13 Oct 2021 at 00:03, Joel Bernstein <joels...@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > >> There is a thread dump on the Solr admin. You can use that to
> > > > determine
> > > > > >> what all those threads are doing and where they are getting
> stuck.
> > > You
> > > > > can
> > > > > >> post parts of the thread dump back to this email thread as well.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Joel Bernstein
> > > > > >> http://joelsolr.blogspot.com/
> > > > > >>
> > > > > >>
> > > > > >> On Tue, Oct 12, 2021 at 11:15 AM Dominic Humphries
> > > > > >> <domi...@adzuna.com.invalid> wrote:
> > > > > >>
> > > > > >> > We run 8.3.1 in prod without any problems, but we're having
> > issues
> > > > > with
> > > > > >> > trying to upgrade.
> > > > > >> >
> > > > > >> > I've created an 8.9.0 leader & follower, imported our live
> data
> > > into
> > > > > it,
> > > > > >> > and am testing it via replaying requests made to prod. We're
> > > seeing
> > > > a
> > > > > >> big
> > > > > >> > problem where fairly moderate request rates are causing the
> > > instance
> > > > > to
> > > > > >> > become so slow it fails healthcheck. The logs showed a lot of
> > > errors
> > > > > >> around
> > > > > >> > creating threads:
> > > > > >> >
> > > > > >> > solr[4507]: [124136.511s][warning][os,thread] Failed to start
> > > > thread -
> > > > > >> > pthread_create failed (EAGAIN) for attributes: stacksize:
> 256k,
> > > > > >> guardsize:
> > > > > >> > 0k, detached.
> > > > > >> >
> > > > > >> > WARN  (qtp178604517-3891) [   ] o.e.j.i.ManagedSelector  =>
> > > > > >> > java.lang.OutOfMemoryError: unable to create native thread:
> > > possibly
> > > > > >> out of
> > > > > >> > memory or process/resource limits reached
> > > > > >> >
> > > > > >> > So I monitored thread count for the process whilst running the
> > > test
> > > > > >> suite
> > > > > >> > and saw a persistent pattern: Threads increased until maxed
> out,
> > > the
> > > > > >> logs
> > > > > >> > flooded with errors as it tried to create still more threads,
> > and
> > > > the
> > > > > >> > instance slowed down until terminated as unhealthy.
> > > > > >> >
> > > > > >> > The DefaultTasksMax is set to 4915, I've tried raising and
> > > lowering
> > > > it
> > > > > >> but
> > > > > >> > regardless of value the result is the same: it gets maxed and
> > > > > everything
> > > > > >> > slows down.
> > > > > >> >
> > > > > >> > Is there anything I can do to stop solr spinning up so many
> > > threads
> > > > it
> > > > > >> > ceases to function? There have been a few test passes where it
> > > > > >> > spontaneously dropped threadcount from thousands to hundreds
> and
> > > > > stayed
> > > > > >> up
> > > > > >> > longer, but there seems no pattern to when this happens.
> Running
> > > the
> > > > > >> same
> > > > > >> > tests on 8.3.1 results in a much slower increase in threads
> and
> > it
> > > > > never
> > > > > >> > quite maxes them so things continue to function.
> > > > > >> >
> > > > > >> > See below for the thread count and healthcheck times seen on a
> > > > (fairly
> > > > > >> > harsh) test run of 100 requests/sec
> > > > > >> >
> > > > > >> > Thanks
> > > > > >> >
> > > > > >> > Dominic
> > > > > >> >
> > > > > >> >
> > > > > >> > Threadcount:
> > > > > >> >
> > > > > >> > ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; ps -eLF |
> grep
> > > > > >> 'start.jar'
> > > > > >> > | wc -l; sleep 10s; done
> > > > > >> > Tue Oct 12 14:27:33 UTC 2021
> > > > > >> > 52
> > > > > >> > Tue Oct 12 14:27:43 UTC 2021
> > > > > >> > 52
> > > > > >> > Tue Oct 12 14:27:54 UTC 2021
> > > > > >> > 52
> > > > > >> > Tue Oct 12 14:28:04 UTC 2021
> > > > > >> > 52
> > > > > >> > Tue Oct 12 14:28:14 UTC 2021
> > > > > >> > 569
> > > > > >> > Tue Oct 12 14:28:24 UTC 2021
> > > > > >> > 899
> > > > > >> > Tue Oct 12 14:28:34 UTC 2021
> > > > > >> > 1198
> > > > > >> > Tue Oct 12 14:28:44 UTC 2021
> > > > > >> > 1589
> > > > > >> > Tue Oct 12 14:28:54 UTC 2021
> > > > > >> > 2016
> > > > > >> > Tue Oct 12 14:29:05 UTC 2021
> > > > > >> > 2451
> > > > > >> > Tue Oct 12 14:29:15 UTC 2021
> > > > > >> > 2851
> > > > > >> > Tue Oct 12 14:29:26 UTC 2021
> > > > > >> > 2934
> > > > > >> > Tue Oct 12 14:29:36 UTC 2021
> > > > > >> > 3249
> > > > > >> > Tue Oct 12 14:29:46 UTC 2021
> > > > > >> > 3501
> > > > > >> > Tue Oct 12 14:29:57 UTC 2021
> > > > > >> > 3734
> > > > > >> > Tue Oct 12 14:30:07 UTC 2021
> > > > > >> > 4128
> > > > > >> > Tue Oct 12 14:30:18 UTC 2021
> > > > > >> > 4374
> > > > > >> > Tue Oct 12 14:30:29 UTC 2021
> > > > > >> > 4637
> > > > > >> > Tue Oct 12 14:30:39 UTC 2021
> > > > > >> > 4693
> > > > > >> > Tue Oct 12 14:30:50 UTC 2021
> > > > > >> > 4807
> > > > > >> > Tue Oct 12 14:31:01 UTC 2021
> > > > > >> > 4916
> > > > > >> > Tue Oct 12 14:31:11 UTC 2021
> > > > > >> > 4916
> > > > > >> > Tue Oct 12 14:31:22 UTC 2021
> > > > > >> > Connection to 10.40.22.166 closed by remote host.
> > > > > >> >
> > > > > >> >
> > > > > >> > Healthcheck:
> > > > > >> >
> > > > > >> > ubuntu@ip-10-40-22-166:~$ while [ 1 ]; do date; curl -v
> > > > > >> > localhost:8983/solr/ 2>&1 | grep HTTP; date; echo '----';
> sleep
> > > > > >> > 10s; done
> > > > > >> > Tue Oct 12 14:27:34 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:27:34 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:27:44 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:27:44 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:27:54 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:27:54 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:28:04 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:28:04 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:28:14 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:02
> > > > > --:--:--
> > > > > >> >   0< HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:28:16 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:28:26 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:12
> > > > > --:--:--
> > > > > >> >   0< HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:28:39 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:28:49 UTC 2021
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:01
> > > > > --:--:--
> > > > > >> >   0> GET /solr/ HTTP/1.1
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:23
> > > > > --:--:--
> > > > > >> >   0< HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:29:13 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:29:23 UTC 2021
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:01
> > > > > --:--:--
> > > > > >> >   0> GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:29:25 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:29:35 UTC 2021
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:03
> > > > > --:--:--
> > > > > >> >   0> GET /solr/ HTTP/1.1
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:09
> > > > > --:--:--
> > > > > >> >   0< HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:29:44 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:29:54 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:11
> > > > > --:--:--
> > > > > >> >   0< HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:30:06 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:30:16 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:03
> > > > > --:--:--
> > > > > >> >   0< HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:30:20 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:30:30 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> >   0     0    0     0    0     0      0      0 --:--:--
> 0:00:02
> > > > > --:--:--
> > > > > >> >   0< HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:30:33 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:30:43 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:30:43 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:30:53 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > Tue Oct 12 14:30:55 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:31:05 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:31:05 UTC 2021
> > > > > >> > ----
> > > > > >> > Tue Oct 12 14:31:15 UTC 2021
> > > > > >> > > GET /solr/ HTTP/1.1
> > > > > >> > < HTTP/1.1 200 OK
> > > > > >> > Tue Oct 12 14:31:15 UTC 2021
> > > > > >> > ----
> > > > > >> > Connection to 10.40.22.166 closed by remote host.
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to