[ 
https://issues.apache.org/jira/browse/SOLR-9135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295149#comment-15295149
 ] 

Shawn Heisey commented on SOLR-9135:
------------------------------------

[~ronbraun], If you log on as the user that is running Solr and type "uname -a" 
yourself within the container, does it return quickly?   If not, there may be 
some problem specific to your setup.

Even if the root of the problem you are seeing is unique to your setup and you 
can fix it outside of Solr, running external processes (which the 
SystemInfoHandler currently does) seems like a *really* bad idea in general.

For Linux, /proc/version and /proc/uptime can provide nearly identical 
information to that requested by running external processes.  If the processor 
architecture is desired, there are likely other /proc endpoints that can be 
used.  These /proc files likely won't work on free operating systems other than 
Linux, or on genetic UNIX systems like Solaris.

{code:none}
elyograg@sauron:~$ uname -a
Linux sauron 4.2.0-35-generic #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
elyograg@sauron:~$ cat /proc/version
Linux version 4.2.0-35-generic (buildd@lgw01-58) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #40~14.04.1-Ubuntu SMP Fri Mar 18 16:37:35 UTC 2016
elyograg@sauron:~$ cat /proc/uptime
3326236.35 9503419.17
{code}
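
As a rough illustration of the /proc approach, here is a minimal Java sketch. It is not what SystemInfoHandler does today; the class name ProcInfo and its two helper methods are hypothetical, and it assumes a Linux host where /proc/version and /proc/uptime are readable. On any other system the reads simply come back empty rather than blocking on a forked process. The architecture is taken from the standard os.arch system property instead of /proc.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: reads kernel information from /proc without forking.
public class ProcInfo {

    /** Returns the trimmed first line of a file, or empty if it cannot be read. */
    public static Optional<String> firstLine(Path file) {
        try {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            return lines.isEmpty() ? Optional.empty() : Optional.of(lines.get(0).trim());
        } catch (IOException | SecurityException e) {
            // Missing file (non-Linux system) or no permission: report "no data".
            return Optional.empty();
        }
    }

    /** Parses the first field of a /proc/uptime line: seconds since boot. */
    public static Optional<Double> parseUptimeSeconds(String line) {
        String[] fields = line.trim().split("\\s+");
        try {
            return Optional.of(Double.parseDouble(fields[0]));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String version = firstLine(Paths.get("/proc/version")).orElse("unknown");
        String uptime = firstLine(Paths.get("/proc/uptime"))
                .flatMap(ProcInfo::parseUptimeSeconds)
                .map(seconds -> seconds + " seconds")
                .orElse("unknown");
        System.out.println("version: " + version);
        System.out.println("uptime:  " + uptime);
        // Architecture as reported by the JVM, no /proc file needed.
        System.out.println("arch:    " + System.getProperty("os.arch"));
    }
}
```

Whether the "unknown" fallback is acceptable for the admin UI on non-Linux systems is a separate design question; the sketch only shows that the Linux case needs no external process.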


> SystemInfoHandler can poison / consume Jetty thread pool
> --------------------------------------------------------
>
>                 Key: SOLR-9135
>                 URL: https://issues.apache.org/jira/browse/SOLR-9135
>             Project: Solr
>          Issue Type: Bug
>         Environment: Solr 6.0.0
>            Reporter: Ronald Braun
>            Priority: Minor
>
> We are running Solr 6.0.0 in SolrCloud mode within a Docker container.  We 
> encountered an issue whereby the SystemInfoHandler was forking out processes 
> that would immediately enter D state (uninterruptible sleep) due to a 
> container volume issue after hitting the admin manager in a browser.  The 
> thread stays in runnable state:
> {noformat}
> "qtp43368234-13611" #13611 prio=5 os_prio=0 tid=0x00007f0260011800 nid=0x36fb runnable [0x00007efa0bce1000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.UNIXProcess.forkAndExec(Native Method)
>         at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
>         at java.lang.ProcessImpl.start(ProcessImpl.java:134)
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
>         at java.lang.Runtime.exec(Runtime.java:620)
>         at java.lang.Runtime.exec(Runtime.java:450)
>         at java.lang.Runtime.exec(Runtime.java:347)
>         at org.apache.solr.handler.admin.SystemInfoHandler.execute(SystemInfoHandler.java:244)
>         at org.apache.solr.handler.admin.SystemInfoHandler.getSystemInfo(SystemInfoHandler.java:198)
>         at org.apache.solr.handler.admin.SystemInfoHandler.handleRequestBody(SystemInfoHandler.java:111)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>         at org.apache.solr.handler.admin.InfoHandler.handleRequestBody(InfoHandler.java:86)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>         at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:658)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:441)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>         at org.eclipse.jetty.server.Server.handle(Server.java:518)
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
>         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
>         at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The problematic command being executed was 'uname -a'.  The admin manager 
> would throw up a "Lost connection to solr" message but presumably retries the 
> connection periodically (at least a couple of times a minute).  Before we 
> figured out what was going on, we had 600+ threads in D state:
> {noformat}
> 4433 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4434 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4439 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4440 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.04 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4461 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4462 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4467 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4470 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4486 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4487 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.06 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4488 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4489 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4496 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4497 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> 4501 solr 20 0 0.399t 0.105t 0.057t D 0.0 86.2 0:00.00 
> /usr/lib/jvm/java-8-oracle/bin/java -server -Xms48g -Xmx48g -XX:NewRatio=3 
> -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThre+
> etc.
> {noformat}
> An OS exec call is a bit heavy for loading the admin page...  Might you 
> consider either:
> - load this info once at startup and store
> - use a collapsed panel for display and fetch only on expansion / request
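
The first suggestion in the description (load the info once at startup and store it) could look roughly like the Java sketch below. This is only an assumption about one possible shape, not Solr code: the class name CachedSystemInfo, the loader supplier, and the fallback string are all hypothetical. The loader runs once on its own daemon thread, and request threads read the stored value without waiting, so a loader that hangs would tie up that one thread rather than the Jetty pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical holder: computes a value once in the background and caches it.
public class CachedSystemInfo {
    private final CompletableFuture<String> value;
    private final String fallback;

    public CachedSystemInfo(Supplier<String> loader, String fallback) {
        this.fallback = fallback;
        // Dedicated daemon thread, so a stuck loader cannot block JVM shutdown.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "system-info-loader");
            thread.setDaemon(true);
            return thread;
        });
        // Run the loader exactly once; a failure is mapped to the fallback.
        this.value = CompletableFuture.supplyAsync(loader, executor)
                .exceptionally(error -> fallback);
        // No further tasks will be submitted; the queued loader still runs.
        executor.shutdown();
    }

    /** Never blocks: returns the cached value, or the fallback until it is loaded. */
    public String get() {
        return value.getNow(fallback);
    }
}
```

A handler would construct one instance at startup, for example `new CachedSystemInfo(() -> runUname(), "unknown")` where `runUname` is a placeholder for whatever call produces the string, and then call `get()` on each request.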


