Hi,

One thing I have noticed is that if I keep these servers idle (by moving requests to another infra), the searcher gets closed after a few minutes. So somehow the incoming traffic is responsible for the searcher not getting closed.
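That behaviour fits how Solr manages searchers: every request that borrows the current searcher increments a reference count, and an old searcher can only be closed once every borrower has released it, so even one stuck request keeps it (and its caches) alive. A rough sketch of that pattern using Solr's own core API, for illustration only (not code from our application):

    import org.apache.solr.core.SolrCore;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.RefCounted;

    public class SearcherRefSketch {
      // Roughly what each request does internally: borrow the searcher,
      // use it, and release it. A request that never reaches decref()
      // (for example one stuck waiting on a shard response) keeps the
      // old searcher open indefinitely.
      static void useSearcher(SolrCore core) {
        RefCounted<SolrIndexSearcher> ref = core.getSearcher(); // refcount++
        try {
          SolrIndexSearcher searcher = ref.get();
          // ... run the query against this searcher ...
        } finally {
          ref.decref(); // refcount--; the searcher closes only when this reaches zero
        }
      }
    }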
This particular request took almost 6 hours and only got closed when I diverted the traffic to another infra: https://drive.google.com/file/d/197QFkNNsbkhOL57lVn0EkPe6FEzKkWFL/view?usp=share_link

On Thu, Dec 22, 2022 at 3:19 PM Satya Nand <satya.n...@indiamart.com> wrote:

> Hi Dominique,
>
> I looked at the stack trace, but I couldn't tell for sure why the thread is
> waiting. Can anyone help me decode this?
>
> httpShardExecutor-7-thread-939362-processing-x:im-search-03-08-22_shard1_replica_p17 r:core_node18 http://10.128.193.11:8985/solr/im-search-03-08-22_shard1_replica_p17/|http://10.128.99.14:8985/solr/im-search-03-08-22_shard1_replica_n1/ n:10.128.193.11:8985_solr c:im-search-03-08-22 s:shard1 [http://10.128.193.11:8985/solr/im-search-03-08-22_shard1_replica_p17/, http://10.128.99.14:8985/solr/im-search-03-08-22_shard1_replica_n1/]
>
> PRIORITY : 5
> THREAD ID : 0X00007FE6180494C0
> NATIVE ID : 0X54E3
> NATIVE ID (DECIMAL) : 21731
> STATE : WAITING
>
> stackTrace:
> java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(java.base@15.0.2/Native Method)
>   - waiting on <no object reference available>
>   at java.lang.Object.wait(java.base@15.0.2/Object.java:321)
>   at org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:318)
>   - locked <0x000000054cd27c88> (a org.eclipse.jetty.client.util.InputStreamResponseListener)
>   at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:90)
>   at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:99)
>   at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:217)
>   at org.apache.solr.common.util.JavaBinCodec._init(JavaBinCodec.java:211)
>   at org.apache.solr.common.util.JavaBinCodec.initRead(JavaBinCodec.java:202)
>   at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:195)
>   at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:51)
>   at org.apache.solr.client.solrj.impl.Http2SolrClient.processErrorsAndResponse(Http2SolrClient.java:711)
>   at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:421)
>   at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:776)
>   at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)
>   at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)
>   at org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:371)
>   at org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:132)
>   at org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:41)
>   at java.util.concurrent.FutureTask.run(java.base@15.0.2/FutureTask.java:264)
>   at java.util.concurrent.Executors$RunnableAdapter.call(java.base@15.0.2/Executors.java:515)
>   at java.util.concurrent.FutureTask.run(java.base@15.0.2/FutureTask.java:264)
>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$269/0x00000008010566b0.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@15.0.2/ThreadPoolExecutor.java:1130)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@15.0.2/ThreadPoolExecutor.java:630)
>   at java.lang.Thread.run(java.base@15.0.2/Thread.java:832)
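The frames above show the thread parked in Jetty's InputStreamResponseListener while Http2SolrClient waits for response bytes from another shard. For comparison, a standalone Http2SolrClient can be built with explicit connection and idle timeouts that bound how long such a read may wait. This is only an illustrative sketch, not how the internal shard client is configured; the builder method names are from Solr 8.x SolrJ (9.x renamed them, e.g. withIdleTimeout), and the base URL is a placeholder:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;

    public class BoundedClientSketch {
      public static void main(String[] args) throws Exception {
        // A client with explicit timeouts, so a shard that stops sending data
        // fails the request instead of leaving the reading thread waiting forever.
        try (Http2SolrClient client = new Http2SolrClient.Builder("http://solr-host:8985/solr")
                .connectionTimeout(10_000)   // ms to establish the connection
                .idleTimeout(120_000)        // ms with no data on the response before aborting
                .build()) {
          client.query("im-search-03-08-22", new SolrQuery("*:*"));
        }
      }
    }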
> On Sun, Dec 18, 2022 at 3:46 PM Dominique Bejean <dominique.bej...@eolya.fr> wrote:
>
>> Hi,
>>
>> Maybe a thread dump and a heap dump can help to find where and why this
>> request is blocked? Maybe just by finding this thread in the Solr console,
>> you can see where the thread is blocked?
>>
>> Regards
>>
>> Dominique
>>
>> On Sun, Dec 18, 2022 at 09:10, Satya Nand <satya.n...@indiamart.com.invalid> wrote:
>>
>> > Pinging on this thread again to bring it to the top.
>> >
>> > Any idea why one request is stuck for hours in SolrCloud?
>> >
>> > On Fri, Dec 9, 2022 at 3:35 PM Satya Nand <satya.n...@indiamart.com> wrote:
>> >
>> > > Hi Ere,
>> > >
>> > > We tried executing this request again and it didn't take any time, so it
>> > > is not repeatable. The average response time of all the queries around
>> > > this period was only approx 100-200 ms.
>> > >
>> > > This was a group=true request where we get 14 groups and 5 results per
>> > > group, so no deep pagination.
>> > >
>> > > On Fri, Dec 9, 2022 at 2:04 PM Ere Maijala <ere.maij...@helsinki.fi> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> Are the same requests sometimes stalling and sometimes fast, or is it
>> > >> some particular queries that take hours?
>> > >>
>> > >> There are some things you should avoid with SolrCloud, and deep paging
>> > >> (i.e. a large number for the start or rows parameter) is a typical issue
>> > >> (see e.g. https://yonik.com/solr/paging-and-deep-paging/ for more
>> > >> information).
>> > >>
>> > >> Best,
>> > >> Ere
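For reference, the cursor-based paging that the linked article recommends instead of large start values looks roughly like this in SolrJ; the base URL, collection name, query, and sort field below are placeholders, and the sort must end on the collection's uniqueKey field:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.params.CursorMarkParams;

    public class CursorPagingSketch {
      public static void main(String[] args) throws Exception {
        try (Http2SolrClient client = new Http2SolrClient.Builder("http://solr-host:8985/solr").build()) {
          SolrQuery q = new SolrQuery("*:*");        // placeholder query
          q.setRows(100);                            // page size stays constant for every page
          q.setSort("id", SolrQuery.ORDER.asc);      // cursors require a sort ending on the uniqueKey
          String cursor = CursorMarkParams.CURSOR_MARK_START;
          while (true) {
            q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
            QueryResponse rsp = client.query("im-search-03-08-22", q);
            // ... process rsp.getResults() ...
            String next = rsp.getNextCursorMark();
            if (cursor.equals(next)) {
              break;                                 // an unchanged cursor means there are no more pages
            }
            cursor = next;
          }
        }
      }
    }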
>> > >> Satya Nand wrote on 8.12.2022 at 13.27:
>> > >> > Hi,
>> > >> >
>> > >> > Greetings for the day,
>> > >> >
>> > >> > We are facing a strange problem in SolrCloud where a few requests are
>> > >> > taking hours to complete. Some requests return with a 0 status code and
>> > >> > some with a 500 status code. The most recent request took more than 5
>> > >> > hours to complete, with a result count of only about 9k.
>> > >> >
>> > >> > These queries create problems in closing old searchers. Sometimes there
>> > >> > are 3-4 searchers, where one is the new searcher and the others are just
>> > >> > stuck because a few queries are taking hours. Finally, the application
>> > >> > slows down horribly and the load increases.
>> > >> >
>> > >> > I have downloaded the stack trace of the affected node and tried to
>> > >> > analyze it online, but I couldn't get many insights from it.
>> > >> >
>> > >> > Stack trace:
>> > >> > https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjIvMTIvOC9sb2dzLnR4dC0tMTAtNTUtMzA=&
>> > >> >
>> > >> > JVM settings: We are using Parallel GC, can this be causing such long
>> > >> > pauses?
>> > >> >
>> > >> > -XX:+UseParallelGC
>> > >> > -XX:-OmitStackTraceInFastThrow
>> > >> > -Xms12g
>> > >> > -Xmx12g
>> > >> > -Xss256k
>> > >> >
>> > >> > What more can we check here to find the root cause and prevent this
>> > >> > from happening again?
>> > >> >
>> > >> > Thanks in advance
>> > >>
>> > >> --
>> > >> Ere Maijala
>> > >> Kansalliskirjasto / The National Library of Finland
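One guard rail relevant to the question above about preventing hours-long requests, offered only as a suggestion since it is not discussed in this thread: Solr's timeAllowed parameter caps how long the main search work may run and returns whatever has been collected, flagging partialResults=true in the response header. It bounds the search/collection phase rather than every stage of a distributed request. A minimal SolrJ sketch with a placeholder query:

    import org.apache.solr.client.solrj.SolrQuery;

    public class TimeAllowedSketch {
      public static void main(String[] args) {
        SolrQuery q = new SolrQuery("*:*");  // placeholder query
        // Ask Solr to stop the main search work after ~30 seconds and return
        // partial results instead of running indefinitely.
        q.setTimeAllowed(30_000);
        System.out.println("query params: " + q);
      }
    }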