Hi,

One thing I have noticed is that if I keep these servers idle (move
requests to another infra), then the searcher gets closed after a few
minutes. So somehow incoming traffic is responsible for the searcher not
getting closed.

This particular request took almost 6 hours and only got closed when I
diverted the traffic to another infra.


https://drive.google.com/file/d/197QFkNNsbkhOL57lVn0EkPe6FEzKkWFL/view?usp=share_link
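
For context, the stack trace quoted below shows the coordinating thread parked
in InputStreamResponseListener$Input.read(), i.e. it appears to be waiting for
more response bytes from another replica. Purely as a point of reference, here
is a minimal SolrJ sketch (assuming Solr 8.x; the URL and timeout values are
placeholders, not our production settings) of the connection/idle timeouts an
Http2SolrClient can be built with; inside Solr itself the analogous knobs are
the shardHandlerFactory's connTimeout/socketTimeout in solr.xml:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.Http2SolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimeoutClientSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder base URL and values; this only illustrates the knobs.
    try (Http2SolrClient client =
             new Http2SolrClient.Builder("http://localhost:8983/solr/mycollection")
                 .connectionTimeout(15_000) // ms allowed to establish the connection
                 .idleTimeout(120_000)      // ms of read inactivity before the request fails
                 .build()) {
      QueryResponse rsp = client.query(new SolrQuery("*:*"));
      System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
  }
}

How large an idle timeout makes sense obviously depends on the longest shard
response we consider legitimate.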

On Thu, Dec 22, 2022 at 3:19 PM Satya Nand <satya.n...@indiamart.com> wrote:

> Hi Dominique,
>
> I looked at the stack trace, but I couldn't tell for sure why the thread is
> waiting. Can anyone help me decode this?
>
> httpShardExecutor-7-thread-939362-processing-x:im-search-03-08-22_shard1_replica_p17
> r:core_node18
> http://10.128.193.11:8985/solr/im-search-03-08-22_shard1_replica_p17/|http://10.128.99.14:8985/solr/im-search-03-08-22_shard1_replica_n1/
> n:10.128.193.11:8985_solr c:im-search-03-08-22 s:shard1
> [http://10.128.193.11:8985/solr/im-search-03-08-22_shard1_replica_p17/, http://10.128.99.14:8985/solr/im-search-03-08-22_shard1_replica_n1/]
>
> PRIORITY : 5
>
> THREAD ID : 0X00007FE6180494C0
>
> NATIVE ID : 0X54E3
>
> NATIVE ID (DECIMAL) : 21731
>
> STATE : WAITING
>
> stackTrace:
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(java.base@15.0.2/Native Method)
> - waiting on <no object reference available>
> at java.lang.Object.wait(java.base@15.0.2/Object.java:321)
> at org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:318)
> - locked <0x000000054cd27c88> (a org.eclipse.jetty.client.util.InputStreamResponseListener)
> at org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:90)
> at org.apache.solr.common.util.FastInputStream.refill(FastInputStream.java:99)
> at org.apache.solr.common.util.FastInputStream.readByte(FastInputStream.java:217)
> at org.apache.solr.common.util.JavaBinCodec._init(JavaBinCodec.java:211)
> at org.apache.solr.common.util.JavaBinCodec.initRead(JavaBinCodec.java:202)
> at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:195)
> at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:51)
> at org.apache.solr.client.solrj.impl.Http2SolrClient.processErrorsAndResponse(Http2SolrClient.java:711)
> at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:421)
> at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:776)
> at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:369)
> at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:297)
> at org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:371)
> at org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:132)
> at org.apache.solr.handler.component.ShardRequestor.call(ShardRequestor.java:41)
> at java.util.concurrent.FutureTask.run(java.base@15.0.2/FutureTask.java:264)
> at java.util.concurrent.Executors$RunnableAdapter.call(java.base@15.0.2/Executors.java:515)
> at java.util.concurrent.FutureTask.run(java.base@15.0.2/FutureTask.java:264)
> at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
> at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$269/0x00000008010566b0.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@15.0.2/ThreadPoolExecutor.java:1130)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@15.0.2/ThreadPoolExecutor.java:630)
> at java.lang.Thread.run(java.base@15.0.2/Thread.java:832)
>
> On Sun, Dec 18, 2022 at 3:46 PM Dominique Bejean <dominique.bej...@eolya.fr> wrote:
>
>> Hi,
>>
>> Maybe a thread dump and a heap dump can help to find where and why this
>> request is blocked?
>> Maybe just by finding this thread in the Solr console, you can see where
>> the thread is blocked?
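>>
>> In case it is useful, here is a small generic JDK sketch (nothing
>> Solr-specific, the class name is just a placeholder) that prints the WAITING
>> threads of the JVM it runs in via ThreadMXBean; it is the same information
>> that jstack or the admin UI's Thread Dump screen gives, so for Solr you
>> would normally just use those:
>>
>> import java.lang.management.ManagementFactory;
>> import java.lang.management.ThreadInfo;
>> import java.lang.management.ThreadMXBean;
>>
>> public class WaitingThreadDump {
>>   public static void main(String[] args) {
>>     ThreadMXBean mx = ManagementFactory.getThreadMXBean();
>>     // true, true = also report locked monitors and ownable synchronizers
>>     for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
>>       if (info.getThreadState() == Thread.State.WAITING) {
>>         System.out.println(info.getThreadName() + " : " + info.getThreadState());
>>         for (StackTraceElement frame : info.getStackTrace()) {
>>           System.out.println("    at " + frame);
>>         }
>>       }
>>     }
>>   }
>> }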
>>
>> Regards
>>
>> Dominique
>>
>>
>> On Sun, Dec 18, 2022 at 09:10, Satya Nand <satya.n...@indiamart.com.invalid>
>> wrote:
>>
>> > Pinging on this thread again to bring it to the top.
>> >
>> > Any idea why one request is stuck for hours in SolrCloud?
>> >
>> > On Fri, Dec 9, 2022 at 3:35 PM Satya Nand <satya.n...@indiamart.com>
>> > wrote:
>> >
>> > > Hi Ere,
>> > >
>> > > We tried executing this request again and it didn't take any time, so it
>> > > is not repeatable. The average response time of all the queries around this
>> > > period was only approx. 100-200 ms.
>> > >
>> > > This was a group=true request where we get 14 groups and 5 results per
>> > > group. So no deep pagination.
>> > >
>> > > On Fri, Dec 9, 2022 at 2:04 PM Ere Maijala <ere.maij...@helsinki.fi>
>> > > wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> Are the same requests sometimes stalling and sometimes fast, or is it
>> > >> some particular queries that take hours?
>> > >>
>> > >> There are some things you should avoid with SolrCloud, and deep paging
>> > >> (i.e. a large number for the start or rows parameter) is a typical issue
>> > >> (see e.g. https://yonik.com/solr/paging-and-deep-paging/ for more
>> > >> information).
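>> > >>
>> > >> If you really do need to walk deep into a result set, a cursor is usually
>> > >> the safer pattern. A minimal SolrJ sketch (the collection URL and the sort
>> > >> field are just placeholders):
>> > >>
>> > >> import org.apache.solr.client.solrj.SolrQuery;
>> > >> import org.apache.solr.client.solrj.impl.Http2SolrClient;
>> > >> import org.apache.solr.client.solrj.response.QueryResponse;
>> > >> import org.apache.solr.common.params.CursorMarkParams;
>> > >>
>> > >> public class CursorPagingSketch {
>> > >>   public static void main(String[] args) throws Exception {
>> > >>     try (Http2SolrClient client =
>> > >>              new Http2SolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
>> > >>       SolrQuery q = new SolrQuery("*:*");
>> > >>       q.setRows(500);
>> > >>       // cursor paging requires a sort ending in a unique field (the uniqueKey)
>> > >>       q.setSort(SolrQuery.SortClause.asc("id"));
>> > >>       String cursor = CursorMarkParams.CURSOR_MARK_START;
>> > >>       while (true) {
>> > >>         q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
>> > >>         QueryResponse rsp = client.query(q);
>> > >>         // process rsp.getResults() here
>> > >>         String next = rsp.getNextCursorMark();
>> > >>         if (cursor.equals(next)) {
>> > >>           break; // no new cursor mark means no more pages
>> > >>         }
>> > >>         cursor = next;
>> > >>       }
>> > >>     }
>> > >>   }
>> > >> }
>> > >>
>> > >> The important part is that each request carries the cursorMark returned by
>> > >> the previous response, so no request ever has to skip over earlier rows.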
>> > >>
>> > >> Best,
>> > >> Ere
>> > >>
>> > >> Satya Nand wrote on 8 Dec 2022 at 13:27:
>> > >> > Hi,
>> > >> >
>> > >> > Greetings for the day,
>> > >> >
>> > >> > We are facing a strange problem in SolrCloud where a few requests are
>> > >> > taking hours to complete. Some requests return with a 0 status code and
>> > >> > some with a 500 status code. The most recent request took more than 5 hours
>> > >> > to complete with only a 9k result count.
>> > >> >
>> > >> > These queries create problems in closing old searchers. Sometimes there
>> > >> > are 3-4 searchers, where one is the new searcher and the others are just
>> > >> > stuck because a few queries are taking hours. Finally, the application slows
>> > >> > down horribly, and the load increases.
>> > >> >
>> > >> > I have downloaded the stack trace of the affected node and tried to
>> > >> > analyze this stack trace online, but I couldn't get many insights from it.
>> > >> >
>> > >> > Stack Trace:
>> > >> >
>> > >> > https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMjIvMTIvOC9sb2dzLnR4dC0tMTAtNTUtMzA=&;
>> > >> >
>> > >> > JVM Settings: We are using Parallel GC, can this be causing such long
>> > >> > pauses?
>> > >> >
>> > >> > -XX:+UseParallelGC
>> > >> > -XX:-OmitStackTraceInFastThrow
>> > >> > -Xms12g
>> > >> > -Xmx12g
>> > >> > -Xss256k
>> > >> >
>> > >> > What more can we check here to find the root cause and prevent this from
>> > >> > happening again?
>> > >> > Thanks in advance
>> > >> >
>> > >>
>> > >> --
>> > >> Ere Maijala
>> > >> Kansalliskirjasto / The National Library of Finland
>> > >>
>> > >
>> >
>>
>
