Hi Shawn,

Thanks for the response. Based on the questions you have provided I have
collected the following data.

a. This is the stack trace :

2021-12-16 18:05:11.834 ERROR (qtp1765690649-66565) [c:ss s:shard1
r:core_node24 x:ss_shard1_replica_n23] o.a.s.u.SolrCmdDistributor
java.io.IOException: Request processing has stalled for 20032ms with 100
remaining elements in the queue.
        at
org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:448)
        at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
        at
org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
        at
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:337)
        at
org.apache.solr.update.SolrCmdDistributor.distribAdd(SolrCmdDistributor.java:242)
        at
org.apache.solr.update.processor.DistributedZkUpdateProcessor.doDistribAdd(DistributedZkUpdateProcessor.java:300)
        at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:233)
        at
org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
        at
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
        at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
        at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332)
        at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281)
        at
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
        at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
        at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:236)
        at
org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
        at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
        at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
        at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:127)
        at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122)
        at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
        at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
        at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
        at
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)
        at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
        at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
        at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
        at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
        at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
        at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
        at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
        at
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
        at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
        at org.eclipse.jetty.server.Server.handle(Server.java:500)
        at
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
        at
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
        at org.eclipse.jetty.server.HttpChannel.run(HttpChannel.java:335)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
        at
org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:170)
        at
org.eclipse.jetty.http2.HTTP2Connection.onFillable(HTTP2Connection.java:125)
        at
org.eclipse.jetty.http2.HTTP2Connection$FillableCallback.succeeded(HTTP2Connection.java:348)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
        at
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
        at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
        at java.base/java.lang.Thread.run(Thread.java:834)

b. Max heap on the server is 32GB
c. We are using the default GC tuning - not customized anything around it.
d. I see the following GC logs around that time corresponding to the error
above.
[2021-12-16T18:02:39.902+0000][323265.234s] GC(1711) Pause Young (Normal)
(G1 Evacuation Pause)
[2021-12-16T18:02:39.902+0000][323265.234s] GC(1711) Using 8 workers of 8
for evacuation
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711)   Pre Evacuate
Collection Set: 0.2ms
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711)   Evacuate Collection
Set: 46.1ms
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711)   Post Evacuate
Collection Set: 28.8ms
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711)   Other: 2.2ms
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711) Eden regions:
1189->0(1187)
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711) Survivor regions:
39->41(154)
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711) Old regions: 219->219
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711) Humongous regions:
185->185
[2021-12-16T18:02:39.979+0000][323265.311s] GC(1711) Metaspace:
86065K->86065K(92160K)
[2021-12-16T18:02:39.980+0000][323265.311s] GC(1711) Pause Young (Normal)
(G1 Evacuation Pause) 26092M->7101M(32768M) 77.166ms
[2021-12-16T18:02:39.980+0000][323265.312s] GC(1711) User=0.57s Sys=0.00s
Real=0.07s
[2021-12-16T18:06:21.343+0000][323486.675s] GC(1712) Pause Young (Normal)
(G1 Evacuation Pause)
[2021-12-16T18:06:21.343+0000][323486.675s] GC(1712) Using 8 workers of 8
for evacuation
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712)   Pre Evacuate
Collection Set: 0.2ms
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712)   Evacuate Collection
Set: 63.6ms
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712)   Post Evacuate
Collection Set: 17.4ms
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712)   Other: 2.3ms
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712) Eden regions:
1187->0(1198)
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712) Survivor regions:
41->30(154)
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712) Old regions: 219->219
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712) Humongous regions:
197->197
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712) Metaspace:
86065K->86065K(92160K)
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712) Pause Young (Normal)
(G1 Evacuation Pause) 26285M->7122M(32768M) 83.507ms
[2021-12-16T18:06:21.426+0000][323486.758s] GC(1712) User=0.62s Sys=0.00s
Real=0.08s
[2021-12-16T18:07:45.174+0000][323570.506s] GC(1713) Pause Young (Normal)
(G1 Evacuation Pause)
[2021-12-16T18:07:45.174+0000][323570.506s] GC(1713) Using 8 workers of 8
for evacuation
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713)   Pre Evacuate
Collection Set: 0.2ms
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713)   Evacuate Collection
Set: 44.9ms
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713)   Post Evacuate
Collection Set: 11.8ms
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713)   Other: 2.3ms
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713) Eden regions:
1198->0(1205)
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713) Survivor regions:
30->23(154)
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713) Old regions: 219->219
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713) Humongous regions:
197->197
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713) Metaspace:
86065K->86065K(92160K)
[2021-12-16T18:07:45.233+0000][323570.565s] GC(1713) Pause Young (Normal)
(G1 Evacuation Pause) 26290M->7020M(32768M) 59.056ms
[2021-12-16T18:07:45.234+0000][323570.565s] GC(1713) User=0.43s Sys=0.00s
Real=0.06s

Regards,
Nikhilesh Jannu


On Thu, Dec 16, 2021 at 5:34 PM Shawn Heisey <apa...@elyograg.org> wrote:

> On 12/16/2021 1:22 PM, Nikhilesh Jannu wrote:
> > One of our SOLR Cloud cluster has started showing lot of errors related
> to
> >
> > java.io.IOException: Request processing has stalled for 20029ms with 100
> > remaining elements in the queue.
>
> The error message will have significantly more detail than that.  We
> would need to see ALL the lines in the error message to have any hope of
> figuring out where the problem is happening.
>
> What is the max heap size for Solr?  Have you changed the GC tuning?
> Can you provide the solr GC logs from that server?  Note that you cannot
> attach those logs to email because the mailing list will eat them ...
> you will need to put them on a file-sharing website and provide a URL to
> download them.
>
> Thanks,
> Shawn
>

Reply via email to