I'm not sure this is the issue, but maybe it's http2 vs http1.

Could you retry with the following set on the cluster?

-Dsolr.http1=true
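
To compare reload behaviour before and after setting that flag, here is a rough sketch that submits the reload asynchronously through the Collections API (RELOAD with an async id) and polls REQUESTSTATUS until it finishes or a deadline passes. The base URL, collection name, request id, and the function names are placeholders of mine, not anything from your setup, and the 180-second default only mirrors the timeout you mentioned.

```python
import json
import time
import urllib.parse
import urllib.request


def collections_url(base_url, **params):
    """Build a Collections API URL with the given query parameters."""
    query = urllib.parse.urlencode({**params, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/admin/collections?{query}"


def task_state(response_body):
    """Pull the async task state out of a REQUESTSTATUS JSON response."""
    return json.loads(response_body).get("status", {}).get("state", "unknown")


def reload_and_wait(base_url, collection, request_id, timeout_s=180, poll_s=5):
    """Submit an async RELOAD, then poll until it completes, fails, or times out."""
    # "async" is a Python keyword, so it is passed via dict unpacking.
    submit = collections_url(
        base_url, action="RELOAD", name=collection, **{"async": request_id}
    )
    urllib.request.urlopen(submit, timeout=30).read()

    status = collections_url(base_url, action="REQUESTSTATUS", requestid=request_id)
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = task_state(urllib.request.urlopen(status, timeout=30).read())
        if state in ("completed", "failed"):
            return state
        time.sleep(poll_s)
    return "timed out"
```

For example, reload_and_wait("http://localhost:8983", "mycollection", "reload-1") would return "completed", "failed", or "timed out"; all three arguments here are hypothetical values.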



On Mon, Dec 5, 2022 at 5:08 AM Nick Vladiceanu <vladicean...@gmail.com>
wrote:

> Hello folks,
>
> We’re running our SolrCloud cluster in Kubernetes. Recently we’ve upgraded
> from 8.11 to 9.0 (and eventually to 9.1).
>
> Fully reindexed collections after upgrade, all looking good, no errors,
> response time improvements are noticed.
>
> We have the following specs:
> collection size:
> 22M docs, 1.3Kb doc size; ~28Gb total collection size at this point;
> shards: 6 shards, each ~4,7Gb; 1 core per node;
> nodes:
> 30Gi of RAM,
> 16 cores
> 96 nodes
> Heap: 23Gb heap
> JavaOpts: -Dsolr.modules=scripting,analysis-extras,ltr
> gcTune: -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:MaxGCPauseMillis=300
> -XX:InitiatingHeapOccupancyPercent=75 -XX:+UseLargePages
> -XX:+ParallelRefProcEnabled -XX:ParallelGCThreads=10 -XX:ConcGCThreads=2
> -XX:MinHeapFreeRatio=2 -XX:MaxHeapFreeRatio=10
>
>
> Problem
>
> The problem we face is when we try to reload the collection: in sync mode
> the request times out, and in async mode the task keeps running forever:
>
> curl “reload” output: https://justpaste.it/ap4d2
> ErrorReportingConcurrentUpdateSolrClient stacktrace (appears in the logs
> of some nodes): https://justpaste.it/aq3dw
>
> There are no issues on a newly created cluster if there is no incoming
> traffic to it. Once we start sending requests to the cluster, collection
> reload becomes impossible. Other collections (smaller) within the same
> cluster are reloading just fine.
>
> In some cases, on some nodes, the old-generation GC kicks in and makes
> the entire cluster unstable; however, that doesn’t happen every time a
> collection reload times out.
>
> We’ve tried rolling back to 8.11 and everything works normally, as it used
> to: no errors with reload, no other errors in the logs during reload,
> etc.
>
> We tried the following:
> run 9.0, 9.1 on Java 11 and Java 17: same result;
> lower cache warming, disable firstSearcher queries: same result;
> increase heap size, tune gc: same result;
> use apiv1 and apiv2 to issue reload commands: no difference;
> sync vs async reload: either forever running task or timing out after 180
> seconds;
>
> Did anyone face similar issues after upgrading to version 9 of Solr? Could
> you please advise where we should focus our attention while debugging this
> behavior? Any other advice or suggestions?
>
> Thank you
>
>
> Best regards,
> Nick Vladiceanu
