Hi Rekha,

Do you also have query load while indexing?  Have you tried the TLOG + PULL
replica types?
https://solr.apache.org/guide/8_4/shards-and-indexing-data-in-solrcloud.html#types-of-replicas
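
With that layout the TLOG replicas handle indexing while the PULL replicas only
replicate the index and can absorb the query load. Roughly, via the Collections
API (the host, the new collection name "datacore_v2" and the replica counts
below are just placeholders, adjust for your cluster):

curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=datacore_v2&numShards=1&tlogReplicas=2&pullReplicas=2"
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=datacore&shard=shard1&type=PULL"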

Thanks,
Wei

On Thu, Apr 22, 2021 at 11:27 PM Rekha Sekhar <rekhaa.sek...@gmail.com>
wrote:

> Hi,
>
> Gentle reminder... your advice on this would be highly appreciated.
>
> Thanks
> Rekha
>
> On Thu, 22 Apr, 2021, 1:13 PM Rekha Sekhar, <rekhaa.sek...@gmail.com>
> wrote:
>
> > Hi,
> >
> > We are experiencing heavy slowness on updates in our SolrCloud
> > implementation.
> > We are running it as 1 shard with 2 replicas, and we have 3 ZooKeeper
> > nodes. The Solr version is 8.7.0 and the ZK version is 3.6.2.
> >
> > Every day we have some heavy updates (100,000 to 500,000 updates processed
> > in parallel), which include deletes, adds and updates.
> > We have a total of 2.2M records indexed.
> >
> > Every time these updates/deletes happen we see a lot of 'Reordered DBQs
> > detected' messages, and eventually processing becomes very slow: an
> > update/delete request goes from 100ms to 30 minutes to complete.
> > At the same time we start to see error messages such as "Task queue
> > processing has stalled for 115231 ms with 100 remaining elements to
> > process", "Idle timeout expired: 120000/120000 ms", "cancel_stream_error",
> > etc.
> > Sometimes one node goes to a recovery state and recovers after some time.
> >
> >
> > 2021-04-21 18:21:08.219 INFO  (qtp938463537-5550) [c:datacore s:shard1
> > r:core_node4 x:datacore_shard1_replica_n2] o.a.s.u.DirectUpdateHandler2
> > Reordered DBQs detected.
> > Update=add{_version_=1697674936592629760,id=S-5942167-P-108089342-F-800102562-E-180866483}
> > DBQs=[DBQ{version=1697674943496454144,q=store_element_id:395699},
> > DBQ{version=1697674943408373760,q=store_element_id:395698},
> > DBQ{version=1697674943311904768,q=store_element_id:395678},
> > DBQ{version=1697674943221727232,q=store_element_id:395649},
> > DBQ{version=1697674943143084032,q=store_element_id:395642},
> > DBQ{version=1697674943049760768,q=store_element_id:395612},
> > DBQ{version=1697674942964826112,q=store_element_id:395602},
> > DBQ{version=1697674942871502848,q=store_element_id:395587},
> > DBQ{version=1697674942790762496,q=store_element_id:395582},
> > DBQ{version=1697674942711070720,q=store_element_id:395578},
> > DBQ{version=1697674942622990336,q=store_element_id:199511},
> > DBQ{version=1697674942541201408,q=store_element_id:199508},
> > DBQ{version=1697674942452072448,q=store_element_id:397242},
> > DBQ{version=1697674942356652032,q=store_element_id:397194},
> > DBQ{version=1697674942268571648,q=store_element_id:397166},
> > DBQ{version=1697674942178394112,q=store_element_id:397164},
> > DBQ{version=1697674942014816256,q=store_element_id:397149},
> > DBQ{version=1697674941901570048,q=store_element_id:395758},
> > DBQ{version=1697674941790420992,q=store_element_id:395725},
> > DBQ{version=1697674941723312128,q=store_element_id:395630},
> >
> > 2021-04-21 18:30:23.636 INFO
> >  (recoveryExecutor-11-thread-5-processing-n:solr-1.solrcluster:8983_solr
> > x:datacore_shard1_replica_n2 c:datacore s:shard1 r:core_node4) [c:datacore
> > s:shard1 r:core_node4 x:datacore_shard1_replica_n2] o.a.s.c.RecoveryStrategy
> > PeerSync Recovery was not successful - trying replication.
> > 2021-04-21 18:30:23.636 INFO
> >  (recoveryExecutor-11-thread-5-processing-n:solr-1.solrcluster:8983_solr
> > x:datacore_shard1_replica_n2 c:datacore s:shard1 r:core_node4) [c:datacore
> > s:shard1 r:core_node4 x:datacore_shard1_replica_n2] o.a.s.c.RecoveryStrategy
> > Starting Replication Recovery.
> >
> >
> > Below are the autoCommit and autoSoftCommit values used.
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:90000}</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> > <autoSoftCommit>
> >   <maxTime>${solr.autoSoftCommit.maxTime:15000}</maxTime>
> > </autoSoftCommit>
> >
> > Here are the GC logs for reference:
> >
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541) User=0.07s Sys=0.01s
> > Real=0.04s
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541) Pause Young (Normal)
> > (G1 Evacuation Pause) 7849M->1725M(10240M) 39.247ms
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541) Metaspace:
> > 85486K->85486K(1126400K)
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541) Humongous regions:
> > 15->15
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541) Old regions: 412->412
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541) Survivor regions:
> > 5->5(192)
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541) Eden regions:
> > 1531->0(1531)
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541)   Other: 0.5ms
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541)   Post Evacuate
> > Collection Set: 6.1ms
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541)   Evacuate Collection
> > Set: 32.5ms
> > [2021-04-21T18:28:32.296+0000][408073.843s] GC(541)   Pre Evacuate
> > Collection Set: 0.1ms
> > [2021-04-21T18:28:32.257+0000][408073.804s] GC(541) Using 2 workers of 2
> > for evacuation
> > [2021-04-21T18:28:32.257+0000][408073.804s] GC(541) Pause Young (Normal)
> > (G1 Evacuation Pause)
> >
> >
> > Would really appreciate any help on this.
> >
> > Thanks,
> > Rekha
> >
>
