In the logs I can see this WARN:

2021-08-30 13:15:52.301 WARN  (zkCallback-12-thread-3) [c:quoteStore s:shard1 r:core_node6 x:quoteStore_shard1_replica_n5] o.a.s.c.RecoveryStrategy Stopping recovery for core=[quoteStore_shard1_replica_n5] coreNodeName=[core_node6]
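
To see which replicas of a shard are stuck and whether any of them is
currently the leader, the output of the Collections API CLUSTERSTATUS action
can be summarized. This is only a minimal sketch: it assumes the usual
CLUSTERSTATUS JSON layout (cluster > collections > shards > replicas, with a
"leader": "true" property on the leader replica). The sample payload is
hypothetical and hand-written to match the names in the log line; it is not
real output from this cluster, and core_node4 is an invented replica name.

```python
import json


def summarize_shards(cluster_status, collection):
    """Return {shard: {"leader": name_or_None, "not_active": [replica names]}}."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    summary = {}
    for shard_name, shard in shards.items():
        leader = None
        not_active = []
        for replica_name, replica in shard.get("replicas", {}).items():
            # The leader replica carries the string property "leader": "true".
            if replica.get("leader") == "true":
                leader = replica_name
            if replica.get("state") != "active":
                not_active.append(replica_name)
        summary[shard_name] = {"leader": leader, "not_active": sorted(not_active)}
    return summary


# Hypothetical sample shaped like a CLUSTERSTATUS response (assumption).
SAMPLE = json.loads("""
{
  "cluster": {
    "collections": {
      "quoteStore": {
        "shards": {
          "shard1": {
            "state": "active",
            "replicas": {
              "core_node6": {"core": "quoteStore_shard1_replica_n5", "state": "recovering"},
              "core_node4": {"core": "quoteStore_shard1_replica_n3", "state": "down"}
            }
          }
        }
      }
    }
  }
}
""")

if __name__ == "__main__":
    print(summarize_shards(SAMPLE, "quoteStore"))
```

The real JSON would come from
/solr/admin/collections?action=CLUSTERSTATUS&collection=quoteStore on any
live node. A shard whose summary shows no leader and only non-active replicas
matches the situation described in this thread.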

On Mon, Aug 30, 2021 at 6:43 PM HariBabu kuruva <hari2708.kur...@gmail.com>
wrote:

> Hi Zisis,
>
> Thanks for your email.
>
> We suspect the issue is with one particular Solr collection (or
> store). The nodes go down wherever replicas of that store are
> present.
>
> Also, that shard is now in recovery mode and no leader has been elected.
> Could you please suggest something to bring this store back up?
>
> On Mon, Aug 30, 2021 at 1:55 PM Zisis Tachtsidis <zist...@runbox.com>
> wrote:
>
>> My guess is that the Solr/Zookeeper communication issues are due to GC
>> pauses. You are saying that you end up with OOM problems. High memory usage
>> puts pressure on GC. Long GC pauses lead to timeouts in Solr/Zookeeper
>> communication. We've seen that happening.
>>
>> First thing I'd do is to get a heap dump once the OOM is triggered and
>> analyze that to see what is occupying the memory. Otherwise we are blind
>> here.  Is it due to heavy indexing? Heavy querying? Both of them? Do you
>> have customizations in the analysis chain that might generate more objects
>> than usual?
>>
>> Zisis
>>
>> On Mon, 30 Aug 2021 03:44:13 -0400, Dave <hastings.recurs...@gmail.com>
>> wrote:
>>
>> > I can’t help beyond that. I don’t like SolrCloud or ZooKeeper; I will
>> > always, if I can help it, stick to a standalone Solr instance.
>> >
>> > > On Aug 30, 2021, at 3:23 AM, HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
>> > >
>> > > Hi Dave
>> > >
>> > > We tried setting the memory as per your suggestions.
>> > >
>> > > But I still see Solr going down within a couple of minutes with an
>> > > OOM error. The Solr logs also show the connectivity issue below between
>> > > Solr and ZooKeeper.  Please advise.
>> > >
>> > > Zookeeper is running fine.
>> > >
>> > >
>> > > 2021-08-30 06:24:13.070 WARN  (main-SendThread(lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 65584ms for session id 0x1000019354b021b
>> > > 2021-08-30 06:24:13.071 WARN  (main-SendThread(lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Session 0x1000019354b021b for sever lxeisprdas06.corp.equinix.com/10.**.*.*:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException. => org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 65584ms for session id 0x1000019354b021b
>> > >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243)
>> > > org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 65584ms for session id 0x1000019354b021b
>> > >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243) ~[zookeeper-3.6.2.jar:3.6.2]
>> > > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-540) [   ] o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 1630304577209 , received timestamp: 1630304666181 , TTL: 15000
>> > > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-531) [   ] o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: 1630304527726 , received timestamp: 1630304600766 , TTL: 15000
>> > > 2021-08-30 06:26:36.014 WARN  (zkConnectionManagerCallback-13-thread-1) [   ] o.a.s.c.c.ConnectionManager Watcher org.apache.solr.common.cloud.ConnectionManager@e31302e name: ZooKeeperConnection Watcher:zookeeper2.corp.equinix.com:2181,zookeeper1.corp.equinix.com:2182,zookeeper3.corp.equinix.com:2183,zookeeper4.corp.equinix.com:2184,zookeeper5.corp.equinix.com:2185 got event WatchedEvent state:Disconnected type:None path:null path: null type: None
>> > > 2021-08-30 06:26:36.014 WARN  (zkConnectionManagerCallback-13-thread-1) [   ] o.a.s.c.c.ConnectionManager zkClient has disconnected
>> > > 2021-08-30 07:06:32.484 WARN  (main-SendThread(zookeeper5.corp.equ.com:2185)) [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 1851316ms for session id 0x1000019354b021b
>> > >
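
The two PKIAuthenticationPlugin errors above can be read as simple
arithmetic: as I understand the message, the request is rejected when the gap
between its timestamp and the time it was received exceeds the TTL. A small
sketch that pulls the numbers out of the log text follows. The regular
expression is an assumption based only on the two lines quoted here, not a
documented log format.

```python
import re

# Pattern inferred from the two quoted log lines (assumption).
PKI_RE = re.compile(
    r"Invalid key request timestamp:\s*(\d+)\s*,"
    r"\s*received timestamp:\s*(\d+)\s*,\s*TTL:\s*(\d+)"
)


def pki_gaps(log_text):
    """Return a list of (gap_ms, ttl_ms) for each PKI 'Invalid key' message."""
    gaps = []
    for request_ts, received_ts, ttl in PKI_RE.findall(log_text):
        gaps.append((int(received_ts) - int(request_ts), int(ttl)))
    return gaps


# The two ERROR lines from this thread, unwrapped onto single lines.
LOG = (
    "2021-08-30 06:24:26.182 ERROR (qtp1198197478-540) [   ] "
    "o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: "
    "1630304577209 , received timestamp: 1630304666181 , TTL: 15000\n"
    "2021-08-30 06:24:26.182 ERROR (qtp1198197478-531) [   ] "
    "o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp: "
    "1630304527726 , received timestamp: 1630304600766 , TTL: 15000\n"
)

if __name__ == "__main__":
    for gap_ms, ttl_ms in pki_gaps(LOG):
        print(f"gap={gap_ms}ms ttl={ttl_ms}ms exceeded={gap_ms > ttl_ms}")
```

Both gaps (88972 ms and 73040 ms) are several times the 15000 ms TTL and of
the same order as the 65584 ms ZooKeeper session timeout logged seconds
earlier. That is consistent with a long stall on one of the nodes, although
these numbers alone do not prove the cause.
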
>> > >> On Sun, Aug 29, 2021 at 11:38 PM Dave <hastings.recurs...@gmail.com> wrote:
>> > >>
>> > >> Yes. Don’t set those memory restrictions, just Xms and Xmx, both to 31
>> > >> gigs. Java has problems past that line and will make the GC go into a bad
>> > >> loop. Here is a link explaining why:
>> > >> https://community.datastax.com/questions/3661/why-is-a-32-gb-heap-allocation-not-recommended.html
>> > >>
>> > >> But this is almost like a protected secret
>> > >>
>> > >>>> On Aug 29, 2021, at 1:52 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>> > >>>
>> > >>> On 8/29/2021 2:38 AM, HariBabu kuruva wrote:
>> > >>>> Is it required to define both the parameters SOLR_HEAP and SOLR_JAVA_MEM,
>> > >>>> or can I comment out SOLR_HEAP and define only SOLR_JAVA_MEM?
>> > >>>> Also, what is the highest Xmx value I can use if I get an OOM with 31gb?
>> > >>>> I have only Solr running on that node.
>> > >>>
>> > >>> If both are defined, I do not know which one will actually take effect.
>> > >>> Figuring that out would require looking at the startup script and doing
>> > >>> some experiments to see what Java actually does.
>> > >>>
>> > >>> I would personally remove SOLR_JAVA_MEM and only go with SOLR_HEAP.  Then
>> > >>> you can do something very simple like the following, and the Solr startup
>> > >>> script will set both -Xms and -Xmx java options to that value:
>> > >>>
>> > >>> SOLR_HEAP=4g
>> > >>>
>> > >>>> And could you please let me know the reason to disable swap memory.
>> > >>>
>> > >>> If a system starts actively swapping, its performance in general will be
>> > >>> extremely low.  If that happens, it is an indication that there is not
>> > >>> enough physical memory and the system needs more, or that configurations
>> > >>> need to be adjusted to require less memory.
>> > >>>
>> > >>> Disabling swap makes it impossible for the OS to try and use disk space
>> > >>> as memory.  In situations where programs are asking for too much memory and
>> > >>> you have swap completely disabled, either Java or the OS will simply kill
>> > >>> the process that's asking for too much memory, rather than letting it run
>> > >>> and destroy overall performance.
>> > >>>
>> > >>> ---
>> > >>>
>> > >>> Responding to something in the OP:
>> > >>>
>> > >>> It is completely normal to see 100 percent memory utilization on just
>> > >>> about any server, whether it's running Solr or not.  The OS will use all
>> > >>> available memory for caching purposes, to speed everything up.  The only
>> > >>> time you won't see 100 percent memory usage is when you have far more
>> > >>> memory than the system actually needs.  For instance, if you had 512GB of
>> > >>> memory on a system that only handles megabytes of data.
>> > >>>
>> > >>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>> > >>>
>> > >>> (disclaimer: I wrote the wiki page linked here.  Any errors are mine.)
>> > >>>
>> > >>> Thanks,
>> > >>> Shawn
>> > >>
>> > >
>> > >
>> > > --
>> > >
>> > > Thanks and Regards,
>> > > Hari
>> > > Mobile:9790756568
>>
>>
>>
>
> --
>
> Thanks and Regards,
>  Hari
> Mobile:9790756568
>


-- 

Thanks and Regards,
 Hari
Mobile:9790756568
