Hi Zisis,

Thanks for your email.
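
As a rough check of the GC-pause theory, the two PKIAuthenticationPlugin errors in the log quoted below can be compared against their TTL. The sketch only subtracts the numbers copied from those two log lines; the variable names are illustrative and are not anything Solr defines.

```shell
#!/bin/sh
# Timestamps (epoch milliseconds) and TTL copied from the two
# PKIAuthenticationPlugin "Invalid key" errors in the quoted log.
ttl_ms=15000

# First error: received timestamp minus request timestamp.
lag1_ms=$(( 1630304666181 - 1630304577209 ))
# Second error: same subtraction.
lag2_ms=$(( 1630304600766 - 1630304527726 ))

for lag in "$lag1_ms" "$lag2_ms"; do
  if [ "$lag" -gt "$ttl_ms" ]; then
    echo "lag ${lag}ms exceeds TTL ${ttl_ms}ms"
  else
    echo "lag ${lag}ms is within TTL ${ttl_ms}ms"
  fi
done
```

Both gaps (roughly 89 and 73 seconds) are far above the 15000 ms TTL. That would fit a node stalling for a long time, for example in a GC pause, although clock skew between nodes could produce the same error.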

We suspect the issue is with one particular Solr collection (or store).
The nodes that host replicas of that store are the ones going down.

Also, that shard is now in recovery mode and no leader has been elected.
Could you please suggest how to bring this store back up?
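
In case it is useful, here is a minimal sketch of how the heap dump suggested below could be captured, assuming Solr is started through bin/solr and reads solr.in.sh. The two -XX flags are standard HotSpot JVM options; the dump directory is a placeholder, and the 31g heap is simply the value discussed earlier in this thread.

```shell
# solr.in.sh fragment (sketch; the path is a placeholder, adjust as needed)

# Let the start script set both -Xms and -Xmx from one value.
SOLR_HEAP="31g"

# Write a heap dump when the JVM throws OutOfMemoryError.
# /var/solr/heapdumps is an assumed directory; it needs enough free
# disk space to hold a file roughly the size of the heap.
SOLR_OPTS="$SOLR_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/solr/heapdumps"
```

The resulting .hprof file can then be opened in a heap analyzer to see what is occupying the memory.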

On Mon, Aug 30, 2021 at 1:55 PM Zisis Tachtsidis <zist...@runbox.com> wrote:

> My guess is that the Solr/Zookeeper communication issues are due to GC
> pauses. You are saying that you end up with OOM problems. High memory usage
> puts pressure on GC. Long GC pauses lead to timeouts in Solr/Zookeeper
> communication. We've seen that happening.
>
> First thing I'd do is to get a heap dump once the OOM is triggered and
> analyze that to see what is occupying the memory. Otherwise we are blind
> here.  Is it due to heavy indexing? Heavy querying? Both of them? Do you
> have customizations in the analysis chain that might generate more objects
> than usual?
>
> Zisis
>
> On Mon, 30 Aug 2021 03:44:13 -0400, Dave <hastings.recurs...@gmail.com>
> wrote:
>
> > I can’t help beyond that. I don’t like SolrCloud or ZooKeeper, and I will
> > always, if I can help it, stick to a standalone Solr instance.
> >
> > > On Aug 30, 2021, at 3:23 AM, HariBabu kuruva <hari2708.kur...@gmail.com> wrote:
> > >
> > > Hi Dave
> > >
> > > We tried setting the memory as per your suggestions.
> > >
> > > > But Solr still goes down within a couple of minutes with an OOM error.
> > > > The Solr logs also show the connectivity issue below between Solr and
> > > > ZooKeeper.  Please advise.
> > >
> > > Zookeeper is running fine.
> > >
> > >
> > > 2021-08-30 06:24:13.070 WARN  (main-SendThread(
> > > lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Client
> session
> > > timed out, have not heard from server in 65584ms for session id
> > > 0x1000019354b021b
> > > 2021-08-30 06:24:13.071 WARN  (main-SendThread(
> > > lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Session
> > > 0x1000019354b021b for sever
> lxeisprdas06.corp.equinix.com/10.**.*.*:2181,
> > > Closing socket connection. Attempting reconnect except it is a
> > > SessionExpiredException. =>
> > > org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session
> > > timed out, have not heard from server in 65584ms for session id
> > > 0x1000019354b021b
> > >        at
> > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243)
> > > org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session
> > > timed out, have not heard from server in 65584ms for session id
> > > 0x1000019354b021b
> > >        at
> > > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243)
> > > ~[zookeeper-3.6.2.jar:3.6.2]
> > > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-540) [   ]
> > > o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp:
> > > 1630304577209 , received timestamp: 1630304666181 , TTL: 15000
> > > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-531) [   ]
> > > o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp:
> > > 1630304527726 , received timestamp: 1630304600766 , TTL: 15000
> > > 2021-08-30 06:26:36.014 WARN
> (zkConnectionManagerCallback-13-thread-1) [
> > > ] o.a.s.c.c.ConnectionManager Watcher
> > > org.apache.solr.common.cloud.ConnectionManager@e31302e name:
> > > ZooKeeperConnection Watcher:zookeeper2.corp.equinix.com:2181,
> > > zookeeper1.corp.equinix.com:2182,zookeeper3.corp.equinix.com:2183,
> > > zookeeper4.corp.equinix.com:2184,zookeeper5.corp.equinix.com:2185 got
> event
> > > WatchedEvent state:Disconnected type:None path:null path: null type:
> None
> > > 2021-08-30 06:26:36.014 WARN
> (zkConnectionManagerCallback-13-thread-1) [
> > > ] o.a.s.c.c.ConnectionManager zkClient has disconnected
> > > 2021-08-30 07:06:32.484 WARN  (main-SendThread(
> zookeeper5.corp.equ.com:2185))
> > > [   ] o.a.z.ClientCnxn Client session timed out, have not heard from
> server
> > > in 1851316ms for session id 0x1000019354b021b
> > >
> > > >> On Sun, Aug 29, 2021 at 11:38 PM Dave <hastings.recurs...@gmail.com> wrote:
> > > >>
> > > >> Yes. Don’t set those memory restrictions, just Xms and Xmx, both to 31
> > > >> gigs. Java has problems past that line and will make the GC go into a bad
> > > >> loop. I can send you a link as to why:
> > > >>
> > > >> https://community.datastax.com/questions/3661/why-is-a-32-gb-heap-allocation-not-recommended.html
> > >>
> > >> But this is almost like a protected secret
> > >>
> > > >>>> On Aug 29, 2021, at 1:52 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> > >>>
> > >>> On 8/29/2021 2:38 AM, HariBabu kuruva wrote:
> > > >>>> Is it required to define both SOLR_HEAP and SOLR_JAVA_MEM, or can I
> > > >>>> comment out SOLR_HEAP and define only SOLR_JAVA_MEM?
> > > >>>> Also, what is the highest Xmx value I can use if I still receive OOM
> > > >>>> with 31gb?
> > > >>>> I have only Solr running on that node.
> > >>>
> > > >>> If both are defined, I do not know which one will actually take effect.
> > > >>> Figuring that out would require looking at the startup script and doing
> > > >>> some experiments to see what Java actually does.
> > > >>>
> > > >>> I would personally remove SOLR_JAVA_MEM and only go with SOLR_HEAP. Then
> > > >>> you can do something very simple like the following, and the Solr startup
> > > >>> script will set both -Xms and -Xmx java options to that value:
> > >>>
> > >>> SOLR_HEAP=4g
> > >>>
> > >>>> And could you please let me know the reason to disable swap memory.
> > >>>
> > > >>> If a system starts actively swapping, its performance in general will be
> > > >>> extremely low.  If that happens, it is an indication that there is not
> > > >>> enough physical memory and the system needs more, or that configurations
> > > >>> need to be adjusted to require less memory.
> > > >>>
> > > >>> Disabling swap makes it impossible for the OS to try and use disk space
> > > >>> as memory.  In situations where programs are asking for too much memory and
> > > >>> you have swap completely disabled, either Java or the OS will simply kill
> > > >>> the process that's asking for too much memory, rather than letting it run
> > > >>> and destroy overall performance.
> > >>>
> > >>> ---
> > >>>
> > >>> Responding to something in the OP:
> > >>>
> > > >>> It is completely normal to see 100 percent memory utilization on just
> > > >>> about any server, whether it's running Solr or not.  The OS will use all
> > > >>> available memory for caching purposes, to speed everything up.  The only
> > > >>> time you won't see 100 percent memory usage is when you have far more
> > > >>> memory than the system actually needs.  For instance, if you had 512GB of
> > > >>> memory on a system that only handles megabytes of data.
> > > >>>
> > > >>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
> > > >>>
> > > >>> (disclaimer: I wrote the wiki page linked here.  Any errors are mine.)
> > >>>
> > >>> Thanks,
> > >>> Shawn
> > >>
> > >
> > >
> > > --
> > >
> > > Thanks and Regards,
> > > Hari
> > > Mobile:9790756568
>
>
>

-- 

Thanks and Regards,
 Hari
Mobile:9790756568
