My guess is that the Solr/ZooKeeper communication issues are caused by GC
pauses. You say you end up with OOM errors. High memory usage puts pressure
on the GC, and long GC pauses lead to timeouts in Solr/ZooKeeper
communication. We have seen that happen.
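
As a rough way to check this against the log quoted below, here is a small
sketch that extracts the ZooKeeper client gaps and the delays on the PKI
requests, so they can be compared with pause times in the GC log. The
function names are made up for illustration, and the patterns assume the
messages sit on one line, as they do in the real log file rather than in the
wrapped mail.

```shell
# Hypothetical helpers, reading Solr log lines on stdin.

# Print each "have not heard from server in Nms" gap in milliseconds.
zk_gaps_ms() {
  sed -n 's/.*have not heard from server in \([0-9][0-9]*\)ms.*/\1/p'
}

# Print, for each rejected PKI request, how late it arrived in milliseconds
# (received timestamp minus request timestamp).
pki_delays_ms() {
  sed -n 's/.*Invalid key request timestamp: \([0-9][0-9]*\) , received timestamp: \([0-9][0-9]*\) , TTL: [0-9][0-9]*.*/\1 \2/p' |
    while read -r req recv; do
      echo $((recv - req))
    done
}

# Usage (the log path is an assumption):
#   zk_gaps_ms < /var/solr/logs/solr.log
#   pki_delays_ms < /var/solr/logs/solr.log
```

For the first PKI line in the quoted log this gives 88972 ms against a TTL
of 15000 ms, the same order of magnitude as the 65584 ms ZooKeeper gap,
which is what a JVM that stops responding for a minute or more would look
like.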

The first thing I'd do is get a heap dump when the OOM is triggered and
analyze it to see what is occupying the memory. Otherwise we are blind here.
Is it due to heavy indexing? Heavy querying? Both? Do you have customizations
in the analysis chain that might generate more objects than usual?
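
One way to get that dump automatically is with the standard HotSpot flags,
added through SOLR_OPTS in solr.in.sh. This is only a sketch: the dump
directory is a placeholder, and it needs at least as much free disk space as
the heap size.

```shell
# Sketch for solr.in.sh. HEAP_DUMP_DIR is a hypothetical variable and path,
# not a Solr setting; pick a directory the Solr user can write to.
HEAP_DUMP_DIR="/var/solr/dumps"

# Ask the JVM to write an .hprof file when an OutOfMemoryError is thrown.
SOLR_OPTS="${SOLR_OPTS:-} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$HEAP_DUMP_DIR"

# Alternative for a process that is still running (replace <solr-pid>):
#   jmap -dump:live,format=b,file=/var/solr/dumps/solr.hprof <solr-pid>
```

The resulting file can be opened in a heap analyzer such as Eclipse MAT to
see which objects dominate the heap.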

Zisis

On Mon, 30 Aug 2021 03:44:13 -0400, Dave <hastings.recurs...@gmail.com> wrote:

> I can’t help beyond that. I don’t like SolrCloud or ZooKeeper, and if I can
> help it I will always stick to a standalone Solr instance.
> 
> > On Aug 30, 2021, at 3:23 AM, HariBabu kuruva <hari2708.kur...@gmail.com> 
> > wrote:
> > 
> > Hi Dave
> > 
> > We tried setting the memory as per your suggestions.
> > 
> > But Solr still goes down within a couple of minutes with an OOM error.
> > The Solr logs also show the connectivity issue below between Solr and
> > ZooKeeper.  Please advise.
> > 
> > Zookeeper is running fine.
> > 
> > 
> > 2021-08-30 06:24:13.070 WARN  (main-SendThread(
> > lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Client session
> > timed out, have not heard from server in 65584ms for session id
> > 0x1000019354b021b
> > 2021-08-30 06:24:13.071 WARN  (main-SendThread(
> > lxeisprdas06.corp.equinix.com:2181)) [   ] o.a.z.ClientCnxn Session
> > 0x1000019354b021b for sever lxeisprdas06.corp.equinix.com/10.**.*.*:2181,
> > Closing socket connection. Attempting reconnect except it is a
> > SessionExpiredException. =>
> > org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session
> > timed out, have not heard from server in 65584ms for session id
> > 0x1000019354b021b
> >        at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243)
> > org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session
> > timed out, have not heard from server in 65584ms for session id
> > 0x1000019354b021b
> >        at
> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1243)
> > ~[zookeeper-3.6.2.jar:3.6.2]
> > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-540) [   ]
> > o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp:
> > 1630304577209 , received timestamp: 1630304666181 , TTL: 15000
> > 2021-08-30 06:24:26.182 ERROR (qtp1198197478-531) [   ]
> > o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp:
> > 1630304527726 , received timestamp: 1630304600766 , TTL: 15000
> > 2021-08-30 06:26:36.014 WARN  (zkConnectionManagerCallback-13-thread-1) [
> > ] o.a.s.c.c.ConnectionManager Watcher
> > org.apache.solr.common.cloud.ConnectionManager@e31302e name:
> > ZooKeeperConnection Watcher:zookeeper2.corp.equinix.com:2181,
> > zookeeper1.corp.equinix.com:2182,zookeeper3.corp.equinix.com:2183,
> > zookeeper4.corp.equinix.com:2184,zookeeper5.corp.equinix.com:2185 got event
> > WatchedEvent state:Disconnected type:None path:null path: null type: None
> > 2021-08-30 06:26:36.014 WARN  (zkConnectionManagerCallback-13-thread-1) [
> > ] o.a.s.c.c.ConnectionManager zkClient has disconnected
> > 2021-08-30 07:06:32.484 WARN  
> > (main-SendThread(zookeeper5.corp.equ.com:2185))
> > [   ] o.a.z.ClientCnxn Client session timed out, have not heard from server
> > in 1851316ms for session id 0x1000019354b021b
> > 
> >> On Sun, Aug 29, 2021 at 11:38 PM Dave <hastings.recurs...@gmail.com> wrote:
> >> 
> >> Yes. Don’t set those memory restrictions, just Xms and Xmx, both to 31
> >> gigs. Past roughly 32 GB the JVM can no longer use compressed object
> >> pointers, so the heap is used less efficiently and the GC has to work
> >> much harder. Here is a link explaining why:
> >> https://community.datastax.com/questions/3661/why-is-a-32-gb-heap-allocation-not-recommended.html
> >> 
> >> But this is almost like a protected secret
> >> 
> >>>> On Aug 29, 2021, at 1:52 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>> 
> >>> On 8/29/2021 2:38 AM, HariBabu kuruva wrote:
> >>>> Is it required to define both the parameters SOLR_HEAP and
> >>>> SOLR_JAVA_MEM, or can I comment out SOLR_HEAP and define only
> >>>> SOLR_JAVA_MEM?
> >>>> Also, what is the highest Xmx value I can use if I still get an OOM
> >>>> with 31gb?
> >>>> I have only Solr running on that node.
> >>> 
> >>> If both are defined, I do not know which one will actually take effect.
> >>> Figuring that out would require looking at the startup script and doing
> >>> some experiments to see what Java actually does.
> >>> 
> >>> I would personally remove SOLR_JAVA_MEM and only go with SOLR_HEAP. Then
> >>> you can do something very simple like the following, and the Solr startup
> >>> script will set both -Xms and -Xmx java options to that value:
> >>> 
> >>> SOLR_HEAP=4g
> >>> 
> >>>> And could you please let me know the reason to disable swap memory.
> >>> 
> >>> If a system starts actively swapping, its performance in general will be
> >>> extremely low.  If that happens, it is an indication that there is not
> >>> enough physical memory and the system needs more, or that configurations
> >>> need to be adjusted to require less memory.
> >>> 
> >>> Disabling swap makes it impossible for the OS to try and use disk space
> >>> as memory.  In situations where programs are asking for too much memory and
> >>> you have swap completely disabled, either Java or the OS will simply kill
> >>> the process that's asking for too much memory, rather than letting it run
> >>> and destroy overall performance.
> >>> 
> >>> ---
> >>> 
> >>> Responding to something in the OP:
> >>> 
> >>> It is completely normal to see 100 percent memory utilization on just
> >>> about any server, whether it's running Solr or not.  The OS will use all
> >>> available memory for caching purposes, to speed everything up.  The only
> >>> time you won't see 100 percent memory usage is when you have far more
> >>> memory than the system actually needs.  For instance, if you had 512GB of
> >>> memory on a system that only handles megabytes of data.
> >>> 
> >>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
> >>> 
> >>> (disclaimer: I wrote the wiki page linked here.  Any errors are mine.)
> >>> 
> >>> Thanks,
> >>> Shawn
> >> 
> > 
> > 
> > -- 
> > 
> > Thanks and Regards,
> > Hari
> > Mobile:9790756568

