So you're saying that Solr can use additional memory on top of what Xmx allows?
The resident size keeps increasing. Swap was used at some point, but the
system is not actively paging now.

On FreeBSD you can use procstat or vmstat for more info:

root@solrcloud4:/usr/home/scott # procstat -r 57116
  PID COMM             RESOURCE                          VALUE
57116 java             user time                    07:59:00.567533
57116 java             system time                  00:31:36.921875
57116 java             maximum RSS                         31175620 KB
57116 java             integral shared memory             139301820 KB
57116 java             integral unshared data              46433940 KB
57116 java             integral unshared stack            495295360 KB
57116 java             page reclaims                       11361433
57116 java             page faults                          2919735
57116 java             swaps                                      0
57116 java             block reads                          2918179
57116 java             block writes                         1724654
57116 java             messages sent                       23464275
57116 java             messages received                   13276352
57116 java             signals received                       28619
57116 java             voluntary context switches          66989710
57116 java             involuntary context switches        49835294

So with a SOLR_HEAP value of 16g, I'm now at a total size of 30g, and the
process has already used some swap that it hasn't released yet. This will keep
going until swap usage reaches 100% and the box crashes.
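
As a rough check on the 30g figure, the sketch below converts the "maximum RSS"
value from the procstat output above into GB and subtracts the 16g heap. The
function names kb_to_gb and over_heap_gb are made up for illustration. Note
that procstat labels this value a maximum, so it is a peak and the current
resident size may be lower.

```shell
# Convert a KB value (1 KB = 1024 bytes) to GB (1024-based), two decimals.
kb_to_gb() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1048576 }'
}

# How far a KB figure exceeds a heap limit given in GB.
over_heap_gb() {
    awk -v kb="$1" -v heap="$2" 'BEGIN { printf "%.2f\n", kb / 1048576 - heap }'
}

kb_to_gb 31175620          # maximum RSS from procstat: prints 29.73
over_heap_gb 31175620 16   # amount above SOLR_HEAP=16g: prints 13.73
```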

I guess my questions are:
- Why does Solr use more than 16g?
- Why isn't swapped memory released?

Thanks!
Scott

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org> 
Sent: Monday, December 13, 2021 12:41 AM
To: users@solr.apache.org
Subject: Re: Solr Cloud Node re-join issue

On 12/12/2021 4:40 PM, Scott wrote:
> However, top still shows Solr using more than 16g . It started at 17g 
> and has been steadily growing, now it's at 23g and soon it will go 
> into swap
> 
> PID   USERNAME  THR PRI NICE  SIZE  RES  SWAP STATE  C  TIME   WCPU   COMMAND
> 57116 solr      165  52    0  235G  23G     0 uwait  1  36:20  9.60%  java

I have been doing some experiments with a FreeBSD VM running in VMware Player.
I have very little experience with that OS.

I have openjdk11 and apache-solr-8.10.0 installed using pkg.

It looks like top in FreeBSD has no equivalent to the SHR column seen on Linux.
Being able to see how much shared memory is being used is critical to seeing a
complete picture of memory usage.  I suspect that if we could see it, the
shared memory would be approximately 7GB when the RES column says 23GB.  This 
is something I have seen on Linux, and I have deduced that the actual memory 
used by the process will be the RES size minus the SHR size.  Sometimes the 
shared memory will get quite large.  I have no idea why this happens, but it 
does.

There is a Java tool called "jstat" that can give a very accurate picture of 
Java program memory usage.  But when the -XX:+PerfDisableSharedMem option is 
given to Java, that tool doesn't work.  That option is added with the default 
GC tuning options, because it eliminates a severe performance issue that is 
sometimes seen with Java software.

If you add the following to solr.in.sh then jstat will work:

GC_TUNE="-XX:+UseG1GC \
   -XX:+ParallelRefProcEnabled \
   -XX:MaxGCPauseMillis=250 \
   -XX:+UseLargePages \
   -XX:+AlwaysPreTouch \
   -XX:+ExplicitGCInvokesConcurrent"

After adding it, restart Solr and use the following command with the PID of the 
Solr process in place of PID, at a time when the RES column for Solr goes well 
beyond 16GB:

sudo jstat -gc -t PID 5000 20 > /tmp/jstat.out

That command will take a little less than two minutes to complete.  Then you 
can share the /tmp/jstat.out file using a file sharing website. 
Don't try to paste it into email ... the lines are VERY long.
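
If jstat still cannot read the Solr process after the restart, one thing worth
checking is whether -XX:+PerfDisableSharedMem is still on the JVM command
line. The following is only a sketch: the function name has_perf_disabled is
hypothetical, and the ps invocation in the comment is an assumption about how
the command line would be fetched on your system.

```shell
# Return 0 if the given JVM command line contains -XX:+PerfDisableSharedMem.
has_perf_disabled() {
    case " $1 " in
        *" -XX:+PerfDisableSharedMem "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage; replace PID with the Solr process ID:
#   if has_perf_disabled "$(ps -ww -o args= -p PID)"; then
#       echo "flag still present, jstat will not work"
#   fi
```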

If you add up the columns named S0C, S1C, EC, OC, MC, and CCSC for a given 
line, that will be pretty close to the process's total memory usage, in KB.
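
The addition described above can be scripted rather than done by hand. This is
a minimal sketch, assuming the first line of /tmp/jstat.out is the jstat
header row and that every column named above is present. The function name
jstat_capacity_sum is made up for illustration.

```shell
# Sum S0C, S1C, EC, OC, MC and CCSC for each sample in `jstat -gc -t` output.
# Columns are located by header name, so their order does not matter.
jstat_capacity_sum() {
    awk '
        NR == 1 {
            # Remember the position of every column name in the header.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        NF > 0 {
            kb = $(col["S0C"]) + $(col["S1C"]) + $(col["EC"]) \
               + $(col["OC"]) + $(col["MC"]) + $(col["CCSC"])
            # jstat reports KB; divide by 1024*1024 for GB.
            printf "%s %.1f KB (%.2f GB)\n", $(col["Timestamp"]), kb, kb / 1048576
        }
    ' "$1"
}

# Usage: jstat_capacity_sum /tmp/jstat.out
```

Each output line starts with the jstat timestamp, so the totals can be lined
up against the times when the RES column in top was growing.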

If you want to know what all those columns mean, here's Oracle's documentation 
for jstat:

https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jstat.html

I've just gotten a look at the output from vmstat ... looks like that tool is 
useless for what I was trying to get from it.  It doesn't have any columns for 
swap.  You may have noticed that the si and so columns I mentioned before are 
not present.

It is worth noting that in the top output you pasted, the Solr process is
using zero swap.  On the top screen, are there processes with SWAP values
significantly larger than zero?  When you see a problem, what is the output of
"swapinfo"?

Thanks,
Shawn


