So you're saying that Solr can use additional memory on top of what Xmx limits? It appears the resident size keeps increasing; swap was used at some point, but it's not actively paging now.

In FreeBSD you can use procstat or vmstat for more info:

root@solrcloud4:/usr/home/scott # procstat -r 57116
  PID COMM             RESOURCE                          VALUE
57116 java             user time                    07:59:00.567533
57116 java             system time                  00:31:36.921875
57116 java             maximum RSS                        31175620 KB
57116 java             integral shared memory            139301820 KB
57116 java             integral unshared data             46433940 KB
57116 java             integral unshared stack           495295360 KB
57116 java             page reclaims                      11361433
57116 java             page faults                         2919735
57116 java             swaps                                     0
57116 java             block reads                         2918179
57116 java             block writes                        1724654
57116 java             messages sent                      23464275
57116 java             messages received                  13276352
57116 java             signals received                      28619
57116 java             voluntary context switches         66989710
57116 java             involuntary context switches       49835294

So with a SOLR_HEAP value of 16g, I'm now at a total size of 30g, which has already used some swap that it hasn't released yet. This will keep on going until swap usage is 100% and the box will crash.

I guess my questions are:

- Why does Solr use more than 16g?
- Why isn't swapped memory released?

Thanks!
Scott

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
Sent: Monday, December 13, 2021 12:41 AM
To: users@solr.apache.org
Subject: Re: Solr Cloud Node re-join issue

On 12/12/2021 4:40 PM, Scott wrote:
> However, top still shows Solr using more than 16g. It started at 17g
> and has been steadily growing, now it's at 23g and soon it will go
> into swap
>
>   PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE    C   TIME    WCPU COMMAND
> 57116 solr        165  52    0   235G    23G      0 uwait    1  36:20   9.60% java

I have been doing some experiments with a FreeBSD VM running in vmware player. I have very little experience with that OS. I have openjdk11 and apache-solr-8.10.0 installed using pkg.

It looks like top in FreeBSD has no equivalent to the SHR column seen on Linux. Being able to see how much shared memory is being used is critical to seeing a complete picture of memory usage.
I suspect that if we could see it, the shared memory would be approximately 7GB when the RES column says 23GB. This is something I have seen on Linux, and I have deduced that the actual memory used by the process will be the RES size minus the SHR size. Sometimes the shared memory will get quite large. I have no idea why this happens, but it does.

There is a Java tool called "jstat" that can give a very accurate picture of Java program memory usage. But when the -XX:+PerfDisableSharedMem option is given to Java, that tool doesn't work. That option is added with the default GC tuning options, because it eliminates a severe performance issue that is sometimes seen with Java software. If you add the following to solr.in.sh then jstat will work:

    GC_TUNE="-XX:+UseG1GC \
      -XX:+ParallelRefProcEnabled \
      -XX:MaxGCPauseMillis=250 \
      -XX:+UseLargePages \
      -XX:+AlwaysPreTouch \
      -XX:+ExplicitGCInvokesConcurrent"

After adding it, restart Solr and use the following command with the PID of the Solr process in place of PID, at a time when the RES column for Solr goes well beyond 16GB:

    sudo jstat -gc -t PID 5000 20 > /tmp/jstat.out

That command will take a little less than two minutes to complete. Then you can share the /tmp/jstat.out file using a file sharing website. Don't try to paste it into email ... the lines are VERY long.

If you add up the columns named S0C, S1C, EC, OC, MC, and CCSC for a given line, that will be pretty close to the process's total memory usage, in KB. If you want to know what all those columns mean, here's Oracle's documentation for jstat:

https://docs.oracle.com/javase/8/docs/technotes/tools/unix/jstat.html

I've just gotten a look at the output from vmstat ... looks like that tool is useless for what I was trying to get from it. It doesn't have any columns for swap. You may have noticed that the si and so columns I mentioned before are not present.

It is worth noting that in the top output you pasted, the Solr process is using zero swap.
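
As a rough illustration of the jstat column sum described above, here is a minimal Python sketch. The function name sum_capacities is made up for this example, and it assumes the input begins with the header line that "jstat -gc -t" normally prints, followed by whitespace-separated sample lines. It only does the arithmetic on the six capacity columns named above.

```python
# Capacity columns named in the message above; jstat reports them in KB.
CAPACITY_COLUMNS = ("S0C", "S1C", "EC", "OC", "MC", "CCSC")


def sum_capacities(lines):
    """Return one (timestamp, total_kb) pair per jstat sample line.

    `lines` is any iterable of text lines (an open file works) whose
    first non-blank line is the jstat header.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return []
    header, samples = rows[0], rows[1:]
    totals = []
    for fields in samples:
        # Skip repeated header lines and truncated samples.
        if fields == header or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        total_kb = sum(float(row[name]) for name in CAPACITY_COLUMNS)
        # "Timestamp" is only present when jstat was run with -t.
        totals.append((row.get("Timestamp", ""), total_kb))
    return totals
```

Passing the open /tmp/jstat.out file to sum_capacities gives one (timestamp, total KB) pair per sample; dividing a total by 1024 twice gives a figure in GB that can be compared with the RES column in top.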

On the top screen, are there processes with SWAP columns significantly larger than zero? When you see a problem, what is the output of "swapinfo"?

Thanks,
Shawn