Eventually we managed to figure out what happened. We deploy our cluster on Red Hat OpenShift Container Platform 4.7, and apparently our specific minor version had a known issue with creating containers that can run more than 1024 threads. We solved the issue by asking our providers to apply the relevant hotfix.
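For anyone who wants to compare the per-process limits with a container-level cap, here is a minimal Python sketch. It parses the text of /proc/<pid>/limits (the file Shawn pointed us to) and also looks for a cgroup pids.max file. The function names and the two cgroup paths are our own assumptions for illustration; the thread does not confirm through which mechanism the 1024 cap was enforced, and the paths may differ on your platform.

```python
from pathlib import Path


def _is_value(token):
    """Limit values in /proc/<pid>/limits are either digits or 'unlimited'."""
    return token == "unlimited" or token.isdigit()


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        tokens = line.split()
        # Drop the trailing units column when present (e.g. "processes").
        if tokens and not _is_value(tokens[-1]):
            tokens = tokens[:-1]
        if len(tokens) < 3:
            continue  # blank or unexpected line
        name = " ".join(tokens[:-2])
        limits[name] = (tokens[-2], tokens[-1])
    return limits


def read_cgroup_pids_max(
    # Assumed locations (cgroup v2 first, then v1); adjust for your system.
    candidates=("/sys/fs/cgroup/pids.max", "/sys/fs/cgroup/pids/pids.max"),
):
    """Return the content of the first pids.max file found, or None."""
    for path in candidates:
        p = Path(path)
        if p.is_file():
            return p.read_text().strip()
    return None


if __name__ == "__main__":
    limits_file = Path("/proc/self/limits")  # use /proc/NNNNN/limits for another pid
    if limits_file.is_file():
        parsed = parse_limits(limits_file.read_text())
        print("Max processes (soft, hard):", parsed.get("Max processes"))
        print("Max open files (soft, hard):", parsed.get("Max open files"))
    print("cgroup pids.max:", read_cgroup_pids_max())
```

If the "Max processes" value looks generous (4096 in our case) but threads still cannot be created, a much smaller number in pids.max would be one hint that the cap sits at the container level rather than in the per-user limits.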
For further reading, see Red Hat's entries on the subject:

the first report we found:
- https://bugzilla.redhat.com/show_bug.cgi?id=1844447
the bug entry:
- https://access.redhat.com/solutions/5366631
the relevant hotfix release:
- https://access.redhat.com/errata/RHBA-2021:4572

Thanks to everyone who replied to our initial mail and corresponded with us. You really helped us understand what went on and steered us in the right direction. We wouldn't have been able to fix the issue without you, and we hope that by sharing our experience here no one else will have to deal with it again.

Extremely thankful,
123456780sss

Sent with Proton Mail secure email.

------- Original Message -------
On Wednesday, February 2nd, 2022 at 10:40 AM, 123456780sss <123456780...@protonmail.com.INVALID> wrote:

> Our system resources are:
> OS (as a docker) has 4cpu and 32GB RAM, and we gave Solr 12GB java heap.
>
> If I understand you correctly this situation is not like what you had
> @Gaikwad, correct? (We should also have enough physical memory for all of our
> containers without getting into a problem).
>
> Sent with ProtonMail Secure Email.
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Tuesday, February 1st, 2022 at 1:34 PM, 123456780sss
> 123456780...@protonmail.com.INVALID wrote:
>
> > we've tried to check if that's the problem but we couldn't really
> > understand how to check that...
> >
> > what were the parameters you changed specifically? (we work with linux)
> >
> > thanks,
> >
> > 123456780sss
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> >
> > On Saturday, January 22nd, 2022 at 7:34 PM, Rajendra Gaikwad
> > rajendra...@gmail.com wrote:
> >
> > > Another reason could be insufficient memory available with the OS.
> > > I faced a similar issue in the past, after releasing some amount of memory
> > > it works.
> > > e.g. Machine/Server has 6 GB total memory, Java process allocated 5.4 GB and
> > > OS left with 600MB, It was causing the same issue (unable to create native
> > > thread). After reducing memory allocated to the java and leaving a
> > > significant amount of memory for the OS, it works.
> > >
> > > Thanks,
> > > Rajendra Gaikwad
> > >
> > > On Thu, Jan 20, 2022 at 9:14 PM Shawn Heisey apa...@elyograg.org wrote:
> > >
> > > > On 1/20/22 5:54 AM, 123456780sss wrote:
> > > >
> > > > > However, we've checked the nproc and nofile in our cluster and right now
> > > > > they are set to 4096 each, unlike the 1024 that was theorized. We will
> > > > > probably try to raise it to 8192 anyway, but we're not sure that the impact
> > > > > will be as great as expected initially. Do you think it's still going to
> > > > > solve the issue?
> > > >
> > > > To see what the actual effective limits are on Linux for a running
> > > > process, you can do the following command, where NNNNN is the pid of the
> > > > process you want to check:
> > > >
> > > >     cat /proc/NNNNN/limits
> > > >
> > > > I do not know what options are available for other operating systems.
> > > >
> > > > 4096 is probably enough, I just like to allow something higher just in
> > > > case it suddenly needs more to handle a momentary spike in load. I
> > > > think the highest thread count I ever saw for a Solr instance when
> > > > checking it with jconsole is somewhere in the neighborhood of 1300, on a
> > > > large install for the company I was working for at the time. Looking at
> > > > the tiny Solr instance I am running for mail server, right now it has 46
> > > > threads. I have the system-wide per-user limits for nproc and nofile
> > > > set to 8192, far more than I need. The entire system shows 618
> > > > threads/processes in use, which is a lot less than I expected to see.
> > > >
> > > > Thanks,
> > > > Shawn