On 10/21/23 03:31, Ing. Andrea Vettori wrote:
They had both been running fine for a couple of years (we upgraded from SOLR 8
to 9 with a full reindex some time ago).
Yesterday one of the servers died with a JVM crash, with the following reason
(I have the full JVM trace if needed).
Once restarted the server ran fine and received data updates every 15 minutes,
and responded to queries during the day.
Today the server died around the same time with the same JVM trace.
...
siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000003
A SIGSEGV (signal 11) in a previously stable program usually points to bad RAM.
https://tldp.org/FAQ/sig11/html/index.html
Make a memtest86 thumbdrive, boot the offending server off it, and let
it run for a few days.
Check the fans: they don't last forever, and the DIMMs (or possibly the CPU)
may be overheating from inadequate cooling.
If the RAM checks out, install stress (or stress-ng) and try loading the CPU,
or RAM and CPU together, for a few days to see whether that crashes it.
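
A minimal sketch of such a combined CPU+RAM run, assuming stress-ng is
installed. The helper names (build_stress_cmd, run_burn_in) and the default
worker counts, memory share and duration are made up for illustration, so
adjust them to the box:

```python
import shutil
import subprocess


def build_stress_cmd(hours=24, cpu_workers=0, vm_workers=2, vm_bytes="75%"):
    """Return a stress-ng argv that loads CPU and RAM at the same time.

    The defaults are illustrative assumptions, not recommendations.
    """
    return [
        "stress-ng",
        "--cpu", str(cpu_workers),   # 0 = one worker per online CPU
        "--vm", str(vm_workers),     # number of memory workers
        "--vm-bytes", vm_bytes,      # memory to exercise (a percentage is accepted)
        "--verify",                  # have the stressors check their own results
        "--timeout", f"{hours}h",    # stop after this many hours
        "--metrics-brief",           # print a short summary at the end
    ]


def run_burn_in(**kwargs):
    """Run the burn-in and return the stress-ng exit code."""
    if shutil.which("stress-ng") is None:
        raise RuntimeError("stress-ng is not installed or not on PATH")
    return subprocess.run(build_stress_cmd(**kwargs)).returncode
```

Calling run_burn_in(hours=48) would then keep the machine loaded for two days.
A non-zero exit code, or the box falling over partway through, is the signal
to look harder at the hardware.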
Force-fsck the drive where the index lives, just to cover all bases.
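
To find out which device that is, something along these lines should work on
Linux. It is only a sketch: device_for_path is a made-up helper, the index
path in the usage note is an assumption, and mount points containing spaces
(escaped as \040 in /proc/mounts) are not handled:

```python
import os


def device_for_path(path, mounts_text):
    """Return (device, mount_point) for the filesystem that contains path.

    mounts_text is the content of /proc/mounts. The longest mount point that
    is a prefix of path wins; on a tie, the later entry is kept.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mount_point = fields[0], fields[1]
        # Compare with a trailing slash so /var does not match /variant.
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point)
    return best
```

Usage would be roughly
device_for_path(os.path.realpath("/var/solr/data"), open("/proc/mounts").read()),
with the path replaced by wherever the index actually lives. With Solr stopped
and the filesystem unmounted (or from rescue media), run fsck -f on the device
it returns. The -f flag is the ext2/3/4 way to force a check of a filesystem
marked clean; other filesystems have their own tools.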
Dima