Re: [CentOS] system unresponsive

2019-05-24 Thread Thomas Bendler
You should be able to recognize or monitor this by configuring syslog to print everything to a specific TTY, or by using its remote logging functionality. Kind regards, Thomas On Thu, May 23, 2019 at 18:31, Jon Pruente < jprue...@riskanalytics.com> wrote: > On Wed, May 22, 2019 at 10:02 AM mark
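A minimal sketch of the two logging options mentioned above, assuming the system runs rsyslog with its legacy selector syntax. The helper `rsyslog_rules`, the TTY `/dev/tty12`, and the host `loghost.example.com` are illustrative choices, not taken from the thread. It only builds the rule lines; placing them in a file under `/etc/rsyslog.d/` and restarting rsyslog is left to the administrator.

```python
def rsyslog_rules(tty="/dev/tty12", remote_host=None, port=514, tcp=True):
    """Build rsyslog selector lines (legacy syntax) for console and remote logging.

    All defaults here are assumptions for illustration only.
    """
    # "*.*" selects every facility at every priority; the action is a TTY device.
    rules = [f"*.* {tty}"]
    if remote_host:
        # In legacy rsyslog syntax "@@" forwards over TCP and "@" over UDP.
        prefix = "@@" if tcp else "@"
        rules.append(f"*.* {prefix}{remote_host}:{port}")
    return rules


if __name__ == "__main__":
    # Hypothetical log host; replace with a real one before use.
    print("\n".join(rsyslog_rules(remote_host="loghost.example.com")))
```

Remote forwarding is probably the more useful of the two for a machine that stops responding, since the last messages then survive on another host even if the local disk never gets them.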

Re: [CentOS] system unresponsive

2019-05-23 Thread mark
Jon Pruente wrote: > On Wed, May 22, 2019 at 10:02 AM mark wrote: > > >> That seems unlikely. For one, I've seen that... but I *always* see >> entries in the log about the oom-killer being invoked. For another, this >> isn't a compute node, it's *only* a fileserver, serving projects, home >> direc

Re: [CentOS] system unresponsive

2019-05-23 Thread Jon Pruente
On Wed, May 22, 2019 at 10:02 AM mark wrote: > That seems unlikely. For one, I've seen that... but I *always* see entries > in the log about the oom-killer being invoked. For another, this isn't a > compute node, it's *only* a fileserver, serving projects, home > directories, and backups (home-gr

Re: [CentOS] system unresponsive

2019-05-22 Thread Steven Tardy
On Wed, May 22, 2019 at 10:22 AM mark wrote: > It seems unlikely. It's a 4U server, with 36 disks (and the dual root > disks), in a machine room, and ipmitool sel list shows nada, nor are there > any warnings, as I've seen on other systems occasionally, that the CPU is > overheating, and is being

Re: [CentOS] system unresponsive

2019-05-22 Thread Gordon Messmer
On 5/22/19 6:57 AM, Scott Silverman wrote: In the past I've found that the console may have blanked (due to time) and when the system locked up/hung it won't unblank. Booting with "consoleblank=0" on the kernel command line will ensure that whatever is printed to the console (oops, panic, etc) wi

Re: [CentOS] system unresponsive

2019-05-22 Thread mark
Noam Bernstein via CentOS wrote: > Out of memory? We’ve definitely seen similar symptoms (it’s been a > while, so I’m not sure they were identical) for compute nodes running > large memory jobs. That seems unlikely. For one, I've seen that... but I *always* see entries in the log about the oom-ki
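One way to check for the oom-killer log entries discussed here is to scan the system log for the kernel's usual messages. This is a rough sketch, assuming the kernel log ends up in `/var/log/messages` (the CentOS default) and that the messages contain the strings `invoked oom-killer` or `Out of memory:`. The function name `find_oom_events` is made up for this example.

```python
import os
import re

# Strings the kernel typically logs when the OOM killer runs.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory:")


def find_oom_events(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


if __name__ == "__main__":
    log_path = "/var/log/messages"  # assumed location; usually needs root to read
    if os.path.exists(log_path):
        with open(log_path, errors="replace") as log:
            for event in find_oom_events(log):
                print(event)
```

An empty result supports the point made above, but only if the machine was still able to write to its log when it hung.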

Re: [CentOS] system unresponsive

2019-05-22 Thread Noam Bernstein via CentOS
Out of memory? We’ve definitely seen similar symptoms (it’s been a while, so I’m not sure they were identical) for compute nodes running large memory jobs. Noam

Re: [CentOS] system unresponsive

2019-05-22 Thread mark
Scott Silverman wrote: > In the past I've found that the console may have blanked (due to time) > and when the system locked up/hung it won't unblank. Booting with > "consoleblank=0" on the kernel command line will ensure that whatever is > printed to the console (oops, panic, etc) will be there fo

Re: [CentOS] system unresponsive

2019-05-22 Thread mark
Stephen John Smoogen wrote: > On Wed, 22 May 2019 at 09:30, mark wrote: > >> Ok, we used to get this occasionally on cluster nodes, and we just got >> it on a fileserver (very bad). The system is discovered to be >> unresponsive: >> it doesn't ping, and plugging a console in, you can see that it's

Re: [CentOS] system unresponsive

2019-05-22 Thread Simon Matter via CentOS
> Ok, we used to get this occasionally on cluster nodes, and we just got it > on a fileserver (very bad). The system is discovered to be unresponsive: > it doesn't ping, and plugging a console in, you can see that it's not > dead, but there's nothing at all on the screen, nor does it respond to even

Re: [CentOS] system unresponsive

2019-05-22 Thread Scott Silverman
In the past I've found that the console may have blanked (due to time) and when the system locked up/hung it won't unblank. Booting with "consoleblank=0" on the kernel command line will ensure that whatever is printed to the console (oops, panic, etc) will be there for you to see when you connect.
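To confirm that a running system was actually booted with this option, the kernel command line can be read back from `/proc/cmdline`. The following is a small sketch, not a definitive check. The helper name `consoleblank_setting` is invented here, and it only inspects the boot parameters, not any blanking timeout changed later at runtime.

```python
def consoleblank_setting(cmdline):
    """Return the consoleblank value from a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("consoleblank="):
            # If the option is repeated, the last occurrence is reported.
            value = token.split("=", 1)[1]
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        setting = consoleblank_setting(f.read())
    if setting is None:
        print("consoleblank not set on the kernel command line")
    else:
        print(f"consoleblank={setting}")
```

A value of `0` means console blanking is disabled, which is what the suggestion above aims for.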

Re: [CentOS] system unresponsive

2019-05-22 Thread Stephen John Smoogen
On Wed, 22 May 2019 at 09:30, mark wrote: > Ok, we used to get this occasionally on cluster nodes, and we just got it > on a fileserver (very bad). The system is discovered to be unresponsive: > it doesn't ping, and plugging a console in, you can see that it's not > dead, but there's nothing at all