On 11/1/22 19:57, David Christensen wrote:
On 11/1/22 06:20, gene heskett wrote:
Greetings all;
I am now suffering from a hang on reboot. And in looking for info, I
find that gkrellm can only see temps. I don't push this machine,
so they stay in the 29 to 30 C range. gkrellm is, and has been, part of
my housekeeping for 20 years.
mbmon was not installed, but it and all its suggested
dependencies are now, and I've done two reboots, which took about 20
minutes just to get to the BIOS screen while dancing a jig on the Del
key. During that time I can hear a
very faint clicking sound from time to time. There is zero activity on
any drive controller LED. There are two controllers:
one, of course, on the mobo, and one that interfaces a 4-drive RAID10
for /home.
Mobo is: Asus PRIME Z370-A II, BIOS 0801 04/24/2019
mbmon claims to run by itself but needs root, and when run with sudo,
it reports:
gene@coyote:~$ sudo mbmon
[sudo] password for gene:
No Hardware Monitor found!!
InitMBInfo: Success
What do you suggest I install so this Asus mobo can be monitored?
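One way to see whether the kernel already exposes any sensors on this board, independent of mbmon, is to read the hwmon sysfs tree directly. The following is a minimal sketch, assuming a driver for the board's sensor chip is loaded (which the thread does not confirm). The function name `read_hwmon_temps` is made up for illustration; `temp*_input` files report millidegrees Celsius.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {(chip_name, sensor_file): degrees_C} for every temp*_input found.

    An empty dict means no hwmon driver is exposing temperatures,
    which would be consistent with "No Hardware Monitor found".
    """
    readings = {}
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        # Fall back to the directory name if the chip has no "name" file.
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for sensor in sorted(chip.glob("temp*_input")):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor present but unreadable; skip it
            readings[(chip_name, sensor.name)] = millideg / 1000.0
    return readings
```

Calling `read_hwmon_temps()` with no arguments reads the live system; the `root` parameter exists only so the function can be pointed at a test directory.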
Those symptoms would seem to indicate that a disk drive is failing,
causing the motherboard firmware and/or the HBA/RAID controller
firmware to enter a retry/timeout loop.
I would try:
1. Enter the motherboard firmware setup utility during POST and look
for warnings, errors, log entries, etc.
2. Enter the HBA/RAID firmware configuration utility during POST and
look for warnings, errors, log entries, etc.
3. Examine dmesg(1) after boot, looking for errors, warnings, etc.
4. Examine the files in /var/log after boot, looking for warnings,
errors, etc.
BTDT twice this morning. It's all clean to this point.
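For steps 3 and 4 above, scanning by eye is easy to get wrong on a long log. A minimal sketch of a keyword filter follows; the keyword list and the names `SUSPECT`, `suspect_lines` and `kernel_log` are assumptions for illustration, not an exhaustive or authoritative set.

```python
import re
import subprocess

# Assumed keyword list; adjust for the hardware in question.
SUSPECT = re.compile(
    r"\b(error|fail(ed|ure)?|timeout|timed out|reset|i/o)\b", re.IGNORECASE
)


def suspect_lines(log_text):
    """Return the lines of log_text that contain any suspect keyword."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


def kernel_log():
    """Return the current kernel ring buffer as text.

    dmesg may need root if kernel.dmesg_restrict is set to 1.
    """
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

`suspect_lines(kernel_log())` covers step 3; for step 4 the same filter can be applied to the text of any file under /var/log.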
5. Run SMART short tests on all drives, generate SMART reports for all
drives, and then look at the reports for symptoms of a failing drive.
I have not done that yet. /dev/sda says it's fine.
Now a long test is running on /dev/sde, the first of 4 in the raid10.
3 to go after this one. There are more, but they are late mounts and not
in /etc/fstab; they are on other machines, all mounted thru /sshnet:
my local network's contents.
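Step 5 can be scripted so that every drive gets the same treatment. This is a minimal sketch, assuming smartmontools is installed and the script runs as root. `-t short`/`-t long` and `-a` are standard smartctl options; the helper names and the `WATCHED` attribute list are assumptions for illustration, and not every drive reports those attributes. Only /dev/sda and /dev/sde are named in the thread, so the device is left as a parameter.

```python
import subprocess

# Commonly checked attributes (an assumed, non-exhaustive selection).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def start_test_cmd(device, test="short"):
    """Build the smartctl command that starts a self-test on device."""
    return ["smartctl", "-t", test, device]


def report_cmd(device):
    """Build the smartctl command that prints the full SMART report."""
    return ["smartctl", "-a", device]


def watched_attributes(report_text):
    """Extract the raw values of WATCHED attributes from a SMART report.

    Attribute rows start with a numeric ID and have the raw value
    in the tenth column; everything else is ignored.
    """
    found = {}
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = fields[9]
    return found


def smart_report(device):
    """Run smartctl -a on device and return its output as text."""
    return subprocess.run(report_cmd(device), capture_output=True, text=True, check=False).stdout
```

A self-test runs in the background on the drive, so the report is only worth reading after the test has finished. Non-zero raw values for the watched attributes are the usual first hint of a failing drive.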
6. Examine dmesg(1) and /var/log files again after the machine has
been up for a while and look for warnings, errors, etc.
Nothing, it just sits there with the early boot Asus blurb on screen,
for 20 minutes or more.
7. POST and Debian boot messages can scroll by faster than you can see
them, and I am unsure if everything ends up in a log file. If you
cannot find any clues using the above steps, set up a video camera to
record the console during boot. Then, look at the video for warnings,
errors, etc.
My impression is that it's all pre-BIOS, pre-boot. Once it reaches the
initial GRUB screen, the rest of the boot seems to be quite normal speed.
And if I can get into the BIOS, it looks perfectly normal. I'll send
smartctl after more info. And I just put a DVM on a drive plug, getting
5.1 and 12.1 voltages there, so I don't think the PSU is going down.
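As a sanity check on those two readings: the ATX specification allows plus or minus 5% on the +5 V and +12 V rails. A minimal sketch of that arithmetic, with the function name `within_tolerance` made up for illustration:

```python
def within_tolerance(measured, nominal, tolerance=0.05):
    """True if measured is within nominal +/- tolerance (as a fraction).

    The 5% default is the ATX limit for the +5 V and +12 V rails.
    """
    return abs(measured - nominal) <= nominal * tolerance


# Readings quoted in the message above.
for measured, nominal in ((5.1, 5.0), (12.1, 12.0)):
    status = "OK" if within_tolerance(measured, nominal) else "OUT OF RANGE"
    print(f"{nominal:>5.1f} V rail: measured {measured} V -> {status}")
```

Both readings fall inside the allowed band (4.75 to 5.25 V and 11.4 to 12.6 V). A steady meter reading cannot show ripple or a brief sag while the drives spin up, so this supports the conclusion without fully proving it.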
David
Take care & stay well, David, I'm going to go check some blankets &
eyelids for leaks.
Cheers, Gene Heskett.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/>