Synopsis: if the sensors show missing data, reset the BMC before rebooting the system to avoid the failed boot with the long beep.
I found a reliably reproducible workaround for this problem that keeps remote control of the machine without having to trip the mains breaker. It entirely prevents the long beep issue and allows the system to be used in headless remote environments without needing a remote mains power cycle capability or remote hands.

I have not had to disable the lm(4) sensor driver, as was previously advised as a workaround, and I have reached the conclusion that this problem is not caused by the driver in the first place but by buggy BMC firmware. It is therefore advisable to contact Supermicro technical support again and ask for a BMC firmware update that does not exhibit the problem.

After the system has been running for a longer period (not deterministic, but more than 30 minutes), the sensors start to show missing values and no longer provide data points to the BMC firmware. This is visible through both direct and networked IPMI access and in the web-based management interface. At this point a reboot leaves the system unable to boot, with the dreaded long beep. Only cutting mains power (power supply breaker or power distribution unit) for a couple of seconds unblocks the system so that it boots successfully again. This completely undermines the remote control capabilities of the system, effectively turning it into a continuous source of requests for manual mains power cycles.

The workaround is to reset the BMC before attempting to reboot the system. It works over the network directly over IPMI and also via the web-based BMC interface. The reset only reboots the IPMI controller and its embedded firmware, not the system itself; a couple of minutes later the sensors poll real data and display it properly.
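For illustration, the check-then-reset sequence could be scripted roughly as below. This is only a sketch under assumptions, not something I run in this exact form: it assumes ipmitool is installed and can reach the BMC, that "ipmitool sdr" prints the same three-column "name | reading | status" lines as the short sensor listings in this post, and that "ipmitool mc reset cold" is the reset being issued. The function names, the 180 second timeout and the 15 second poll interval are made up for the example.

```python
import subprocess
import time


def parse_sdr(text):
    """Split 'name | reading | status' lines into (name, reading, status) tuples."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[0]:
            rows.append(tuple(parts))
    return rows


def sensors_healthy(text):
    """True if at least one temperature or voltage sensor reports a real value.

    Fans may legitimately show 'no reading' (none connected), so only
    readings in 'degrees C' or 'Volts' count as proof that the BMC is alive.
    """
    return any(
        reading.endswith(("degrees C", "Volts"))
        for _name, reading, _status in parse_sdr(text)
    )


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def reset_bmc_if_stuck(wait_seconds=180, poll_seconds=15):
    """Cold-reset the BMC if the sensors are blank; return True once they recover.

    Assumption: 'ipmitool sdr' and 'ipmitool mc reset cold' work on this host.
    """
    if sensors_healthy(run(["ipmitool", "sdr"])):
        return True  # nothing to do, a reboot should be safe
    run(["ipmitool", "mc", "reset", "cold"])
    deadline = time.monotonic() + wait_seconds
    while time.monotonic() < deadline:
        time.sleep(poll_seconds)
        try:
            if sensors_healthy(run(["ipmitool", "sdr"])):
                return True
        except subprocess.CalledProcessError:
            pass  # the BMC is probably still restarting
    return False
```

The idea would be to call reset_bmc_if_stuck() right before issuing the reboot and to reboot only if it returns True; if it returns False the sensors never came back and a reboot is likely to end in the long beep.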
Once the sensors report real values again, a system reboot succeeds as expected and the system boots up and works properly, until some unspecified longer time passes again (from 1 hour to days) and the BMC gets stuck again (it reliably does). The indication is missing sensor data, and a reboot in that state fails with the long beep.

This is NOT OS specific, unless the driver polling the sensors causes the sensors subsystem in the embedded controller OS to crash. The only factor found to affect it so far is the time the system has been running without a mains power cycle. It is a flaw of the BMC firmware, and the real solution is to demand an updated firmware without this fault from Supermicro. It would help if more people voiced their concerns, so that Supermicro technical support issues an updated BMC firmware and publishes it on their web site.

Here is how it looks when the BMC is stuck:

$ ipmi-sensor
System Temp      | no reading   | ns
CPU Temp         | no reading   | ns
CPU FAN          | no reading   | ns
SYS FAN          | no reading   | ns
CPU Vcore        | no reading   | ns
Vichcore         | no reading   | ns
+3.3VCC          | no reading   | ns
VDIMM            | no reading   | ns
+5 V             | no reading   | ns
+12 V            | no reading   | ns
+3.3VSB          | no reading   | ns
VBAT             | no reading   | ns
Chassis Intru    | no reading   | ns
PS Status        | 0x00         | ok

$ ipmi-sensor-detail
System Temp      | na     |           | na     | na      | na      | na      | na      | na      | na
CPU Temp         | na     |           | na     | na      | na      | na      | na      | na      | na
CPU FAN          | na     |           | na     | na      | na      | na      | na      | na      | na
SYS FAN          | na     |           | na     | na      | na      | na      | na      | na      | na
CPU Vcore        | na     |           | na     | na      | na      | na      | na      | na      | na
Vichcore         | na     |           | na     | na      | na      | na      | na      | na      | na
+3.3VCC          | na     |           | na     | na      | na      | na      | na      | na      | na
VDIMM            | na     |           | na     | na      | na      | na      | na      | na      | na
+5 V             | na     |           | na     | na      | na      | na      | na      | na      | na
+12 V            | na     |           | na     | na      | na      | na      | na      | na      | na
+3.3VSB          | na     |           | na     | na      | na      | na      | na      | na      | na
VBAT             | na     |           | na     | na      | na      | na      | na      | na      | na
Chassis Intru    | na     | discrete  | na     | na      | na      | na      | na      | na      | na
PS Status        | 0x0    | discrete  | 0x00ff | na      | na      | na      | na      | na      | na

Here is how it looks after BMC reset:

$ ipmi-reset
Sent cold reset command to MC

~75 seconds later:

$ ipmi-sensor
System Temp      | 38 degrees C | ok
CPU Temp         | 38 degrees C | ok
CPU FAN          | no reading   | ns
SYS FAN          | no reading   | ns
CPU Vcore        | 1.10 Volts   | ok
Vichcore         | 1.04 Volts   | ok
+3.3VCC          | 3.31 Volts   | ok
VDIMM            | 1.53 Volts   | ok
+5 V             | 5.09 Volts   | ok
+12 V            | 12.03 Volts  | ok
+3.3VSB          | 3.28 Volts   | ok
VBAT             | 3.12 Volts   | ok
Chassis Intru    | 0x00         | ok
PS Status        | 0x00         | ok

$ ipmi-sensor-detail
System Temp      | 38.000 | degrees C | ok     | -9.000  | -7.000  | -5.000  | 75.000  | 77.000  | 79.000
CPU Temp         | 38.000 | degrees C | ok     | -11.000 | -8.000  | -5.000  | 85.000  | 90.000  | 95.000
CPU FAN          | na     |           | na     | na      | na      | na      | na      | na      | na
SYS FAN          | na     |           | na     | na      | na      | na      | na      | na      | na
CPU Vcore        | 1.096  | Volts     | ok     | 0.640   | 0.664   | 0.688   | 1.344   | 1.408   | 1.472
Vichcore         | 1.040  | Volts     | ok     | 0.808   | 0.824   | 0.840   | 1.160   | 1.176   | 1.192
+3.3VCC          | 3.312  | Volts     | ok     | 2.816   | 2.880   | 2.944   | 3.584   | 3.648   | 3.712
VDIMM            | 1.528  | Volts     | ok     | 1.312   | 1.328   | 1.344   | 1.648   | 1.664   | 1.680
+5 V             | 5.088  | Volts     | ok     | 4.096   | 4.320   | 4.576   | 5.344   | 5.600   | 5.632
+12 V            | 12.031 | Volts     | ok     | 10.706  | 10.600  | 10.494  | 13.091  | 13.197  | 13.303
+3.3VSB          | 3.280  | Volts     | ok     | 2.816   | 2.880   | 2.944   | 3.584   | 3.648   | 3.712
VBAT             | 3.120  | Volts     | ok     | 2.560   | 2.624   | 2.688   | 3.328   | 3.392   | 3.456
Chassis Intru    | 0x0    | discrete  | 0x0000 | na      | na      | na      | na      | na      | na
PS Status        | 0x0    | discrete  | 0x00ff | na      | na      | na      | na      | na      | na

The main board to which this specific workaround applies is:

MBD-X7SPA-HF-D525-O

The board was bought brand new in May 2011, in its original packaging, from an official retailer carrying Supermicro products, and it uses memory modules from the qualified vendor list.
http://www.supermicro.com/products/motherboard/ATOM/ICH9/X7SPA-HF-D525.cfm

The BMC and BIOS firmwares are the latest available from the Supermicro web site:

Firmware Revision: 03.16
Firmware Build Time: 2014-06-30

Supermicro X7SPA/X7SPE/X7SPT Series
BIOS Date:07/19/13 BIOS Rev:1.2b

Hopefully this helps with further diagnostics and, in the meantime, serves as a workaround that lets people with boards having the same problem operate them remotely until a BMC firmware fixing the issue is available.

Regards,
Anton