On 30 January 2016 at 11:01, Edgar Fuß <e...@math.uni-bonn.de> wrote:
> I don't know whether this is a userland or kernel issue or a layer 8 problem.
>
> After running a customized kernel, I found a server powered down.
> The culprit turned out to be dbcool->envsys->powerd fabulating some
> temperature rose above limits.
>
> envstat -d dbcool0 says:
>              Current  CritMax  WarnMax  WarnMin  CritMin  Unit
> [...]
>   r2_temp:    53.250   54.000                     45.000  degC
> [...]
>
> sysctl hw.dbcool0 says:
> [...]
> hw.dbcool0.r2_temp.Tmin = 44
> hw.dbcool0.r2_temp.Ttherm = 57
> hw.dbcool0.r2_temp.Thyst = 2
>
> If I read that correctly, it means that at 54 degC, it's time for emergency
> shut-down, while only at 57 degC, fans have to run at full speed.
> (Also, it seems to be threatening the hardware if that temp falls below 45.)
>
> I have no clue where that magic value of 54 degC comes from. It's not in any
> config file I can find, I don't find such a value in sys/dev/i2c/dbcool.c.
> Is it the BIOS writing that value into the IC? Is it a chip manufacturer
> default?
>
> The board is a Tyan S2882-D, in case that matters.
> (Btw., does anyone know what r2_temp on that board is?)
>
> I turned off powerd for now.
>
> Thanks for any hints.
In general, I personally don't think it ever makes sense to shut down by
default when a temperature limit is exceeded, since most of these sensors
aren't all that reliable, especially if you're reading them over i2c, with
potential bus locking issues and race conditions with BIOS / IPMI. Getting
a bit sidetracked: at the very least, the sensor values should be dampened,
which is what's done in OpenBSD's sensorsd; I'm not sure whether anything
similar is done here.

However, it does appear that
http://bxr.su/n/etc/powerd/scripts/sensor_temperature is the powerd script
responsible for this automated shutdown.

For what it's worth, the envsys temperature limits you mention are most
likely read directly from the chip in question; see
http://BXR.SU/NetBSD/sys/dev/i2c/dbcool.c#dbcool_get_temp_limits .

Did you try modifying these limits from userland? (That may or may not
work, especially if something else decides to modify them behind your
back.)

Perhaps a potential solution would be to change them from CRIT to WARN
type, and/or to remove the immediate shutdown from the powerd script?

Cheers,
Constantine.SU.
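
P.S. To illustrate what I mean by dampening, here is a minimal sketch in
Python. It is not how sensorsd or powerd actually implement anything; the
function name, the `required` parameter and the sample readings are all
made up for illustration. The idea is only to act once several consecutive
readings exceed the limit, so that a single bogus value off the i2c bus
doesn't power the machine down.

```python
def should_trigger(readings, limit, required=3):
    """Return True once `required` consecutive readings exceed `limit`.

    Hypothetical helper for illustration only; not part of envsys,
    powerd or sensorsd.
    """
    streak = 0
    for value in readings:
        # Any reading at or below the limit resets the streak.
        streak = streak + 1 if value > limit else 0
        if streak >= required:
            return True
    return False


# A single spike above the 54 degC limit is ignored...
print(should_trigger([53.25, 57.0, 53.0, 53.5], limit=54.0))
# ...while a sustained excursion is acted upon.
print(should_trigger([53.25, 55.0, 56.0, 57.0], limit=54.0))
```

With `required=3` and one reading per polling interval, a real overheat
is still noticed after three intervals, which seems a reasonable price for
not trusting a single sample.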