On 9/24/19 4:16 PM, Holger Hoffstätte wrote:
Hi,

I recently upgraded my home network with two AQC107-based NICs and a
multi-speed switch. Everything works great, but I couldn't help but notice
very weird hwmon temperature output (which I wanted to use for monitoring
and alerting).

Both cards identify as:

$ lspci -v -s 06:00.0
06:00.0 Ethernet controller: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02)
     Subsystem: ASUSTeK Computer Inc. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion]

In one machine lm_sensors says:

eth0-pci-0200
Adapter: PCI adapter
PHY Temperature: +315.1°C

This seems quite wrong since the card is only slightly warm to the touch, and
315.1 is exactly 255 + 60.1 - the latter value feels more like the actual
temperature.
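
To make the arithmetic above easy to re-check, here is a minimal sketch.
It assumes the bogus reading is the real temperature plus a constant 255,
which is only a guess based on the 315.1 value and not something confirmed
from the driver. The helper name strip_offset and the constant are made up
for illustration. Values are in millidegrees Celsius, the unit hwmon uses
in sysfs, which also avoids floating-point rounding.

```python
# Hypothetical helper: remove an assumed constant offset from a hwmon reading.
# All values are integers in millidegrees Celsius.

OFFSET_MILLIDEG = 255_000  # assumption: offset guessed from the 315.1 reading


def strip_offset(reading_millideg, offset_millideg=OFFSET_MILLIDEG):
    """Return the reading with the assumed offset subtracted."""
    return reading_millideg - offset_millideg


if __name__ == "__main__":
    # First machine: 315.1 - 255 leaves a believable temperature.
    print(strip_offset(315_100) / 1000)    # 60.1
    # Second machine: the same offset still leaves a nonsense value.
    print(strip_offset(6_977_000) / 1000)  # 6722.0
```

Note that the same offset does not rescue the 6977.0 reading from the
second machine quoted below, so a fixed +255 cannot be the whole story.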

On a second machine it says:

eth0-pci-0600
Adapter: PCI adapter
PHY Temperature: +6977.0°C

I feel qualified to say that is definitely wrong as well, since the machine is
currently not melting its way to the earth's core, and also only slightly warm
to the touch. :)

Both cards also reported wrong values with kernel 5.2, but since I'm on 5.3.1
I might as well report the current wrongness.

Do we know who's to blame here - motherboards, NICs, driver, kernel, hwmon
infrastructure? I believe the hwmon patches landed first in 5.2.

Another observation: the hwmon output immediately becomes sane (~58°C)
when I take the link down with ifconfig. As soon as I bring the link back
up, the temperature jumps from 58°C to 6976°C within one second.
It seems that the presence of the carrier somehow mangles the sensor
readings. I hope this helps to find the issue.
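
In case it helps anyone reproduce the link up/down behaviour, below is a
rough sketch of a script that reads the hwmon temperature files directly
from sysfs and flags implausible values. The hwmon sysfs files
(temp*_input, in millidegrees Celsius) are the standard interface; the
function names and the 125°C sanity ceiling are my own assumptions and
not taken from the driver or from lm_sensors.

```python
from pathlib import Path

# Assumption: anything above 125 degrees C is treated as a bogus reading.
PLAUSIBLE_MAX_MILLIDEG = 125_000


def read_millideg(path):
    """Read one hwmon temp*_input file; the value is in millidegrees C."""
    return int(Path(path).read_text().strip())


def is_plausible(millideg, lo=0, hi=PLAUSIBLE_MAX_MILLIDEG):
    """Return True if the reading lies inside the assumed sane range."""
    return lo <= millideg <= hi


def find_temp_inputs(root="/sys/class/hwmon"):
    """List all temp*_input files below the hwmon class directory."""
    return sorted(Path(root).glob("hwmon*/temp*_input"))


if __name__ == "__main__":
    for path in find_temp_inputs():
        try:
            value = read_millideg(path)
        except (OSError, ValueError):
            continue  # some sensors fail to read; skip them
        status = "ok" if is_plausible(value) else "IMPLAUSIBLE"
        print(f"{path}: {value / 1000:.1f} C [{status}]")
```

Running this once with the link down and once with it up should show the
PHY sensor flipping between "ok" and "IMPLAUSIBLE".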

thanks,
Holger
