Launchpad has imported 65 comments from the remote bug at
https://bugzilla.kernel.org/show_bug.cgi?id=177311.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2016-10-12T20:59:43+00:00 ck+kernelbugzilla wrote:

I've noticed that enabling CONFIG_SENSORS_JC42, either as a module or built
into my kernel, causes a very high rate of interrupts on i801_smbus: about
6000-8000 per second according to /proc/interrupts. After 20 minutes, about
5 million interrupts had been generated on i801_smbus.
 
When I unload the jc42 module, the interrupts do not stop until I do a
complete reboot.
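
One way to put a number on the rate is to sample the i801_smbus line in
/proc/interrupts twice and divide the difference by the interval. The
following is a minimal Python sketch, not part of the original report. The
function names are made up for illustration, and it assumes the usual
layout of one counter column per CPU followed by the controller and handler
names.

```python
import time


def irq_total(text, name="i801_smbus"):
    """Sum the per-CPU counters of the first /proc/interrupts line naming `name`."""
    lines = text.splitlines()
    if not lines:
        return None
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label such as "18:", then one counter per CPU,
        # then the controller and handler names (possibly comma-separated).
        if any(name in f for f in fields[1 + ncpus:]):
            return sum(int(f) for f in fields[1:1 + ncpus])
    return None


def irq_rate(interval=5.0, name="i801_smbus", path="/proc/interrupts"):
    """Return interrupts per second for `name`, measured over `interval` seconds."""
    with open(path) as fh:
        before = irq_total(fh.read(), name)
    time.sleep(interval)
    with open(path) as fh:
        after = irq_total(fh.read(), name)
    if before is None or after is None:
        raise LookupError(f"{name} not found in {path}")
    return (after - before) / interval
```

Calling `irq_rate()` on an affected machine while the problem is active
should give a figure comparable to the one quoted above.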
 
Mainboard: Supermicro A1SRM-2758F
Kernel: Gentoo-Sources 4.8.1 (also happens with vanilla 4.8.1 and older
kernel versions)
 
dmesg:
[    8.319900] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[    8.321864] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[    8.326098] ismt_smbus 0000:00:13.0: enabling device (0140 -> 0142)
 
lspci:
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 02)

When the module is loaded, I am also getting these errors:
[   73.934901] ismt_smbus 0000:00:13.0: completion wait timed out
[   74.974970] ismt_smbus 0000:00:13.0: completion wait timed out
[   76.014949] ismt_smbus 0000:00:13.0: completion wait timed out
[   77.054903] ismt_smbus 0000:00:13.0: completion wait timed out
[   78.094961] ismt_smbus 0000:00:13.0: completion wait timed out
[   79.134982] ismt_smbus 0000:00:13.0: completion wait timed out
[   80.175116] ismt_smbus 0000:00:13.0: completion wait timed out
[   81.215057] ismt_smbus 0000:00:13.0: completion wait timed out

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/0

------------------------------------------------------------------------
On 2016-10-12T21:00:53+00:00 ck+kernelbugzilla wrote:

The jc42 module seems to work, as lm_sensors finds the sensors after
loading it:

Galactica ~ # sensors
jc42-i2c-1-19
Adapter: SMBus I801 adapter at e000
temp1:        +30.8°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-1-1a
Adapter: SMBus I801 adapter at e000
temp1:        +29.5°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-1-18
Adapter: SMBus I801 adapter at e000
temp1:        +27.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-1-1b
Adapter: SMBus I801 adapter at e000
temp1:        +28.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/1

------------------------------------------------------------------------
On 2016-11-28T15:19:42+00:00 linux wrote:

You need to set the temperature limits correctly. Without limits, the
chips will persistently generate alarms which is the likely cause of the
interrupts.

That won't solve the completion interrupt timeouts, though. That may be
another problem.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/2

------------------------------------------------------------------------
On 2016-11-28T16:10:21+00:00 ck+kernelbugzilla wrote:

(In reply to Guenter Roeck from comment #2)
> You need to set the temperature limits correctly. Without limits, the chips
> will persistently generate alarms which is the likely cause of the
> interrupts.
> 
> That won't solve the completion interrupt timeouts, though. That may be
> another problem.

Hi!
Thanks for your answer. I gave it a try and set those limits, so sensors no
longer shows any ALARM. That does not seem to be the cause, because after
setting them the interrupts are still generated at a massive rate.

jc42-i2c-1-1b
Adapter: SMBus I801 adapter at e000
RAM:          +30.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)

jc42-i2c-1-19
Adapter: SMBus I801 adapter at e000
RAM:          +32.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)

jc42-i2c-1-1a
Adapter: SMBus I801 adapter at e000
RAM:          +31.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)

jc42-i2c-1-18
Adapter: SMBus I801 adapter at e000
RAM:          +28.0°C  (low  =  +0.0°C)
                       (high = +80.0°C, hyst = +80.0°C)
                       (crit = +80.0°C, hyst = +80.0°C)

Cheers
Conrad

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/3

------------------------------------------------------------------------
On 2016-11-28T17:40:22+00:00 linux wrote:

Weird, especially since the chips should not generate interrupts in the
first place unless they are explicitly enabled (which the driver doesn't
do, or at least shouldn't do). My wild guess is that taking the chips
out of shutdown mode for some reason enables the interrupt.

Can you send the output of "i2cdump -y -f 1 0x18 w" ? Also, do the
interrupts stop when you unload the driver ?

Thanks,
Guenter

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/4

------------------------------------------------------------------------
On 2016-11-28T17:41:34+00:00 linux wrote:

Please forget the question about the unload, as you already answered it.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/5

------------------------------------------------------------------------
On 2016-11-28T17:43:33+00:00 ck+kernelbugzilla wrote:

(In reply to Guenter Roeck from comment #4)
> Weird, especially since the chips should not generate interrupts in the
> first place unless it is explicitly enabled (which the driver doesn't do, or
> at least shouldn't do). My wild guess is that taking the chips out of
> shutdown mode for some reasons enables the interrupt.
> 
> Can you send the output of "i2cdump -y -f 1 0x18 w" ?

Here we go:

╭─root@Galactica ~
╰─➤  i2cdump -y -f 1 0x18 w
     0,8  1,9  2,a  3,b  4,c  5,d  6,e  7,f
00: ef00 0000 0005 0000 0005 c801 1f00 0182
08: 0000 0000 0000 0000 0000 0000 0000 0000
10: 0000 0000 0000 0000 0000 0000 0000 0000
18: 0000 0000 0000 0000 0000 0000 0000 0000
20: 0000 0000 0000 0000 0000 0000 0000 0000
28: 0000 0000 0000 0000 0000 0000 0000 0000
30: 0000 0000 0000 0000 0000 0000 0000 0000
38: 0000 0000 0000 0000 0000 0000 0000 0000
40: 0000 0000 0000 0000 0000 0000 0000 0000
48: 0000 0000 0000 0000 0000 0000 0000 0000
50: 0000 0000 0000 0000 0000 0000 0000 0000
58: 0000 0000 0000 0000 0000 0000 0000 0000
60: 0000 0000 0000 0000 0000 0000 0000 0000
68: 0000 0000 0000 0000 0000 0000 0000 0000
70: 0000 0000 0000 0000 0000 0000 0000 0000
78: 0000 0000 0000 0000 0000 0000 0000 0000
80: 0000 0000 0000 0000 0000 0000 0000 0000
88: 0000 0000 0000 0000 0000 0000 0000 0000
90: 0000 0000 0000 0000 0000 0000 0000 0000
98: 0000 0000 0000 0000 0000 0000 0000 0000
a0: 0000 0000 0000 0000 0000 0000 0000 0000
a8: 0000 0000 0000 0000 0000 0000 0000 0000
b0: 0000 0000 0000 0000 0000 0000 0000 0000
b8: 0000 0000 0000 0000 0000 0000 0000 0000
c0: 0000 0000 0000 0000 0000 0000 0000 0000
c8: 0000 0000 0000 0000 0000 0000 0000 0000
d0: 0000 0000 0000 0000 0000 0000 0000 0000
d8: 0000 0000 0000 0000 0000 0000 0000 0000
e0: 0000 0000 0000 0000 0000 0000 0000 0000
e8: 0000 0000 0000 0000 0000 0000 0000 0000
f0: 0000 0000 0000 0000 0000 0000 0000 0000
f8: 0000 0000 0000 0000 0000 0000 0000 0000

>Also, do the interrupts stop when you unload the driver ?

No, they only stop when I do a complete server reboot.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/6

------------------------------------------------------------------------
On 2016-11-28T17:46:49+00:00 ck+kernelbugzilla wrote:

Ah, I forgot to add: loading the old "eeprom" module causes the same
problem with the interrupts, see [1]. Maybe this is somehow connected?

[1] https://bugzilla.kernel.org/show_bug.cgi?id=177291

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/7

------------------------------------------------------------------------
On 2016-11-28T17:59:07+00:00 linux wrote:

This is an Atmel AT30TS00. Per configuration register, events are
disabled, and there is no event pending, meaning it should not really be
the JC42s generating the interrupts.
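
For readers who want to follow that reading of the dump in comment #6, here
is a rough Python sketch of the decoding. It is not from the original
comment. It assumes that "i2cdump ... w" prints each 16-bit register with
its two bytes swapped relative to the sensor's MSB-first order, and that
the configuration and temperature bit positions follow the JEDEC JC-42.4
layout as I understand it (bit 3 event output control, bit 4 event status,
bit 8 shutdown). The function and key names are illustrative only.

```python
def swap16(word):
    """Swap the two bytes of a 16-bit value."""
    return ((word & 0xFF) << 8) | (word >> 8)


def decode_jc42(dump_words):
    """Decode the first eight words of an `i2cdump ... w` row for a JC-42.4 sensor."""
    regs = [swap16(w) for w in dump_words[:8]]
    config, temp_raw = regs[1], regs[5]

    # Temperature register (assumed layout): bits 15..13 are alarm flags,
    # bit 12 is the sign, bits 11..0 are the magnitude in 1/16 degree C.
    temp = temp_raw & 0x1FFF
    if temp & 0x1000:
        temp -= 0x2000

    return {
        "event_output_enabled": bool(config & 0x0008),
        "event_asserted": bool(config & 0x0010),
        "shutdown": bool(config & 0x0100),
        "temp_c": temp / 16.0,
        "manufacturer_id": regs[6],
        "device_id": regs[7],
    }


# First row of the dump for address 0x18, copied from comment #6.
row = [0xEF00, 0x0000, 0x0005, 0x0000, 0x0005, 0xC801, 0x1F00, 0x0182]
info = decode_jc42(row)
```

Under these assumptions the configuration word is zero, so both event
fields come out false, and the ID words decode to 0x001f and 0x8201, which
appears consistent with the identification above.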

Another question: If you only load the i801 module after boot (ie
prevent the jc42 module from loading, eg by blacklisting it, but still
load the i801 module), do you still get the interrupts ?

Thanks,
Guenter

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/8

------------------------------------------------------------------------
On 2016-11-28T18:01:22+00:00 ck+kernelbugzilla wrote:

(In reply to Guenter Roeck from comment #8)
> Another question: If you only load the i801 module after boot (ie prevent
> the jc42 module from loading, eg by blacklisting it, but still load the i801
> module), do you still get the interrupts ?

That's my current situation ;-) jc42 is built only as a module, which is
currently not loaded at system startup, and i801 is compiled into
my kernel. In that case, zero interrupts are generated on i801_smbus.

Cheers
Conrad

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/9

------------------------------------------------------------------------
On 2016-11-28T18:07:07+00:00 linux wrote:

#7 suggests a problem with the i801 driver and its interrupt handling.
#9 contradicts that a bit, though.

Maybe the C2000 has problems with interrupts, or implements it
differently than handled by the driver. This may be triggered by an
actual access on the bus. You could try to confirm it by running the
i2cdump command after booting without the jc42 module loaded (i2cdetect
-y 1 should show no reserved addresses) and see if the interrupts start
happening.

Thanks,
Guenter

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/10

------------------------------------------------------------------------
On 2016-11-28T18:23:06+00:00 ck+kernelbugzilla wrote:

(In reply to Guenter Roeck from comment #10)
> #7 suggests a problem with the i801 driver and its interrupt handling. #9
> contradicts that a bit, though.
> 
> Maybe the C2000 has problems with interrupts, or implements it differently
> than handled by the driver. This may be triggered by an actual access on the
> bus. You could try to confirm it by running the i2cdump command after
> booting without the jc42 module loaded (i2cdetect -y 1 should show no
> reserved addresses) and see if the interrupts start happening.
> 
> Thanks,
> Guenter

You nailed it ;-) Right after executing "i2cdump -y -f 1 0x18 w", the
interrupts start at a massive rate, even though jc42 wasn't loaded.

Cheers
Conrad

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/11

------------------------------------------------------------------------
On 2016-11-28T18:32:34+00:00 ck+kernelbugzilla wrote:

Sorry, but I don't know what you mean here by "reserved".

Before/after executing i2cdump (output is the same):

╭─root@Galactica ~
╰─➤  i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- 18 19 1a 1b -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2e --
30: 30 31 32 33 -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- 49 -- -- -- -- -- --
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- 69 -- -- 6c -- -- --
70: -- -- -- -- -- -- -- --

A simple "i2cdetect -y 1" also triggers the interrupts.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/12

------------------------------------------------------------------------
On 2016-11-28T18:44:12+00:00 linux wrote:

With "reserved" I meant "a driver for a chip is loaded". After you load
the jc42 driver (or the eeprom driver), you'll see that some of the
addresses show up as "UU".
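
As an illustration of that distinction, the grid printed by "i2cdetect -y"
can be parsed to list which addresses are claimed by a driver. This is a
sketch only, with made-up function names. It assumes the fixed-width layout
seen in this bug: a two-digit row label, a colon, then sixteen cells of
three characters each.

```python
def parse_i2cdetect(text):
    """Map each probed 7-bit address to its cell text ('--', 'UU' or a hex address)."""
    cells = {}
    for line in text.splitlines():
        head, sep, body = line.partition(":")
        if not sep:
            continue  # column header or shell prompt
        try:
            base = int(head.strip(), 16)
        except ValueError:
            continue
        for i in range(16):
            # One separator space, then two characters per cell.
            cell = body[1 + 3 * i:3 + 3 * i].strip()
            if cell:
                cells[base + i] = cell
    return cells


def claimed_addresses(text):
    """Addresses shown as 'UU', i.e. already claimed by a kernel driver."""
    return sorted(a for a, c in parse_i2cdetect(text).items() if c == "UU")


# Two rows in the same layout, with a driver bound to 0x18..0x1b.
sample = "\n".join([
    "     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f",
    "00:" + " " * 10 + "-- -- -- -- -- 08 -- -- -- -- -- -- --",
    "10: -- -- -- -- -- -- -- -- UU UU UU UU -- -- -- --",
])
claimed = claimed_addresses(sample)
```

With no chip driver loaded, the same function returns an empty list.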

Anyway, I think the conclusion is that the i801 driver has problems with
interrupt support on your hardware, as I suspected in #10. Issue #177291
is really the same problem. Jean maintains that driver as well, so he
should be able to help.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/13

------------------------------------------------------------------------
On 2016-11-28T18:50:43+00:00 ck+kernelbugzilla wrote:

(In reply to Guenter Roeck from comment #13)
> With "reserved" I meant "a driver for a chip is loaded". After you load the
> jc42 driver (or the eeprom driver), you'll see that some of the addresses
> show up as "UU".

Ah I see. Yes, after loading jc42, I can see "UU".

╭─root@Galactica ~
╰─➤  i2cdetect -y 1
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:          -- -- -- -- -- 08 -- -- -- -- -- -- --
10: -- -- -- -- -- -- -- -- UU UU UU UU -- -- -- --
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- 2e --
30: 30 31 32 33 -- -- -- -- -- -- -- -- -- -- -- --
40: -- -- -- -- -- -- -- -- -- 49 -- -- -- -- -- --
50: 50 51 52 53 -- -- -- -- -- -- -- -- -- -- -- --
60: -- 61 -- -- -- -- -- -- -- 69 -- -- 6c -- -- --
70: -- -- -- -- -- -- -- --

> Anyway, I think the conclusion is that the i801 driver has problems with
> interrupt support on your hardware, as I suspected in #10. Issue #177291 is
> really the same problem. Jean maintains that driver as well, so he should be
> able to help.

Should I close #177291 as a duplicate, as it's my ticket?
Thanks for your support. I hope Jean has an idea :)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/14

------------------------------------------------------------------------
On 2016-11-29T08:32:39+00:00 jdelvare wrote:

Thanks Guenter for stepping in. I always suspected the problem was with
the SMBus controller (i2c-i801 driver) and I intended to comment about
it long ago but then forgot, sorry about that :-(

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/15

------------------------------------------------------------------------
On 2016-11-29T08:41:00+00:00 jdelvare wrote:

Conrad, I need detailed information about the SMBus PCI devices and the
IRQs on your machine. Please attach the output of:

$ /sbin/lspci -nn | grep SMBus

$ /sbin/lspci -xxx -s <device>
(for each device listed above)

$ cat /proc/interrupts

Also look for any message related to i2c, SMBus, i801 or the PCI devices
above in the kernel logs.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/16

------------------------------------------------------------------------
On 2016-11-29T08:57:43+00:00 ck+kernelbugzilla wrote:

Hello Jean!

(In reply to Jean Delvare from comment #16)
> $ /sbin/lspci -nn | grep SMBus

00:13.0 System peripheral [0880]: Intel Corporation Atom processor C2000 SMBus 2.0 [8086:1f15] (rev 02)
00:1f.3 SMBus [0c05]: Intel Corporation Atom processor C2000 PCU SMBus [8086:1f3c] (rev 02)
 
> $ /sbin/lspci -xxx -s <device>
> (for each device listed above)

╭─root@Galactica /home/kostecki  
╰─➤  lspci -xxx -s 00:13.0
00:13.0 System peripheral: Intel Corporation Atom processor C2000 SMBus 2.0 (rev 02)
00: 86 80 15 1f 46 05 10 00 02 00 80 08 00 00 00 00
10: 04 40 f1 ff 0f 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 10 80 92 00 01 80 00 10 20 08 04 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 8c 03 00 00 00 00 00 00 00 00 00 05 00 81 01
90: 0c f0 ef fe 00 00 00 00 a6 41 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 01 00 10 00 10 80
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

╭─root@Galactica /home/kostecki  
╰─➤  lspci -xxx -s 00:1f.3
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 02)
00: 86 80 3c 1f 43 01 98 02 02 00 05 0c 00 00 00 00
10: 00 00 50 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00
40: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 03 04 04 00 00 00 08 08 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 0f 02 01 03 03 03 00
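
The dumps above can be read by hand, but a small decoder makes the standard
header fields easier to compare between machines. The Python sketch below
is an illustration and not part of the original comment. It relies only on
the generic PCI configuration header layout (vendor and device ID at
0x00/0x02, command at 0x04, status at 0x06, interrupt line and pin at
0x3c/0x3d), needs at least the first 64 bytes of a dump, and knows nothing
about the C2000-specific registers. The function names are made up.

```python
def parse_lspci_hex(text):
    """Collect the bytes of an `lspci -xxx` hex dump into a bytes object."""
    data = bytearray()
    for line in text.splitlines():
        head, sep, body = line.partition(": ")
        if not sep or len(head) != 2:
            continue  # device description line or shell prompt
        try:
            int(head, 16)
            data.extend(bytes.fromhex(body))
        except ValueError:
            continue
    return bytes(data)


def decode_header(cfg):
    """Decode a few standard PCI configuration header fields."""
    def word(off):
        return int.from_bytes(cfg[off:off + 2], "little")

    command, status = word(0x04), word(0x06)
    return {
        "vendor": word(0x00),
        "device": word(0x02),
        "command": command,
        "intx_disabled": bool(command & 0x0400),  # Command register bit 10
        "intx_pending": bool(status & 0x0008),    # Status register bit 3
        "interrupt_line": cfg[0x3C],
        "interrupt_pin": cfg[0x3D],               # 1 = INTA#, 2 = INTB#, ...
    }


# First four rows of the 00:1f.3 dump shown above.
sample = """\
00: 86 80 3c 1f 43 01 98 02 02 00 05 0c 00 00 00 00
10: 00 00 50 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00
"""
header = decode_header(parse_lspci_hex(sample))
```

For this dump the command word decodes to 0x0143, which matches the
"enabling device (0140 -> 0143)" message in the kernel log.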


> $ cat /proc/interrupts
 
See attachment.

> Also look for any message related to i2c, SMBus, i801 or the PCI devices
> above in the kernel logs.

╭─root@Galactica /
╰─➤  dmesg|grep -i smbus

[    7.968653] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[    7.970338] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[    7.974068] ismt_smbus 0000:00:13.0: enabling device (0140 -> 0142)
[  974.471917] ismt_smbus 0000:00:13.0: completion wait timed out
[  975.512022] ismt_smbus 0000:00:13.0: completion wait timed out
[  976.552097] ismt_smbus 0000:00:13.0: completion wait timed out
[  977.592124] ismt_smbus 0000:00:13.0: completion wait timed out
[  978.632168] ismt_smbus 0000:00:13.0: completion wait timed out
[  979.682207] ismt_smbus 0000:00:13.0: completion wait timed out
[  980.712251] ismt_smbus 0000:00:13.0: completion wait timed out
[  981.752310] ismt_smbus 0000:00:13.0: completion wait timed out

The timeout messages are only shown when I load jc42.
I am also attaching my whole dmesg.

Cheers
Conrad

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/17

------------------------------------------------------------------------
On 2016-11-29T08:58:23+00:00 ck+kernelbugzilla wrote:

Created attachment 246221
cat /proc/interrupts

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/18

------------------------------------------------------------------------
On 2016-11-29T08:58:35+00:00 ck+kernelbugzilla wrote:

Created attachment 246231
dmesg output

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/19

------------------------------------------------------------------------
On 2016-11-29T10:45:16+00:00 jdelvare wrote:

Can you blacklist ismt-msi, reboot and see if it makes any difference?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/20

------------------------------------------------------------------------
On 2016-11-29T11:28:39+00:00 ck+kernelbugzilla wrote:

(In reply to Jean Delvare from comment #20)
> Can you blacklist ismt-msi, reboot and see if it makes any difference?

No, it didn't change anything. I've compiled a new kernel without ismt-msi
(CONFIG_I2C_ISMT=n), and after loading jc42 the interrupt rate still goes
very high.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/21

------------------------------------------------------------------------
On 2016-11-29T11:56:23+00:00 jdelvare wrote:

OK, thanks. I have added Intel folks to Cc. I can't find the register
descriptions for the Atom C2000 SMBus function, so there's not so much I
can do.

Conrad, support for the SMBus in this CPU family was added several years
ago to the i2c-i801 driver, so I am wondering why this bug is only
reported now.

Is this new hardware for you? Or have you had it for some time, and it was
working fine until it broke with a kernel or OS update?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/22

------------------------------------------------------------------------
On 2016-11-29T12:42:30+00:00 jarkko.nikula wrote:

I found a datasheet via the Avoton C2750 page:
http://ark.intel.com/products/77987/Intel-Atom-Processor-C2750-4M-Cache-2_40-GHz
->
https://www-ssl.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-c2000-microserver-datasheet.pdf

I guess both C2758 and C2750 are compatible as they are listed in C2000
Product Family for Communications.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/23

------------------------------------------------------------------------
On 2016-11-29T13:07:42+00:00 ck+kernelbugzilla wrote:

(In reply to Jean Delvare from comment #22)
> Is this new hardware for you? Or you have it for some time, and it was
> working fine so far, and broke with a kernel or OS update?

Yes, this is new hardware. I bought it a few weeks before starting this
ticket, so I can't tell if it was working before.

(In reply to Jarkko Nikula from comment #23)
> I found some datasheet through Avoton C2750
> http://ark.intel.com/products/77987/Intel-Atom-Processor-C2750-4M-Cache-2_40-GHz
> ->
> https://www-ssl.intel.com/content/dam/www/public/us/en/documents/datasheets/atom-c2000-microserver-datasheet.pdf
> 
> I guess both C2758 and C2750 are compatible as they are listed in C2000
> Product Family for Communications.

The C2750 has Turbo Boost; the C2758 has a QuickAssist accelerator
instead. (I don't know if this makes a difference for the registers.)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/24

------------------------------------------------------------------------
On 2016-11-29T18:58:21+00:00 jdelvare wrote:

Jarkko, I found the same document, however it doesn't appear to contain
register definitions, or I am blind.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/25

------------------------------------------------------------------------
On 2016-11-29T22:32:33+00:00 ck+kernelbugzilla wrote:

(In reply to Jean Delvare from comment #25)
> Jarkko, I found the same document, however it doesn't appear to contain
> register definitions, or I am blind.

Maybe chapters 15.8 and 18.5? Sorry if that's wrong; I don't know if
that's what you are looking for.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/26

------------------------------------------------------------------------
On 2016-11-29T23:27:16+00:00 linux wrote:

Problem is that only the register addresses are provided, not the
register definitions. Sure, there is a status register, and we know its
address, but we don't know how the bits are defined and if they are
defined exactly like in other Intel CPUs.

With the C2000 being a different micro-architecture than the "mainline"
Intel CPUs, there is a real possibility that the register definitions
are different.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/27

------------------------------------------------------------------------
On 2016-11-30T07:36:06+00:00 jarkko.nikula wrote:

Sorry, I looked at it too quickly. Indeed, the definitions are missing. I'll
ask via http://ark.intel.com/ whether a more detailed datasheet is available.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/28

------------------------------------------------------------------------
On 2016-11-30T08:07:57+00:00 jdelvare wrote:

Conrad, until we sort it out, you may be able to work around the problem
by passing option disable_features=0x10 to the i2c-i801 driver.
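
To make the mask less opaque, here is a small Python sketch that decodes a
disable_features value and builds the matching modprobe.d line. It is an
illustration, not part of the original comment. The bit table is written
from my recollection of the i2c-i801 parameter description and should be
treated as an assumption; "modinfo i2c_i801" on the running kernel is the
authoritative source. The helper names are made up.

```python
# Assumed bit meanings; verify against `modinfo i2c_i801` before relying on them.
FEATURES = {
    0x01: "SMBus PEC",
    0x02: "block buffer",
    0x08: "I2C block read",
    0x10: "interrupts",
    0x20: "SMBus Host Notify",
}


def describe_disable_features(mask):
    """List the driver features that a disable_features mask turns off."""
    return [name for bit, name in sorted(FEATURES.items()) if mask & bit]


def modprobe_line(mask, module="i2c_i801"):
    """Build a modprobe.d options line for the given mask."""
    return f"options {module} disable_features={mask:#04x}"


disabled = describe_disable_features(0x10)
line = modprobe_line(0x10)
```

The generated line belongs in a file under /etc/modprobe.d/ when the driver
is built as a module. When the driver is compiled into the kernel, the
equivalent is i2c_i801.disable_features=0x10 on the kernel command line.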

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/29

------------------------------------------------------------------------
On 2016-11-30T08:53:50+00:00 ck+kernelbugzilla wrote:

(In reply to Jean Delvare from comment #29)
> Conrad, until we sort it out, you may be able to work around the problem by
> passing option disable_features=0x10 to the i2c-i801 driver.

Hey Jean,
disabling the interrupts for i2c-i801 seems to help as a workaround.

[    7.950079] i801_smbus 0000:00:1f.3: Interrupt disabled by user
[    7.951624] i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
[    7.953270] i801_smbus 0000:00:1f.3: SMBus using polling

Cheers
Conrad

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/30

------------------------------------------------------------------------
On 2017-03-23T17:12:02+00:00 ck+kernelbugzilla wrote:

*** Bug 177291 has been marked as a duplicate of this bug. ***

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/31

------------------------------------------------------------------------
On 2017-03-23T17:13:48+00:00 ck+kernelbugzilla wrote:

Any news for me? :)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/32

------------------------------------------------------------------------
On 2017-03-28T09:37:42+00:00 jdelvare wrote:

Jarkko, were you able to get your hands on a datasheet? It doesn't need
to be public, if you can check the register definitions for us.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/33

------------------------------------------------------------------------
On 2017-03-28T10:29:48+00:00 jarkko.nikula wrote:

I got a contact back in December but no response. They may have been busy
before the holidays, and I forgot to ping again. I'll ask again.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/34

------------------------------------------------------------------------
On 2017-05-07T22:08:40+00:00 ck+kernelbugzilla wrote:

(In reply to Jarkko Nikula from comment #34)
> I got one contact info back in December but no response. Maybe busy before
> holidays and I forgot to ping again. I'll ask again.

Did you get any reply?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/35

------------------------------------------------------------------------
On 2017-05-08T08:20:35+00:00 jarkko.nikula wrote:

Only an out-of-office reply back in March, but I have pinged again now.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/36

------------------------------------------------------------------------
On 2017-06-10T14:47:28+00:00 ck+kernelbugzilla wrote:

(In reply to Jarkko Nikula from comment #36)
> Just only out of office reply back in March but pinged again now.

And now? ;-)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/37

------------------------------------------------------------------------
On 2021-02-26T14:41:32+00:00 andy.shevchenko wrote:

Hmm... It seems this one somehow got abandoned. Jarkko, any news on this?
Same question to Conrad: have you had any luck with v5.11-based kernels
(or closer to the latest)?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/38

------------------------------------------------------------------------
On 2021-02-26T16:36:38+00:00 ck+kernelbugzilla wrote:

(In reply to Andy Shevchenko from comment #38)
> Hmm... Seems this one gets somehow abandoned. Jarkko, any news on this? Same
> question to Conrad, do you have any luck with v5.11 based kernels (or closer
> to latest)?

Nope. No news. The problem still exists with the latest kernel.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/39

------------------------------------------------------------------------
On 2021-03-01T13:16:37+00:00 jarkko.nikula wrote:

Unfortunately I don't have any updates on this.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/40

------------------------------------------------------------------------
On 2021-06-08T08:32:57+00:00 andy.shevchenko wrote:

This bug gave me the idea to try MSI on i801, but it appears that none
of the platforms have MSI capability on this device. I'm not sure if
this is useful information, but I think it's better to share it
anyway.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.11/+bug/1931001/comments/46

------------------------------------------------------------------------
On 2021-10-09T11:22:50+00:00 stephane.poignant wrote:

I'm not sure this is completely related, but I would assume it is at least
partially. I have two mini-servers, one with a Supermicro A2SDi-8C-HLN4F
(Atom C3758), and the other one with an older Supermicro A1SRM-2758F
(Atom C2758F).

I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye
(5.10.46-5). No issue on the C3758, but I was faced with a severe
performance regression on the C2758F.

When running 5.10 on the C2758F, /proc/interrupts shows about 100k
interrupts per second for 'IO-APIC 18-fasteoi i801_smbus', and overall
performance suffers a lot (e.g. iperf between two KVM virtual machines
bridged together is 93% slower with 5.10 than with 4.19).

So far I was getting around the issue by blocklisting i2c_i801. After I
found this bug, I tried adding the disable_features=0x10 option, and that
worked too.

I'm not using jc42 at all; sensor thresholds are set to correct values
by the distro tools.

# i2cdetect -l

# sensors
nvme-pci-0400
Adapter: PCI adapter
Composite:    +30.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +30.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +31.9°C  (low  = -273.1°C, high = +65261.8°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 1:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 2:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 3:       +48.0°C  (high = +98.0°C, crit = +98.0°C)
Core 4:       +47.0°C  (high = +98.0°C, crit = +98.0°C)
Core 5:       +46.0°C  (high = +98.0°C, crit = +98.0°C)
Core 6:       +47.0°C  (high = +98.0°C, crit = +98.0°C)
Core 7:       +47.0°C  (high = +98.0°C, crit = +98.0°C)

# dmesg | egrep -i '(smbus|i801)'
[    2.226240] ismt_smbus 0000:00:13.0: enabling device (0000 -> 0002)
[    2.229927] i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
[    2.230089] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[    2.230136] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt

~# lspci -nn | grep SMBus
00:13.0 System peripheral [0880]: Intel Corporation Atom processor C2000 SMBus 2.0 [8086:1f15] (rev 03)
00:1f.3 SMBus [0c05]: Intel Corporation Atom processor C2000 PCU SMBus [8086:1f3c] (rev 03)

# lspci -xxx -s 00:13.0
00:13.0 System peripheral: Intel Corporation Atom processor C2000 SMBus 2.0 (rev 03)
00: 86 80 15 1f 06 04 10 00 03 00 80 08 00 00 00 00
10: 04 70 31 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 40 00 00 00 00 00 00 00 ff 01 00 00
40: 10 80 92 00 01 80 00 10 20 08 04 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 10 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 01 8c 03 00 00 00 00 00 00 00 00 00 05 00 81 01
90: 04 00 e4 fe 00 00 00 00 21 40 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 01 00 10 00 10 80
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

# lspci -xxx -s 00:1f.3
00:1f.3 SMBus: Intel Corporation Atom processor C2000 PCU SMBus (rev 03)
00: 86 80 3c 1f 03 00 98 02 03 00 05 0c 00 00 00 00
10: 00 40 31 df 00 00 00 00 00 00 00 00 00 00 00 00
20: 01 e0 00 00 00 00 00 00 00 00 00 00 d9 15 20 08
30: 00 00 00 00 00 00 00 00 00 00 00 00 ff 02 00 00
40: 11 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 03 04 04 00 00 00 08 08 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 0f 02 01 03 03 03 00

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/54

------------------------------------------------------------------------
On 2021-10-09T13:10:35+00:00 ck+kernelbugzilla wrote:

Yes, this is the same problem here. But Intel doesn't seem to be
interested :-(

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/55

------------------------------------------------------------------------
On 2021-10-11T13:07:56+00:00 jarkko.nikula wrote:

(In reply to stephane.poignant from comment #42)
> I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye
> (5.10.46-5). No issue on the C3758, but i was faced with severe performance
> regression on the C2758F.
> 
Interesting, so was the 4.19 working on the C2758F without interrupt storm?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/57

------------------------------------------------------------------------
On 2021-10-11T19:37:23+00:00 stephane.poignant wrote:

(In reply to Jarkko Nikula from comment #44)
> (In reply to stephane.poignant from comment #42)
> > I upgraded both from Debian Buster (kernel 4.19.194-3) to Bullseye
> > (5.10.46-5). No issue on the C3758, but i was faced with severe performance
> > regression on the C2758F.
> > 
> Interesting, so was the 4.19 working on the C2758F without interrupt storm?

I haven't checked /proc/interrupts when running 4.19, so I cannot tell
for sure that the interrupts were not there. The performance regression
was definitely not there. I can check this in a couple of weeks (the
server is at a remote location with no out-of-band management network).

Dmesg when running 4.19 shows it had interrupts enabled:

[    0.000000] Linux version 4.19.0-17-amd64 (debian-ker...@lists.debian.org) 
(gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP Debian 4.19.194-3 (2021-07-18)
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-17-amd64 
root=/dev/mapper/vg1--hrbpsrv01-h--hrbpsrv01 ro quiet rd.luks.options=discard
...
[    1.434097] Run /init as init process
[    1.782787] dca service started, version 1.12.1
[    1.783203] ismt_smbus 0000:00:13.0: enabling device (0000 -> 0002)
[    1.796694] cryptd: max_cpu_qlen set to 1000
[    1.801177] i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
[    1.801317] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[    1.801356] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[    1.805199] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[    1.805202] igb: Copyright (c) 2007-2014 Intel Corporation.
[    1.805246] igb 0000:00:14.0: enabling device (0000 -> 0002)
[    1.816722] SSE version of gcm_enc/dec engaged.
...

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/58

------------------------------------------------------------------------
On 2021-10-11T20:26:40+00:00 ck+kernelbugzilla wrote:

The problem does persist in kernel 4.19 and other versions. It only
depends on whether some driver triggers the interrupts. If one does, the
counter climbs very high. So it's possible that you had no driver in
4.19 using those interrupts and, as a consequence, the bug did not
trigger.

@Jarkko Nikula: Since you are still replying, could you please keep
trying to get the needed docs, as requested by Jean Delvare?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/59

------------------------------------------------------------------------
On 2021-10-13T11:37:43+00:00 jarkko.nikula wrote:

@Conrad Kostecki: Yeah, I agree with you, it's unlikely the problem was
absent in 4.19 as it was present well before it.

I was in contact with our sales support and they told me the Atom C2758
with the F suffix is custom to SuperMicro. Unfortunately they didn't
find an explicit specification for the SMBus controller on it, but they
said it's based on the same 22 nm Silvermont architecture as Bay Trail.
I suppose the SMBus IO should be compatible.

Unfortunately public datasheets for Bay Trail seem scarce too, but I
was able to find something when searching for datasheets for the Bay
Trail E3825 used in the MinnowBoard Max. The following document seems to
be available to registered ark.intel.com users or through search
engines:

"Intel Atom® Processor E3800 Product Family" with Document Number:
538136 and Chapter 33 "PCU – System Management Bus (SMBus)"

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/60

------------------------------------------------------------------------
On 2021-10-13T11:39:05+00:00 jarkko.nikula wrote:

Created attachment 299193
Debug patch for the i2c-i801 interrupts

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/61

------------------------------------------------------------------------
On 2021-10-13T11:44:31+00:00 jarkko.nikula wrote:

Could you try the attached patch and report which interrupt statuses it
prints during the interrupt storm? The debug print is rate limited, so
it shouldn't flood dmesg.

You need to have CONFIG_DYNAMIC_DEBUG=y in your kernel config and either
enable the debug print at runtime with the following:

mount none /sys/kernel/debug -t debugfs
echo -n "func i801_isr +p" >/sys/kernel/debug/dynamic_debug/control

or by appending this to your kernel command line:
i2c_i801.dyndbg="func i801_isr +p"

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/62

------------------------------------------------------------------------
On 2021-10-13T22:18:43+00:00 ck+kernelbugzilla wrote:

Here is the output:

pcicst 0x298, SMBHSTSTS 0x60
[  359.205884] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  359.205918] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210031] i801_isr: 375367 callbacks suppressed
[  364.210043] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210085] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210126] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210142] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210178] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210217] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210234] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210253] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210292] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  364.210329] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220035] i801_isr: 380909 callbacks suppressed
[  369.220047] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220069] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220109] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220146] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220185] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220222] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220262] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220278] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220317] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  369.220333] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230078] i801_isr: 393736 callbacks suppressed
[  374.230109] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230151] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230191] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230210] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230248] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230283] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230297] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230332] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230345] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  374.230358] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240037] i801_isr: 382705 callbacks suppressed
[  379.240068] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240090] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240110] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240130] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240150] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240186] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240205] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240242] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240281] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  379.240297] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250032] i801_isr: 387109 callbacks suppressed
[  384.250043] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250065] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250104] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250141] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250181] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250197] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250216] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250255] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250292] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  384.250311] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60

$ cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       
CPU6       CPU7
 18:          0          0          0   26596692          0          0          
0          0   IO-APIC  18-fasteoi   i801_smbus
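
For anyone who wants to put a number on the storm, here is a minimal
Python sketch that sums the per-CPU counters of one /proc/interrupts row
and turns two samples into a rate. The function names are hypothetical,
and the sample row is the one above with the mail line wrap undone.

```python
def irq_total(interrupts_text, name):
    """Sum the per-CPU counters of the first row whose last field is `name`."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == name:
            total = 0
            # fields[0] is the IRQ label ("18:"); the counters follow it
            # and stop at the first non-numeric field ("IO-APIC").
            for field in fields[1:]:
                if not field.isdigit():
                    break
                total += int(field)
            return total
    return None


def rate_per_second(count_before, count_after, seconds):
    """Average interrupts per second between two samples."""
    return (count_after - count_before) / seconds


sample = (" 18:          0          0          0   26596692          0"
          "          0          0          0   IO-APIC  18-fasteoi   i801_smbus")
print(irq_total(sample, "i801_smbus"))
```

To measure a live system, read /proc/interrupts twice a few seconds
apart and pass both totals to rate_per_second().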

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/63

------------------------------------------------------------------------
On 2021-10-14T10:58:00+00:00 jarkko.nikula wrote:

Thanks. Those debug prints confirm the interrupt is really coming from
the SMBus controller (bit 3 is set in PCI status) and the SMB alert bit
is set.
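
As a rough illustration of that reading, the two values can be decoded
with a small Python sketch. The bit names are assumptions taken from the
Linux headers as I understand them (PCI_STATUS_* in pci_regs.h and
SMBHSTSTS_* in i2c-i801.c), and the decode() helper is hypothetical.

```python
# Bit names ASSUMED from include/uapi/linux/pci_regs.h (PCI_STATUS_*).
# The DEVSEL timing field (bits 10:9) is deliberately left out.
PCI_STATUS_BITS = {
    0x0008: "INTERRUPT",  # bit 3: the function is asserting its interrupt
    0x0010: "CAP_LIST",
    0x0020: "66MHZ",
    0x0080: "FAST_BACK",
}

# Bit names ASSUMED from drivers/i2c/busses/i2c-i801.c (SMBHSTSTS_*).
SMBHSTSTS_BITS = {
    0x01: "HOST_BUSY",
    0x02: "INTR",
    0x04: "DEV_ERR",
    0x08: "BUS_ERR",
    0x10: "FAILED",
    0x20: "SMBALERT_STS",
    0x40: "INUSE_STS",
    0x80: "BYTE_DONE",
}


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


print(decode(0x298, PCI_STATUS_BITS))
print(decode(0x60, SMBHSTSTS_BITS))
```

With these tables, 0x298 contains the INTERRUPT bit and 0x60 contains
SMBALERT_STS, which matches the interpretation above.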

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/64

------------------------------------------------------------------------
On 2021-10-14T10:58:55+00:00 jarkko.nikula wrote:

Created attachment 299201
Experimental patch disabling SMB_ALERT signal

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/65

------------------------------------------------------------------------
On 2021-10-14T11:03:47+00:00 jarkko.nikula wrote:

@Conrad Kostecki: Could you check whether the attached experimental
patch, which disables SMB_ALERT, helps here?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/66

------------------------------------------------------------------------
On 2021-10-14T20:10:57+00:00 stephane.poignant wrote:

Thanks for the follow-up, I will test the patch on my setup as well by
next week.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/67

------------------------------------------------------------------------
On 2021-10-14T20:53:07+00:00 ck+kernelbugzilla wrote:

I just tested the patch and can confirm it works. After applying the
patch, interrupts on i801_smbus dropped nearly to zero.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/68

------------------------------------------------------------------------
On 2021-10-14T21:00:04+00:00 andy.shevchenko wrote:

(In reply to Conrad Kostecki from comment #55)
> I just tested the patch and can confirm, it works. After applying patch,
> interrupts dropped nearly to zero on i801_smbus.

According to the specification, the host (if it implements ALERT) should issue
a special byte read command to see which device wants to send something. If a
proper implementation doesn't fix this, it might be a pin configuration issue
(like a pull-down sitting on the respective pin) or a PCB or firmware (BIOS)
issue. It would be nice to understand, if it can be done without much effort,
what exactly is causing ALERT to be asserted.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/69

------------------------------------------------------------------------
On 2021-10-15T08:04:32+00:00 jarkko.nikula wrote:

I was also wondering whether SMB_ALERT should be properly acknowledged,
but since the driver currently has no support for it I wanted to see
whether simply disabling it helps.

Fortunately I was able to reproduce the issue locally on another
platform where SMB_ALERT was connected to a resistor, so I could pull
the signal down with a wire. The interrupt storm begins when SMB_ALERT
goes low for a moment and continues afterwards.

I'll test a bit more and make a proper patch. One thing I'm wondering is
whether the driver should restore the original disable status on driver
removal, like what is done for host notify in
i801_disable_host_notify().

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/70

------------------------------------------------------------------------
On 2021-10-15T14:12:40+00:00 jarkko.nikula wrote:

Created attachment 299217
2nd version of patch disabling SMB_ALERT signal

I moved the SMB_ALERT signal disabling to i801_enable_host_notify()
since the SMBSLVCMD register is available on ICH3 and later. Also it
keeps the original value prior to driver load.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/71

------------------------------------------------------------------------
On 2021-10-15T14:27:07+00:00 andy.shevchenko wrote:

(In reply to Jarkko Nikula from comment #58)
> 2nd version of patch disabling SMB_ALERT signal

Side remark: Looking into this code, shouldn't you first clean current
notifications and only after that enable IRQ?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/72

------------------------------------------------------------------------
On 2021-10-15T22:39:15+00:00 ck+kernelbugzilla wrote:

Patch v2 works for me. Interrupts still are fine and do not go crazy.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/73

------------------------------------------------------------------------
On 2021-10-16T00:41:12+00:00 stephane.poignant wrote:

I can confirm that I am getting the same results with the two patches on my
setup with the Debian kernels.
The debug patch produces the same messages, and with the SMB_ALERT disable
patch no further interrupts were triggered.

Also, when booting into the previous kernel I was using
(linux-image-4.19.0-17-amd64 4.19.194-3), the module loads with the
default config but I am not getting any interrupts. So for my particular
setup the issue only appeared after upgrading from Debian kernel 4.19 to
5.10.

I will test the second version of the patch ASAP and provide you with
the results.


## Kernel 4.19

# uname -a
Linux hrbpsrv01.intra.lan 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) 
x86_64 GNU/Linux

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0          0          
0          0   IO-APIC  18-fasteoi   i801_smbus

# dmesg
...
[ 6652.023634] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[ 6652.023689] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
...


## Debian linux-image-5.10.0-9-amd64 (5.10.70-1) + Debug patch

# uname -a
Linux hrbpsrv01.intra.lan 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) 
x86_64 GNU/Linux

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0    7358862          
0          0   IO-APIC  18-fasteoi   i801_smbus
(increasing at about 100k interrupts/sec)

# dmesg
...
[  516.429120] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[  516.429140] i801_smbus 0000:00:1f.3: An interrupt is pending!
[  516.429161] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[  516.429933] i2c i2c-1: 4/4 memory slots populated (from DMI)
[  516.430337] at24 1-0050: supply vcc not found, using dummy regulator
[  516.431043] at24 1-0050: 256 byte spd EEPROM, read-only
[  516.431078] i2c i2c-1: Successfully instantiated SPD at 0x50
[  516.431455] at24 1-0051: supply vcc not found, using dummy regulator
[  516.432148] at24 1-0051: 256 byte spd EEPROM, read-only
[  516.432174] i2c i2c-1: Successfully instantiated SPD at 0x51
[  516.432576] at24 1-0052: supply vcc not found, using dummy regulator
[  516.433284] at24 1-0052: 256 byte spd EEPROM, read-only
[  516.433325] i2c i2c-1: Successfully instantiated SPD at 0x52
[  516.433748] at24 1-0053: supply vcc not found, using dummy regulator
[  516.434454] at24 1-0053: 256 byte spd EEPROM, read-only
[  516.434497] i2c i2c-1: Successfully instantiated SPD at 0x53
[  525.513104] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513133] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513161] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513185] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513209] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513234] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513258] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513281] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513316] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  525.513352] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514207] i801_isr: 297603 callbacks suppressed
[  530.514221] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514259] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514299] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514331] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514366] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514391] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514425] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514457] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514482] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  530.514507] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518261] i801_isr: 320308 callbacks suppressed
[  535.518273] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518311] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518337] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518362] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518386] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518415] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518442] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518467] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518491] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
[  535.518516] i801_smbus 0000:00:1f.3: pcicst 0x298, SMBHSTSTS 0x60
...


## Kernel 5.10 + Disable SMB_ALERT patch

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0   10567596          
0          0   IO-APIC  18-fasteoi   i801_smbus
(no longer increasing)

# dmesg
...
[  664.110013] i801_smbus 0000:00:1f.3: SPD Write Disable is set
[  664.110065] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[  664.111975] i2c i2c-1: 4/4 memory slots populated (from DMI)
[  664.112460] at24 1-0050: supply vcc not found, using dummy regulator
[  664.113195] at24 1-0050: 256 byte spd EEPROM, read-only
[  664.113240] i2c i2c-1: Successfully instantiated SPD at 0x50
[  664.113657] at24 1-0051: supply vcc not found, using dummy regulator
[  664.114374] at24 1-0051: 256 byte spd EEPROM, read-only
[  664.114412] i2c i2c-1: Successfully instantiated SPD at 0x51
[  664.114823] at24 1-0052: supply vcc not found, using dummy regulator
[  664.116794] at24 1-0052: 256 byte spd EEPROM, read-only
[  664.116838] i2c i2c-1: Successfully instantiated SPD at 0x52
[  664.117288] at24 1-0053: supply vcc not found, using dummy regulator
[  664.118042] at24 1-0053: 256 byte spd EEPROM, read-only
[  664.118092] i2c i2c-1: Successfully instantiated SPD at 0x53

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/74

------------------------------------------------------------------------
On 2021-10-16T15:20:42+00:00 stephane.poignant wrote:

Patch V2 works for me too.

# cat /proc/interrupts | grep i801
 18:          0          0          0          0          0          8          
0          0   IO-APIC  18-fasteoi   i801_smbus

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/75

------------------------------------------------------------------------
On 2021-10-18T14:07:09+00:00 jarkko.nikula wrote:

(In reply to Andy Shevchenko from comment #59)
> (In reply to Jarkko Nikula from comment #58)
> > 2nd version of patch disabling SMB_ALERT signal
> 
> Side remark: Looking into this code, shouldn't you first clean current
> notifications and only after that enable IRQ?

That's a good question and it made me debug more. In fact, disabling
doesn't disable detection: SMBALERT_STS will still be set and cause a
short burst of interrupts at driver load and unload time if the
SMB_ALERT signal was asserted. It looks like it's better to add basic
acknowledging for it in i801_isr().

I'm not sure whether clearing pending interrupts at probe time would
cause any regression, but acknowledging SMBALERT_STS in i801_isr() makes
sure the status doesn't stay set forever if it occurs after probe.
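
To make the reasoning concrete, here is a toy Python model (not driver
code) of why an unacknowledged status bit keeps a level-triggered
interrupt firing. It assumes the status register has write-1-to-clear
semantics; the class and function names are hypothetical.

```python
SMBALERT_STS = 0x20  # status bit seen set in the debug output (SMBHSTSTS 0x60)


class ToyStatusRegister:
    """Toy write-1-to-clear status register driving a level-triggered IRQ."""

    def __init__(self, value=0):
        self.value = value

    def write(self, mask):
        # ASSUMPTION: writing 1 to a status bit clears it.
        self.value &= ~mask

    def irq_asserted(self):
        return bool(self.value & SMBALERT_STS)


def run(isr, reg, max_calls=1000):
    """Call the ISR while the line stays asserted; return the call count."""
    calls = 0
    while reg.irq_asserted() and calls < max_calls:
        isr(reg)
        calls += 1
    return calls


def isr_without_ack(reg):
    """Handler that ignores SMBALERT_STS, so the line never drops."""


def isr_with_ack(reg):
    """Handler that acknowledges SMBALERT_STS by writing the bit back."""
    if reg.value & SMBALERT_STS:
        reg.write(SMBALERT_STS)


print(run(isr_without_ack, ToyStatusRegister(0x60)))
print(run(isr_with_ack, ToyStatusRegister(0x60)))
```

In this model the handler without acknowledgement is re-entered until
the artificial max_calls limit, while the acknowledging handler runs
once.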

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/76

------------------------------------------------------------------------
On 2021-10-18T15:18:18+00:00 andy.shevchenko wrote:

(In reply to Jarkko Nikula from comment #63)
> (In reply to Andy Shevchenko from comment #59)
> > (In reply to Jarkko Nikula from comment #58)
> > > 2nd version of patch disabling SMB_ALERT signal
> > 
> > Side remark: Looking into this code, shouldn't you first clean current
> > notifications and only after that enable IRQ?
> 
> That's a good question and made me debugging more. In fact disabling doesn't
> disable detection and SMBALERT_STS will be set and cause short burst of
> interrupts during driver load and unload time if SMB_ALERT signal was
> asserted. Looks like it's better to add basic acknowledging for it into
> i801_isr().
> 
> I'm not sure would clearing pending interrupts at the probe time cause any
> regression but acknowledging the SMBALERT_STS in i801_isr() makes sure the
> status doesn't stay forever if it occurs after probe.

It also makes sense to test it with DEBUG_SHIRQ enabled (yes, I know
that more than half of the drivers in the Linux kernel will either
crash or behave badly with it; not many developers know about this
debugging feature).

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux-
hwe-5.11/+bug/1931001/comments/77


** Changed in: linux
       Status: Unknown => Incomplete

** Changed in: linux
   Importance: Unknown => Medium

** Bug watch added: Linux Kernel Bug Tracker #177291
   https://bugzilla.kernel.org/show_bug.cgi?id=177291

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-hwe-5.11 in Ubuntu.
https://bugs.launchpad.net/bugs/1931001

Title:
  kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 22s!

Status in Linux:
  Incomplete
Status in linux-hwe-5.11 package in Ubuntu:
  Confirmed
Status in Fedora:
  Confirmed

Bug description:
  Ubuntu 20.04 LTS and Ubuntu 21.04 occasionally boot with very poor
  performance and are very unresponsive to user input on the Lenovo
  300e 2nd Gen 81M9 laptop (LENOVO_MT_81M9_BU_idea_FM_300e 2nd G).

  When this happens you can see messages like these in the journal:

  ---
  root@alumne-1-58:~# journalctl | grep "BUG: soft"
  may 20 21:44:35 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 22s! [swapper/3:0]
  may 20 21:44:35 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 22s! [swapper/3:0]
  may 22 09:33:34 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck 
for 22s! [swapper/0:0]
  may 24 16:45:14 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#1 stuck 
for 23s! [prometheus-node:4220]
  may 24 16:45:14 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck 
for 23s! [swapper/0:0]
  jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck 
for 22s! [swapper/0:0]
  jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck 
for 23s! [swapper/0:0]
  jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#1 stuck 
for 22s! [swapper/1:0]
  jun 03 00:01:09 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck 
for 22s! [swapper/0:0]
  jun 03 00:02:15 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#0 stuck 
for 21s! [swapper/0:0]
  jun 05 08:22:58 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 22s! [irq/138-iwlwifi:1044]
  jun 05 08:25:06 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#2 stuck 
for 22s! [swapper/2:0]
  jun 05 08:25:06 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 22s! [irq/138-iwlwifi:1044]
  jun 05 08:26:42 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#1 stuck 
for 23s! [lxd:3975]
  jun 05 08:26:42 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#2 stuck 
for 23s! [swapper/2:0]
  jun 05 08:26:42 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 23s! [irq/138-iwlwifi:1044]
  jun 05 08:27:38 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 23s! [irq/138-iwlwifi:1044]
  jun 05 08:28:34 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 22s! [irq/138-iwlwifi:1044]
  jun 05 08:29:46 alumne-1-58 kernel: watchdog: BUG: soft lockup - CPU#3 stuck 
for 22s! [irq/138-iwlwifi:1044]
  root@alumne-1-58:~#
  ---

  Usually everything works fine after a reboot, but it is very annoying
  when it happens.

To manage notifications about this bug go to:
https://bugs.launchpad.net/linux/+bug/1931001/+subscriptions

