Hi,
On 05.06.25 05:02, 程凌飞 wrote:
Hi, all!
We’ve encountered a kernel crash when running rmmod i2c-pasemi-platform on a
Mac Mini M1 (T8103) running Asahi Arch Linux.
We first found the bug on Linux v6.6, which we built manually with the
Asahi-provided config to run our services.
At that time, i2c-pasemi-platform was named i2c-apple.
We noticed that in Linux v6.7 the pasemi driver was split into two separate
modules, one of which is i2c-pasemi-platform.
Therefore, we built Linux v6.14.6 and tried to rmmod i2c-pasemi-platform again,
but the crash still occurs. Moreover, we fetched the latest
i2c-pasemi-platform from linux-next (911483b25612c8bc32a706ba940738cc43299496)
and from the Asahi tree, built both, and tested them again with Linux v6.14.6,
but the crash remains.
Because kexec is not supported, and will never be fully supported, on Apple
Silicon platforms due to hardware and firmware design constraints, we cannot
record the panic logs through kdump.
Do you have UART connected to a device under test which you could use to
grab the panic log from the kernel? Alternatively you can also run the
kernel under m1n1's hypervisor and grab the log that way. It'll emulate
the serial port and redirect its output via USB.
Thus we tried to find the root cause of the issue manually. When we run
rmmod, the kernel releases the devices on the i2c bus and calls the remove
function in snd-soc-cs42l83-i2c, which calls cs42l42_common_remove in cs42l42.
Because cs42l42->init_done is true, it performs a regmap_write, which finally
calls into pasemi_smb_waitready in i2c-pasemi-core.c. We noticed that
smbus->use_irq is true, and after it calls into wait_for_completion_timeout,
the system crashes!
We reasoned that wait_for_completion_timeout is one of the core scheduler APIs,
used by tens of thousands of other drivers, so it is unlikely to be the cause
of the crash. So we tried removing the call to wait_for_completion_timeout,
and the system then seems to run well.
However, because we have little knowledge of i2c devices and specifications,
we are not sure whether this change will cause other potential harm to the
system and devices. Is this call to wait necessary here? Or can you suggest a
more sophisticated fix?
Yes, that call is necessary. It waits for the "transfer completed"
interrupt from the hardware. Without it, the driver will try to read data
before it's available and you'll see corruption. I'm surprised that the
hardware attached to i2c (the USB PD controller and audio, I think) works at
all with that change.
Sven