Hi,

On 05.06.25 05:02, 程凌飞 wrote:
Hi, all!

We’ve encountered a kernel crash when running rmmod i2c-pasemi-platform on a 
Mac Mini M1 (T8103) running Asahi Arch Linux.

The bug was first found on Linux v6.6, which we built manually with the
config provided by Asahi to run our services.
At that time, i2c-pasemi-platform was still called i2c-apple.

We noticed that in Linux v6.7 the pasemi driver was split into two separate
modules, one of which is i2c-pasemi-platform.
Therefore, we built Linux v6.14.6 and tried rmmod i2c-pasemi-platform again;
the crash still occurs. Moreover, we fetched
the latest i2c-pasemi-platform from
linux-next (911483b25612c8bc32a706ba940738cc43299496) and asahi, built them, and
tested again with Linux v6.14.6, but the crash remains.

Because kexec is not supported and will never be fully supported on Apple 
Silicon platforms due to hardware and firmware
design constraints, we cannot record the panic logs through kdump.

Do you have a UART connected to the device under test that you could use to grab the panic log from the kernel? Alternatively, you can run the kernel under m1n1's hypervisor and grab the log that way. It emulates the serial port and redirects its output via USB.


Thus we tried to find the root cause of the issue manually. When we run
rmmod, the kernel releases the devices on
the i2c bus, then calls the remove function in snd-soc-cs42l83-i2c, which calls
cs42l42_common_remove in cs42l42.
Because cs42l42->init_done is true, it performs a regmap_write, which finally
calls into pasemi_smb_waitready in
i2c-pasemi-core.c. We noticed that smbus->use_irq is true, and after it calls
into wait_for_completion_timeout, the system crashes!

We found that wait_for_completion_timeout is one of the core scheduler APIs
used by tens of thousands of other drivers,
so it is unlikely to be the cause of the crash itself. Still, we tried removing
the call to wait_for_completion_timeout, and the system then seems to
run well.

However, because we have little knowledge of i2c devices and specifications,
we are not sure whether this change will
cause other harm to the system and devices. Is this call to wait
necessary here? Or can you suggest a more
sophisticated fix?

Yes, that call is necessary. It waits for the "transfer completed" interrupt from the hardware. Without it, the driver will try to read data before it's available and you'll see corruption. I'm surprised the hardware attached to i2c (USB PD controller and audio, I think) works at all with that change.
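
To illustrate why skipping the wait is unsafe, here is a minimal userspace sketch in Python. It is not the i2c-pasemi code: FakeController, read_with_wait and read_without_wait are hypothetical names, a threading.Event stands in for the kernel completion, and a background thread stands in for the hardware raising the "transfer completed" interrupt.

```python
import threading
import time


class FakeController:
    """Hypothetical stand-in for an I2C controller; not the pasemi driver."""

    def __init__(self, transfer_time=0.05):
        self.transfer_time = transfer_time  # simulated bus transfer duration (s)
        self.rx_fifo = None                 # nothing received yet
        self.done = threading.Event()       # plays the role of the completion

    def start_read(self, payload):
        """Kick off a transfer; the 'interrupt' fires once the data has landed."""
        self.rx_fifo = None
        self.done.clear()

        def hardware():
            time.sleep(self.transfer_time)  # transfer still in progress
            self.rx_fifo = payload          # data becomes available
            self.done.set()                 # "transfer completed" interrupt

        worker = threading.Thread(target=hardware)
        worker.start()
        return worker


def read_with_wait(ctrl, payload, timeout=1.0):
    """Wait for completion (with a timeout) before touching the data."""
    worker = ctrl.start_read(payload)
    completed = ctrl.done.wait(timeout)     # False if the timeout expired
    data = ctrl.rx_fifo if completed else None
    worker.join()                           # let the fake hardware finish
    return completed, data


def read_without_wait(ctrl, payload):
    """Read immediately, as if the wait had been removed."""
    worker = ctrl.start_read(payload)
    data = ctrl.rx_fifo                     # transfer has not finished yet
    worker.join()
    return data


if __name__ == "__main__":
    ctrl = FakeController()
    print(read_with_wait(ctrl, b"\x2a"))
    print(read_without_wait(ctrl, b"\x2a"))
```

In this model the read without the wait returns before the simulated transfer has delivered anything, which is the userspace analogue of the driver reading data before it is available. The sketch says nothing about the cause of the crash on rmmod; it only shows what the wait protects against.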


Sven

