Hi,
On 2025/05/06 4:59, Antoine Beaupré wrote:
On 2025-05-05 22:36:07, Salvatore Bonaccorso wrote:
Hi Antoine,
On Mon, May 05, 2025 at 02:50:32PM -0400, Antoine Beaupré wrote:
On 2025-05-05 18:02:37, Salvatore Bonaccorso wrote:
On Mon, May 05, 2025 at 04:00:31PM +0200, Salvatore Bonaccorso wrote:
Hi Moritz,
On Mon, May 05, 2025 at 01:47:15PM +0200, Moritz Mühlenhoff wrote:
On Wed, Apr 30, 2025 at 05:55:20PM +0200, Salvatore Bonaccorso wrote:
Hi
We got a regression report in Debian after the update from 6.1.133 to
6.1.135. Melvin is reporting that discard/trim through a RAID10 array
stalls indefinitely. The full report is inlined below and originates
from https://bugs.debian.org/1104460 .
JFTR, we ran into the same problem with a few Wikimedia servers running
6.1.135 and RAID 10: The servers started to lock up once fstrim.service
got started. Full oops messages are available at
https://phabricator.wikimedia.org/P75746
Thanks for these additional data points. Assuming you won't be able to
test the other stable series where the commit d05af90d6218
("md/raid10: fix missing discard IO accounting") went in, might you at
least be able to test the 6.1.y branch with the commit reverted again
and manually trigger the issue?
If needed I can provide a test Debian package of 6.1.135 (or 6.1.137)
with the patch reverted.
So one additional data point as several Debian users were reporting
back being affected: One user did upgrade to 6.12.25 (where the
commit was backported as well) and is not able to reproduce the issue
there.
That would be me.
I can reproduce the issue as outlined by Moritz above fairly reliably in
6.1.135 (debian package 6.1.0-34-amd64). The reproducer is simple, on a
RAID-10 host:
1. reboot
2. systemctl start fstrim.service
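Before running step 2 on a machine you care about, it may help to confirm
that the host actually has a RAID10 md array. The following is a minimal
Python sketch, not part of the original report: the helper name
raid10_arrays is hypothetical, and it assumes the usual /proc/mdstat
layout of "mdN : active raid10 ...". It only lists the arrays and does
not start fstrim.service itself.

```python
def raid10_arrays(mdstat_text):
    """Return the names of md arrays whose RAID level is raid10.

    Assumes array lines look like "md0 : active raid10 sda1[0] ...".
    """
    arrays = []
    for line in mdstat_text.splitlines():
        name, sep, rest = line.partition(" : ")
        # Skip the "Personalities : [raid10]" header and continuation lines.
        if sep and name.startswith("md") and "raid10" in rest.split():
            arrays.append(name)
    return arrays


if __name__ == "__main__":
    with open("/proc/mdstat") as f:
        found = raid10_arrays(f.read())
    if found:
        print("RAID10 arrays present:", ", ".join(found))
    else:
        print("No RAID10 arrays found; the reproducer likely does not apply.")
```

If this reports at least one array on an affected 6.1.135 kernel, the two
steps above should be enough to trigger the hang.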
We're tracking the issue internally in:
https://gitlab.torproject.org/tpo/tpa/team/-/issues/42146
I've managed to work around the issue by upgrading to the Debian package
from testing/unstable (6.12.25), as Salvatore indicated above. There,
fstrim doesn't cause any crash and completes successfully. In stable, it
just hangs there forever. The kernel doesn't completely panic and the
machine is otherwise somewhat still functional: my existing SSH
connection keeps working, for example, but new ones fail. And an `apt
install` of another kernel hangs forever.
So likely at least in 6.1.y there are missing pre-requisites causing
the behaviour.
If you can test 6.1.135-1 with the commit
4a05f7ae33716d996c5ce56478a36a3ede1d76f2 reverted then you can fetch
built packages at:
https://people.debian.org/~carnil/tmp/linux/1104460/
Can you also test with 4a05f7ae33716d996c5ce56478a36a3ede1d76f2 not
reverted, and also cherry-pick c567c86b90d4715081adfe5eb812141a5b6b4883?
Thanks,
Kuai
I can confirm this kernel does not crash when running fstrim.service,
which seems to confirm the bisect.
A.