On Wed, Jan 17, 2024 at 10:19 AM Mikulas Patocka <[email protected]> wrote:
>
> stop_sync_thread sets MD_RECOVERY_INTR and then waits for
> MD_RECOVERY_RUNNING to be cleared. However, md_do_sync will not clear
> MD_RECOVERY_RUNNING when exiting, it will set MD_RECOVERY_DONE instead.
>
> So, we must wait for MD_RECOVERY_DONE to be set as well.
>
> This patch fixes a deadlock in the LVM2 test shell/integrity-caching.sh.
I am not able to reproduce the issue on a 6.7 kernel with
shell/integrity-caching.sh.
This is what I got:
VERBOSE=0 ./lib/runner \
--testdir . --outdir results \
--flavours ndev-vanilla --only shell/integrity-caching.sh --skip @
running 1 tests
### passed: [ndev-vanilla] shell/integrity-caching.sh 4:24.225
### 1 tests: 1 passed, 0 skipped, 0 timed out, 0 warned, 0 failed in 4:24.453
make[1]: Leaving directory '/root/lvm2/test'
Do you see the issue every time with shell/integrity-caching.sh?
Thanks,
Song
>
> sysrq: Show Blocked State
> task:lvm state:D stack:0 pid:11422 tgid:11422 ppid:1374 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x228/0x570
> schedule+0x29/0xa0
> schedule_timeout+0x6a/0xd0
> ? timer_shutdown_sync+0x10/0x10
> stop_sync_thread+0x141/0x180 [md_mod]
> ? housekeeping_test_cpu+0x30/0x30
> __md_stop_writes+0x10/0xd0 [md_mod]
> md_stop+0x9/0x20 [md_mod]
> raid_dtr+0x1e/0x60 [dm_raid]
> dm_table_destroy+0x53/0x110 [dm_mod]
> __dm_destroy+0x10b/0x1e0 [dm_mod]
> ? table_clear+0xa0/0xa0 [dm_mod]
> dev_remove+0xd4/0x110 [dm_mod]
> ctl_ioctl+0x2e1/0x570 [dm_mod]
> dm_ctl_ioctl+0x5/0x10 [dm_mod]
> __x64_sys_ioctl+0x85/0xa0
> do_syscall_64+0x5d/0x1a0
> entry_SYSCALL_64_after_hwframe+0x46/0x4e
>
> Signed-off-by: Mikulas Patocka <[email protected]>
> Cc: [email protected] # v6.7
> Fixes: 130443d60b1b ("md: refactor idle/frozen_sync_thread() to fix deadlock")
>
> ---
> drivers/md/md.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/md/md.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/md.c
> +++ linux-2.6/drivers/md/md.c
> @@ -4881,7 +4881,8 @@ static void stop_sync_thread(struct mdde
>  	if (check_seq)
>  		sync_seq = atomic_read(&mddev->sync_seq);
>
> -	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> +	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> +	    test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
>  		if (!locked)
>  			mddev_unlock(mddev);
>  		return;
> @@ -4901,6 +4902,7 @@ retry:
>
>  	if (!wait_event_timeout(resync_wait,
>  		   !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> +		   test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
>  		   (check_seq && sync_seq != atomic_read(&mddev->sync_seq)),
>  		   HZ / 10))
>  		goto retry;
>