On 2023-09-28 11:14, Jinjie Ruan wrote:
> As Marco pointed out, commit 2810c1e99867 ("kunit: Fix wild-memory-access
> bug in kunit_free_suite_set()") causes test suites to run while the test
> module is still in MODULE_STATE_COMING. In that state, the module
> is not fully initialized, lacking sysfs, module_memory, args, init
> function which causes null-ptr-deref of using fake devices below.
>
> Since load_module() notify MODULE_STATE_COMING in prepare_coming_module(),
> and then init sysfs and args etc. in parse_args() and mod_sysfs_setup(),
> after that it notify MODULE_STATE_LIVE in do_init_module(), and fake driver
> in the test suits depend on them. So the test suits should be executed when
> notify MODULE_STATE_LIVE.
>
> But the kunit_free_suite_set() in kunit_module_exit() depends on the
> success of kunit_filter_suites() in kunit_module_init(). The best practice
> is to alloc and init resource when notify MODULE_STATE_COMING and free them
> when notify MODULE_STATE_GOING. So split the kunit_module_exec() from
> kunit_module_init() to run test suits when MODULE_STATE_LIVE, call
> kunit_filter_suites() and allocate memory in kunit_module_init() and call
> kunit_free_suite_set() in kunit_module_exit() to free the memory.
>
> So if load_module() succeeds and notify module state as below, it calls
> kunit_module_init(), kunit_module_exec() and kunit_module_exit(), which
> will work ok. The mod->state state machine when load_module() succeeds:
>
> kunit_filter_suites() kunit_module_exec()
> MODULE_STATE_UNFORMED ---> MODULE_STATE_COMING ---> MODULE_STATE_LIVE
> ^ |
> | |
> +---------------- MODULE_STATE_GOING <---------+
> kunit_free_suite_set()
>
> If load_module() fails and notify module state as below, it call
> kunit_module_init() and kunit_module_exit(), which will also work ok.
> The mod->state state machine when load_module() fails at mod_sysfs_setup():
>
> kunit_filter_suites() kunit_free_suite_set()
> MODULE_STATE_UNFORMED ---> MODULE_STATE_COMING ---> MODULE_STATE_GOING
> ^ |
> | |
> +-----------------------------------------------+
>
> general protection fault, probably for non-canonical address
> 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
> CPU: 1 PID: 1868 Comm: modprobe Tainted: G W N 6.6.0-rc3+ #61
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
> 04/01/2014
> RIP: 0010:kobject_namespace+0x71/0x150
> Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 cd 00 00 00 48 b8 00 00 00 00 00
> fc ff df 49 8b 5c 24 28 48 8d 7b 18 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85
> c1 00 00 00 48 8b 43 18 48 85 c0 74 79 4c 89 e7
> RSP: 0018:ffff88810f797288 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000003 RSI: ffffffff847b4900 RDI: 0000000000000018
> RBP: ffff88810ba08940 R08: 0000000000000001 R09: ffffed1021ef2e0f
> R10: ffff88810f79707f R11: 746e756f63666572 R12: ffffffffa0241990
> R13: ffff88810ba08958 R14: ffff88810ba08968 R15: ffffffff84ac6c20
> FS: 00007ff9f2186540(0000) GS:ffff888119c80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fff73a2cff8 CR3: 000000010b77b002 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? die_addr+0x3d/0xa0
> ? exc_general_protection+0x144/0x220
> ? asm_exc_general_protection+0x22/0x30
> ? kobject_namespace+0x71/0x150
> kobject_add_internal+0x267/0x870
> kobject_add+0x120/0x1f0
> ? kset_create_and_add+0x160/0x160
> ? __kmem_cache_alloc_node+0x1d2/0x350
> ? _raw_spin_lock+0x87/0xe0
> ? kobject_create_and_add+0x3c/0xb0
> kobject_create_and_add+0x68/0xb0
> module_add_driver+0x260/0x350
> bus_add_driver+0x2c9/0x580
> driver_register+0x133/0x460
> kunit_run_tests+0xdb/0xef0
> ? _prb_read_valid+0x3e3/0x550
> ? _raw_spin_lock+0x87/0xe0
> ? _raw_spin_lock_bh+0xe0/0xe0
> ? __send_ipi_mask+0x1ba/0x450
> ? __pte_offset_map+0x19/0x1f0
> ? __pte_offset_map_lock+0xd6/0x1b0
> ? __kunit_test_suites_exit+0x30/0x30
> ? kvm_smp_send_call_func_ipi+0x68/0xc0
> ? do_sync_core+0x22/0x30
> ? smp_call_function_many_cond+0x1be/0xcf0
> ? __text_poke+0x890/0x890
> ? __text_poke+0x890/0x890
> ? on_each_cpu_cond_mask+0x46/0x70
> ? text_poke_bp_batch+0x413/0x570
> ? do_sync_core+0x30/0x30
> ? __jump_label_patch+0x34c/0x350
> ? mutex_unlock+0x80/0xd0
> ? __mutex_unlock_slowpath.constprop.0+0x2a0/0x2a0
> __kunit_test_suites_init+0xc4/0x120
> kunit_module_notify+0x36c/0x3b0
> ? __kunit_test_suites_init+0x120/0x120
> ? preempt_count_add+0x79/0x150
> notifier_call_chain+0xbf/0x280
> ? kasan_quarantine_put+0x21/0x1a0
> blocking_notifier_call_chain_robust+0xbb/0x140
> ? notifier_call_chain+0x280/0x280
> ? 0xffffffffa0238000
> load_module+0x4af0/0x67d0
> ? module_frob_arch_sections+0x20/0x20
> ? rwsem_down_write_slowpath+0x11a0/0x11a0
> ? kernel_read_file+0x3ca/0x510
> ? __x64_sys_fspick+0x2a0/0x2a0
> ? init_module_from_file+0xd2/0x130
> init_module_from_file+0xd2/0x130
> ? __ia32_sys_init_module+0xa0/0xa0
> ? userfaultfd_unmap_prep+0x3d0/0x3d0
> ? _raw_spin_lock_bh+0xe0/0xe0
> idempotent_init_module+0x339/0x610
> ? init_module_from_file+0x130/0x130
> ? __fget_light+0x57/0x500
> __x64_sys_finit_module+0xba/0x130
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Fixes: 2810c1e99867 ("kunit: Fix wild-memory-access bug in
> kunit_free_suite_set()")
> Reported-by: Marco Pagani <marpa...@redhat.com>
> Signed-off-by: Jinjie Ruan <ruanjin...@huawei.com>
> ---
> lib/kunit/test.c | 25 ++++++++++++++++---------
> 1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> index 145f70219f46..8fac4783c676 100644
> --- a/lib/kunit/test.c
> +++ b/lib/kunit/test.c
> @@ -739,7 +739,6 @@ static int kunit_module_init(struct module *mod)
> struct kunit_suite_set suite_set = {
> mod->kunit_suites, mod->kunit_suites + mod->num_kunit_suites,
> };
> - const char *action = kunit_action();
> int err = 0;
>
> suite_set = kunit_filter_suites(&suite_set,
> @@ -752,16 +751,28 @@ static int kunit_module_init(struct module *mod)
> mod->kunit_suites = (struct kunit_suite **)suite_set.start;
> mod->num_kunit_suites = suite_set.end - suite_set.start;
>
> - if (!action)
> + return err;
> +}
> +
> +static void kunit_module_exec(struct module *mod)
> +{
> + struct kunit_suite_set suite_set = {
> + mod->kunit_suites, mod->kunit_suites + mod->num_kunit_suites,
> + };
> + const char *action = kunit_action();
> +
> + if (!action) {
> kunit_exec_run_tests(&suite_set, false);
> +
> + __kunit_test_suites_exit(mod->kunit_suites,
> + mod->num_kunit_suites);
> + }
I don't think destroying debugfs right after running the tests is advisable.
The reason why I sent an RFC is to leave room for a discussion on which is
the best way to solve the issue. I think it would be better to have a
discussion before rushing patches.
Thanks,
Marco
> else if (!strcmp(action, "list"))
> kunit_exec_list_tests(&suite_set, false);
> else if (!strcmp(action, "list_attr"))
> kunit_exec_list_tests(&suite_set, true);
> else
> pr_err("kunit: unknown action '%s'\n", action);
> -
> - return err;
> }
>
> static void kunit_module_exit(struct module *mod)
> @@ -769,11 +780,6 @@ static void kunit_module_exit(struct module *mod)
> struct kunit_suite_set suite_set = {
> mod->kunit_suites, mod->kunit_suites + mod->num_kunit_suites,
> };
> - const char *action = kunit_action();
> -
> - if (!action)
> - __kunit_test_suites_exit(mod->kunit_suites,
> - mod->num_kunit_suites);
>
> kunit_free_suite_set(suite_set);
> }
> @@ -789,6 +795,7 @@ static int kunit_module_notify(struct notifier_block *nb,
> unsigned long val,
> ret = kunit_module_init(mod);
> break;
> case MODULE_STATE_LIVE:
> + kunit_module_exec(mod);
> break;
> case MODULE_STATE_GOING:
> kunit_module_exit(mod);