On 8/14/24 21:38, 'Erhard Furtner' via KUnit Development wrote:
On Mon, 12 Aug 2024 11:54:11 -0700
Kees Cook <k...@kernel.org> wrote:

On Fri, Aug 09, 2024 at 11:15:37PM +0200, Erhard Furtner wrote:
Greetings!

When KASAN is enabled the Overflow KUnit test fails:

[...]
     ok 16 shift_nonsense_test
     # overflow_allocation_test: 11 allocation overflow tests finished
==================================================================
BUG: KASAN: stack-out-of-bounds in string_nocheck+0x168/0x1c8
Read of size 1 at addr c976be40 by task kunit_try_catch/1843

CPU: 0 UID: 0 PID: 1843 Comm: kunit_try_catch Tainted: G                 N 
6.11.0-rc2-PMacG4 #1
Tainted: [N]=TEST
Hardware name: PowerMac3,6 7455 0x80010303 PowerMac
Call Trace:
[c992fb80] [c16651c0] dump_stack_lvl+0x80/0xac (unreliable)
[c992fba0] [c04e0420] print_report+0xdc/0x504
[c992fc00] [c04e01d8] kasan_report+0xf8/0x108
[c992fc80] [c16ae4c8] string_nocheck+0x168/0x1c8
[c992fcf0] [c16b37a4] string+0xa8/0xbc
[c992fd60] [c16b8134] vsnprintf+0x868/0x1750
[c992fdf0] [c0b8490c] kvasprintf+0xa4/0x13c
[c992fe60] [c0b84c3c] kasprintf+0xb4/0xc8
[c992fed0] [c0f4c954] module_remove_driver+0x1f0/0x2fc
[c992ff00] [c0f21628] bus_remove_driver+0x1d0/0x240
[c992ff30] [bfd0cd40] kunit_put_resource+0x128/0x134 [kunit]
[c992ff50] [bfd0a120] kunit_cleanup+0x140/0x144 [kunit]
[c992ff90] [bfd10d64] kunit_generic_run_threadfn_adapter+0xf8/0x148 [kunit]
[c992ffc0] [c00f57e0] kthread+0x36c/0x37c
[c992fff0] [c0028304] start_kernel_thread+0x10/0x14

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:00000000 index:0x0 pfn:0x976b
flags: 0x0(zone=0)
raw: 00000000 00000000 eef2bb10 00000000 00000000 00000000 ffffffff 00000000
raw: 00000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  c976bd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  c976bd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c976be00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 04 f2
                                    ^
  c976be80: 00 04 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
  c976bf00: 00 00 f1 f1 f1 f1 00 f3 f3 f3 00 00 00 00 00 00
==================================================================
Disabling lock debugging due to kernel taint
     not ok 17 overflow_allocation_test
     # overflow_size_helpers_test: 43 overflow size helper tests finished
     ok 18 overflow_size_helpers_test
     # overflows_type_test: 378 overflows_type() tests finished
     ok 19 overflows_type_test
     # same_type_test: 0 __same_type() tests finished
     ok 20 same_type_test
     # castable_to_type_test: 75 castable_to_type() tests finished
     ok 21 castable_to_type_test
     ok 22 DEFINE_FLEX_test
# overflow: pass:21 fail:1 skip:0 total:22
# Totals: pass:21 fail:1 skip:0 total:22
not ok 1 overflow


This is reproducible on my machine and always happens when running the test via 
'modprobe -v overflow_kunit'. Without KASAN enabled (but KFENCE) 
overflow_allocation_test passes.

Hmm, this implies some kind of corruption is sneaking in and the kunit
resource freeing code is exploding. I don't immediately see the problem,
though.

Not the 1st memory corruption I got on ppc32 
(https://lore.kernel.org/all/20240811165230.91DCFA0660@freki.localdomain/) 
btw., but this does not seem related.

I just did a kernel build with overflow_kunit statically built in to run at boot. This 
way I don't get the "BUG: KASAN: stack-out-of-bounds in 
string_nocheck+0x168/0x1c8" on the PowerMac and on qemu. Run directly at boot the 
overflow_kunit just passes. As soon as I build it as module and modprobe it later, I hit 
the issue. Strange...

A hint that not the test itself might cause the stack corruption but another 
process.

Regards,
Erhard


Hi Erhard and Kees,

On my QEMU setup the overflow_kunit test produces the following kernel panic when running "modprobe -v overflow_kunit" with KASAN and KFENCE enabled:

[   52.574541] BUG: KASAN: stack-out-of-bounds in string+0x2a0/0x320
[ 52.574541] Read of size 1 at addr ffffc900010d7d88 by task systemd-udevd/144
[   52.574541]
[ 52.574541] CPU: 11 UID: 0 PID: 144 Comm: systemd-udevd Tainted: G N 6.11.0-rc2-00319-g1fcd5c59a7f8 #83
[   52.574541] Tainted: [N]=TEST
[ 52.574541] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
[   52.574541] Call Trace:
[   52.574541]  <TASK>
[   52.574541]  dump_stack_lvl+0x55/0x70
[   52.574541]  print_report+0xcb/0x620
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? string+0x2a0/0x320
[   52.574541]  kasan_report+0xc5/0x100
[   52.574541]  ? string+0x2a0/0x320
[   52.574541]  string+0x2a0/0x320
[   52.574541]  ? __pfx_string+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  vsnprintf+0x809/0x1600
[   52.574541]  ? __pfx_vsnprintf+0x10/0x10
[   52.574541]  ? kasan_save_stack+0x24/0x50
[   52.574541]  ? __kasan_kmalloc+0xaa/0xb0
[   52.574541]  ? uevent_show+0x127/0x300
[   52.574541]  ? dev_attr_show+0x41/0xc0
[   52.574541]  ? sysfs_kf_seq_show+0x213/0x400
[   52.574541]  ? seq_read_iter+0x404/0x1070
[   52.574541]  ? vfs_read+0x642/0x8f0
[   52.574541]  add_uevent_var+0x135/0x2e0
[   52.574541]  ? __kmalloc_node_noprof+0x1bc/0x3a0
[   52.574541]  ? seq_read_iter+0x67d/0x1070
[   52.574541]  ? __pfx_add_uevent_var+0x10/0x10
[   52.574541]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   52.574541]  ? stack_trace_save+0x8f/0xc0
[   52.574541]  ? __pfx_stack_trace_save+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? stack_depot_save_flags+0x2e/0x710
[   52.574541]  dev_uevent+0x166/0x6a0
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __pfx_dev_uevent+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? kasan_unpoison+0x27/0x60
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __kasan_slab_alloc+0x4d/0x90
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __kmalloc_cache_noprof+0x100/0x2b0
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? kasan_save_track+0x14/0x30
[   52.574541]  uevent_show+0x183/0x300
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? mutex_lock+0x8c/0xe0
[   52.574541]  ? __pfx_dev_attr_show+0x10/0x10
[   52.574541]  dev_attr_show+0x41/0xc0
[   52.574541]  sysfs_kf_seq_show+0x213/0x400
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  seq_read_iter+0x404/0x1070
[   52.574541]  vfs_read+0x642/0x8f0
[   52.574541]  ? __pfx_vfs_read+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  ? __do_sys_newfstatat+0x86/0xd0
[   52.574541]  ? __pfx___do_sys_newfstatat+0x10/0x10
[   52.574541]  ksys_read+0xec/0x1c0
[   52.574541]  ? __pfx_ksys_read+0x10/0x10
[   52.574541]  ? srso_return_thunk+0x5/0x5f
[   52.574541]  do_syscall_64+0xa6/0x1a0
[   52.574541]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[   52.574541] RIP: 0033:0x7fcf58ddf7e2
[ 52.574541] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a b4 0c 00 e8 a5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 [ 52.574541] RSP: 002b:00007ffd98a30d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 52.574541] RAX: ffffffffffffffda RBX: 0000000000001018 RCX: 00007fcf58ddf7e2 [ 52.574541] RDX: 0000000000001018 RSI: 00005586e442f2a0 RDI: 000000000000000c [ 52.574541] RBP: 00005586e442f2a0 R08: 0000000000000000 R09: 00005586e442f2a0 [ 52.574541] R10: 00007fcf58ee5d10 R11: 0000000000000246 R12: 000000000000000c [ 52.574541] R13: 0000000000001017 R14: 0000000000000002 R15: 00007ffd98a30db0
[   52.574541]  </TASK>
[   52.574541]
[   52.574541] The buggy address belongs to the virtual mapping at
[   52.574541]  [ffffc900010d0000, ffffc900010d9000) created by:
[   52.574541]  kernel_clone+0xb9/0x6c0
[   52.574541]
[   52.574541] The buggy address belongs to the physical page:
[ 52.574541] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xa9df
[   52.574541] flags: 0x100000000000000(node=0|zone=1)
[ 52.574541] raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000 [ 52.574541] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
[   52.574541] page dumped because: kasan: bad access detected
[   52.574541]
[   52.574541] Memory state around the buggy address:
[ 52.574541] ffffc900010d7c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 [ 52.574541] ffffc900010d7d00: f1 f1 f1 f1 f1 04 f2 00 f2 f2 f2 00 00 00 f3 f3 [ 52.574541] >ffffc900010d7d80: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
[   52.574541]                       ^
[ 52.574541] ffffc900010d7e00: 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 [ 52.574541] ffffc900010d7e80: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 52.574541] ==================================================================
[   52.600667] Disabling lock debugging due to kernel taint


And it looks like I found the root cause (lib/overflow_kunit.c +671):
...
static void overflow_allocation_test(struct kunit *test)
{
       const char device_name[] = "overflow-test";
...

As you can see, the device name is defined as a local variable, which means that it doesn't exist out of the 'overflow_allocation_test' function scope. This patch:

diff --git a/lib/overflow_kunit.c b/lib/overflow_kunit.c
index f314a0c15a6d..fa7ca8c94eee 100644
--- a/lib/overflow_kunit.c
+++ b/lib/overflow_kunit.c
@@ -668,7 +668,7 @@ DEFINE_TEST_ALLOC(devm_kzalloc,  devm_kfree, 1, 1, 0);

 static void overflow_allocation_test(struct kunit *test)
 {
-       const char device_name[] = "overflow-test";
+       static const char device_name[] = "overflow-test";
        struct device *dev;
        int count = 0;


Seems to fix the problem and it is not reproducable anymore.

I will send the proper patch tomorrow.

Good night!

--
Kind regards,
Ivan Orlov

Reply via email to