[PATCH] function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()
When ftrace_graph_ret_addr() is invoked to convert a found stack return address to its original value, the function can end up producing the following crash: [ 95.442712] BUG: kernel NULL pointer dereference, address: 0028 [ 95.442720] #PF: supervisor read access in kernel mode [ 95.442724] #PF: error_code(0x) - not-present page [ 95.442727] PGD 0 P4D 0- [ 95.442731] Oops: Oops: [#1] PREEMPT SMP PTI [ 95.442736] CPU: 1 UID: 0 PID: 2214 Comm: insmod Kdump: loaded Tainted: G OE K6.11.0-rc1-default #1 67c62a3b3720562f7e7db5f11c1fdb40b7a2857c [ 95.442747] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH [ 95.442750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 [ 95.442754] RIP: 0010:ftrace_graph_ret_addr+0x42/0xc0 [ 95.442766] Code: [...] [ 95.442773] RSP: 0018:979b80ff7718 EFLAGS: 00010006 [ 95.442776] RAX: 8ca99b10 RBX: 979b80ff7760 RCX: 979b80167dc0 [ 95.442780] RDX: 8ca99b10 RSI: 979b80ff7790 RDI: 0005 [ 95.442783] RBP: 0001 R08: 0005 R09: [ 95.442786] R10: 0005 R11: R12: 8e9491e0 [ 95.442790] R13: 8d6f70f0 R14: 979b80167da8 R15: 979b80167dc8 [ 95.442793] FS: 7fbf83895740() GS:8a0afdd0() knlGS: [ 95.442797] CS: 0010 DS: ES: CR0: 80050033 [ 95.442800] CR2: 0028 CR3: 05070002 CR4: 00370ef0 [ 95.442806] DR0: DR1: DR2: [ 95.442809] DR3: DR6: fffe0ff0 DR7: 0400 [ 95.442816] Call Trace: [ 95.442823] [ 95.442896] unwind_next_frame+0x20d/0x830 [ 95.442905] arch_stack_walk_reliable+0x94/0xe0 [ 95.442917] stack_trace_save_tsk_reliable+0x7d/0xe0 [ 95.442922] klp_check_and_switch_task+0x55/0x1a0 [ 95.442931] task_call_func+0xd3/0xe0 [ 95.442938] klp_try_switch_task.part.5+0x37/0x150 [ 95.442942] klp_try_complete_transition+0x79/0x2d0 [ 95.442947] klp_enable_patch+0x4db/0x890 [ 95.442960] do_one_initcall+0x41/0x2e0 [ 95.442968] do_init_module+0x60/0x220 [ 95.442975] load_module+0x1ebf/0x1fb0 [ 95.443004] init_module_from_file+0x88/0xc0 [ 95.443010] idempotent_init_module+0x190/0x240 [ 95.443015] __x64_sys_finit_module+0x5b/0xc0 [ 95.443019] do_syscall_64+0x74/0x160 [ 95.443232] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 95.443236] RIP: 0033:0x7fbf82f2c709 [ 95.443241] Code: [...] [ 95.443247] RSP: 002b:7fffd5ea3b88 EFLAGS: 0246 ORIG_RAX: 0139 [ 95.443253] RAX: ffda RBX: 56359c48e750 RCX: 7fbf82f2c709 [ 95.443257] RDX: RSI: 56356ed4efc5 RDI: 0003 [ 95.443260] RBP: 56356ed4efc5 R08: R09: 7fffd5ea3c10 [ 95.443263] R10: 0003 R11: 0246 R12: [ 95.443267] R13: 56359c48e6f0 R14: R15: [ 95.443272] [ 95.443274] Modules linked in: [...] [ 95.443385] Unloaded tainted modules: intel_uncore_frequency(E):1 isst_if_common(E):1 skx_edac(E):1 [ 95.443414] CR2: 0028 The bug can be reproduced with kselftests: cd linux/tools/testing/selftests make TARGETS='ftrace livepatch' (cd ftrace; ./ftracetest test.d/ftrace/fgraph-filter.tc) (cd livepatch; ./test-livepatch.sh) The problem is that ftrace_graph_ret_addr() is supposed to operate on the ret_stack of a selected task but wrongly accesses the ret_stack of the current task. Specifically, the above NULL dereference occurs when task->curr_ret_stack is non-zero, but current->ret_stack is NULL. Correct ftrace_graph_ret_addr() to work with the right ret_stack. Reported-by: Miroslav Benes Fixes: 7aa1eaef9f42 ("function_graph: Allow multiple users to attach to function graph") Signed-off-by: Petr Pavlu --- kernel/trace/fgraph.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c index fc205ad167a9..d1d5ea2d0a1b 100644 --- a/kernel/trace/fgraph.c +++ b/kernel/trace/fgraph.c @@ -902,7 +902,7 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx, i = *idx ? : task->curr_ret_stack; while (i > 0) { - ret_stack = get_ret_stack(current, i, &i); + ret_stack = get_ret_stack(task, i, &i); if (!ret_stack) break; /* base-commit: 17712b7ea0756799635ba159cc773082230ed028 -- 2.43.0
Re: [PATCH v3 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info
Hi Avadhut, kernel test robot noticed the following build warnings: [auto build test WARNING on dee974604f7870167161cbe51e8f3b9c2858de34] url: https://github.com/intel-lab-lkp/linux/commits/Avadhut-Naik/x86-mce-Add-wrapper-for-struct-mce-to-export-vendor-specific-info/20240801-192550 base: dee974604f7870167161cbe51e8f3b9c2858de34 patch link: https://lore.kernel.org/r/20240730185406.3709876-2-avadhut.naik%40amd.com patch subject: [PATCH v3 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info config: x86_64-randconfig-071-20240803 (https://download.01.org/0day-ci/archive/20240803/202408032312.5lkzepqe-...@intel.com/config) compiler: gcc-12 (Debian 12.2.0-14) 12.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240803/202408032312.5lkzepqe-...@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot | Closes: https://lore.kernel.org/oe-kbuild-all/202408032312.5lkzepqe-...@intel.com/ All warnings (new ones prefixed by >>): >> vmlinux.o: warning: objtool: do_machine_check+0x6e: call to __asan_memset() >> leaves .noinstr.text section vmlinux.o: warning: objtool: intel_idle_ibrs+0x11: call to sched_smt_active() leaves .noinstr.text section objdump-func vmlinux.o do_machine_check: 25a0 : 25a0: f3 0f 1e fa endbr64 0004 25a4: 41 57 push %r15 0006 25a6: b9 10 00 00 00 mov$0x10,%ecx 000b 25ab: 31 f6 xor%esi,%esi 000d 25ad: ba 80 00 00 00 mov$0x80,%edx 0012 25b2: 41 56 push %r14 0014 25b4: 41 55 push %r13 0016 25b6: 41 54 push %r12 0018 25b8: 55 push %rbp 0019 25b9: 48 89 fdmov%rdi,%rbp 001c 25bc: 53 push %rbx 001d 25bd: 48 81 ec c8 00 00 00sub$0xc8,%rsp 0024 25c4: 65 48 8b 04 25 28 00 00 00 mov%gs:0x28,%rax 002d 25cd: 48 89 84 24 c0 00 00 00 mov%rax,0xc0(%rsp) 0035 25d5: 31 c0 xor%eax,%eax 0037 25d7: 48 8d 5c 24 30 lea0x30(%rsp),%rbx 003c 25dc: c7 44 24 1c 00 00 00 00 movl $0x0,0x1c(%rsp) 0044 25e4: 48 c7 84 24 b0 00 00 00 00 00 00 00 movq $0x0,0xb0(%rsp) 0050 25f0: 48 89 dfmov%rbx,%rdi 0053 25f3: 48 c7 84 24 b8 00 00 00 00 00 00 00 movq $0x0,0xb8(%rsp) 005f 25ff: 48 c7 44 24 20 00 00 00 00 movq $0x0,0x20(%rsp) 0068 2608: f3 48 abrep stos %rax,%es:(%rdi) 006b 260b: 48 89 dfmov%rbx,%rdi 006e 260e: e8 00 00 00 00 call 2613 260f: R_X86_64_PLT32__asan_memset-0x4 0073 2613: f6 05 00 00 00 00 60testb $0x60,0x0(%rip)# 261a 2615: R_X86_64_PC32 mce_flags-0x5 007a 261a: 0f 85 de 04 00 00 jne2afe 0080 2620: f6 05 00 00 00 00 20testb $0x20,0x0(%rip)# 2627 2622: R_X86_64_PC32 mca_cfg-0x5 0087 2627: 0f 84 20 08 00 00 je 2e4d 008d 262d: f6 05 00 00 00 00 01testb $0x1,0x0(%rip)# 2634 262f: R_X86_64_PC32 mce_flags-0x4 0094 2634: 0f 85 f1 04 00 00 jne2b2b 009a 263a: c7 44 24 1c 00 00 00 00 movl $0x0,0x1c(%rsp) 00a2 2642: ff 05 00 00 00 00 incl 0x0(%rip)# 2648 2644: R_X86_64_PC32 mce_exception_count-0x4 00a8 2648: 48 89 eemov%rbp,%rsi 00ab 264b: 48 89 dfmov%rbx,%rdi 00ae 264e: e8 dd fe ff ff call 2530 00b3 2653: 0f 31 rdtsc 00b5 2655: 48 c1 e2 20 shl$0x20,%rdx 00b9 2659: 48 09 d0or %rdx,%rax 00bc 265c: 48 8b 54 24 30 mov0x30(%rsp),%rdx 00c1 2661: 48 89 44 24 58 mov%rax,0x58(%rsp) 00c6 2666: 48 89 05 00 00 00 00mov%rax,0x0(%rip)# 266d 2669: R_X86_64_PC32 .data+0xb2884 00cd 266d: 48 8b 44 24 60 mov0x60(%rsp),%rax 00d2 2672: 48 89 15 00 00 00 00mov%rdx,0x0(%rip)# 2679 2675: R_X86_64_PC32 .data+0xb285c 00d9 2679: 48 8b 54 24 38 mov0x38(%rsp),%rdx 00de 267e: 48 89 05 00 00 00 00mov%rax,0x0(%rip)# 2685 2681: R_X86_64_PC32 .data+0xb288c 00e5 2685: 48 8b 44 24 68 mov0x68(%rsp),%rax 00ea 268a: 48 89 15 00 00 00 00mov%rdx,0x0(%rip)# 2691 268d: R_X86_64_PC32 .data+0xb2864 00f1 2691: 48 8b 54 24 40 mov0x40(%rsp),%rdx 00f6 2696: 48 89 05 00 00 00 00mov%rax,0x0(%rip)# 269d 2699: R_X86_64_PC32 .data+0xb2894 00fd 269d: 48 8b 44 24 70 mov0x
Re: [PATCH net-next v12 04/14] mm: page_frag: add '_va' suffix to page_frag API
On 8/3/2024 1:00 AM, Alexander Duyck wrote: As far as your API extension and naming maybe you should look like something like bio_vec and borrow the naming from that since that is essentially what you are passing back and forth is essentially that instead of a page frag which is normally a virtual address. I thought about adding something like bio_vec before, but I am not sure what you have in mind is somthing like I considered before? Let's say that we reuse bio_vec like something below for the new APIs: struct bio_vec { struct page *bv_page; void*va; unsigned intbv_len; unsigned intbv_offset; }; I wasn't suggesting changing the bio_vec. I was suggesting that be what you pass as a pointer reference instead of the offset. Basically your use case is mostly just for populating bio_vec style structures anyway. I wasn't trying/going to reuse/change bio_vec for page_frag, I was just having a hard time coming with a good new name for it. The best one I came up with is pfrag_vec, but I am not sure about the 'vec' as the "vec" portion of the name would suggest, iovec structures tend to come in arrays, mentioned in the below article: https://lwn.net/Articles/625077/ Anther one is page_frag, which is currently in use. Or any better one in your mind? It seems we have the below options for the new API: option 1, it seems like a better option from API naming point of view, but it needs to return a bio_vec pointer to the caller, it seems we need to have extra space for the pointer, I am not sure how we can avoid the memory waste for sk_page_frag() case in patch 12: struct bio_vec *page_frag_alloc_bio(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask); option 2, it need both the caller and callee to have a its own local space for 'struct bio_vec ', I am not sure if passing the content instead of the pointer of a struct through the function returning is the common pattern and if it has any performance impact yet: struct bio_vec page_frag_alloc_bio(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask); option 3, the caller passes the pointer of 'struct bio_vec ' to the callee, and page_frag_alloc_bio() fills in the data, I am not sure what is the point of indirect using 'struct bio_vec ' instead of passing 'va' & 'fragsz' & 'offset' through pointers directly: bool page_frag_alloc_bio(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, struct bio_vec *bio); If one of the above option is something in your mind? Yes, please be more specific about which one is the prefer option, and why it is the prefer option than the one introduced in this patchset? If no, please be more specific what that is in your mind? Option 3 is more or less what I had in mind. Basically you would return an int to indicate any errors and you would be populating a bio_vec during your allocation. In addition you would use the bio_vec Actually using this new bio_vec style structures does not seem to solve the APIs naming issue this patch is trying to solve as my understanding, as the new struct is only about passing one pointer or multi-pointers from API naming perspective. It is part of the API naming, but not all of it. as a tracker of the actual fragsz so when you commit you are committing with the fragsz as it was determined at the time of putting the bio_vec together so you can theoretically catch things like if the underlying offset had somehow changed from the time you setup the I think we might need a stronger argument than the above to use the new *vec thing other than the above debugging feature. I looked throught the bio_vec related info, and come along somewhat not really related, but really helpful "What’s all this get us" section: https://docs.kernel.org/block/biovecs.html So the question seems to be: what is this new struct for page_frag get us? Generally, I am argeed with the new struct thing if it does bring us something other than the above debugging feature. Otherwise we should avoid introducing a new thing which is hard to argue about its existent. allocation. It would fit well into your probe routines since they are all essentially passing the page, offset, and fragsz throughout the code. For the current probe routines, the 'va' need to be passed, do you expect the 'va' to be passed by function return, double pointer, or new the *_vec pointer?
Re: [PATCH net-next v12 04/14] mm: page_frag: add '_va' suffix to page_frag API
Regardless of the API discussion, The nvme-tcp bits look straight-forward: Acked-by: Sagi Grimberg