[PATCH] function_graph: Fix the ret_stack used by ftrace_graph_ret_addr()

2024-08-03 Thread Petr Pavlu
When ftrace_graph_ret_addr() is invoked to convert a found stack return
address to its original value, the function can end up producing the
following crash:

[   95.442712] BUG: kernel NULL pointer dereference, address: 0028
[   95.442720] #PF: supervisor read access in kernel mode
[   95.442724] #PF: error_code(0x) - not-present page
[   95.442727] PGD 0 P4D 0-
[   95.442731] Oops: Oops:  [#1] PREEMPT SMP PTI
[   95.442736] CPU: 1 UID: 0 PID: 2214 Comm: insmod Kdump: loaded Tainted: G
   OE K6.11.0-rc1-default #1 67c62a3b3720562f7e7db5f11c1fdb40b7a2857c
[   95.442747] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [K]=LIVEPATCH
[   95.442750] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[   95.442754] RIP: 0010:ftrace_graph_ret_addr+0x42/0xc0
[   95.442766] Code: [...]
[   95.442773] RSP: 0018:979b80ff7718 EFLAGS: 00010006
[   95.442776] RAX: 8ca99b10 RBX: 979b80ff7760 RCX: 979b80167dc0
[   95.442780] RDX: 8ca99b10 RSI: 979b80ff7790 RDI: 0005
[   95.442783] RBP: 0001 R08: 0005 R09: 
[   95.442786] R10: 0005 R11:  R12: 8e9491e0
[   95.442790] R13: 8d6f70f0 R14: 979b80167da8 R15: 979b80167dc8
[   95.442793] FS:  7fbf83895740() GS:8a0afdd0() 
knlGS:
[   95.442797] CS:  0010 DS:  ES:  CR0: 80050033
[   95.442800] CR2: 0028 CR3: 05070002 CR4: 00370ef0
[   95.442806] DR0:  DR1:  DR2: 
[   95.442809] DR3:  DR6: fffe0ff0 DR7: 0400
[   95.442816] Call Trace:
[   95.442823]  
[   95.442896]  unwind_next_frame+0x20d/0x830
[   95.442905]  arch_stack_walk_reliable+0x94/0xe0
[   95.442917]  stack_trace_save_tsk_reliable+0x7d/0xe0
[   95.442922]  klp_check_and_switch_task+0x55/0x1a0
[   95.442931]  task_call_func+0xd3/0xe0
[   95.442938]  klp_try_switch_task.part.5+0x37/0x150
[   95.442942]  klp_try_complete_transition+0x79/0x2d0
[   95.442947]  klp_enable_patch+0x4db/0x890
[   95.442960]  do_one_initcall+0x41/0x2e0
[   95.442968]  do_init_module+0x60/0x220
[   95.442975]  load_module+0x1ebf/0x1fb0
[   95.443004]  init_module_from_file+0x88/0xc0
[   95.443010]  idempotent_init_module+0x190/0x240
[   95.443015]  __x64_sys_finit_module+0x5b/0xc0
[   95.443019]  do_syscall_64+0x74/0x160
[   95.443232]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   95.443236] RIP: 0033:0x7fbf82f2c709
[   95.443241] Code: [...]
[   95.443247] RSP: 002b:7fffd5ea3b88 EFLAGS: 0246 ORIG_RAX: 
0139
[   95.443253] RAX: ffda RBX: 56359c48e750 RCX: 7fbf82f2c709
[   95.443257] RDX:  RSI: 56356ed4efc5 RDI: 0003
[   95.443260] RBP: 56356ed4efc5 R08:  R09: 7fffd5ea3c10
[   95.443263] R10: 0003 R11: 0246 R12: 
[   95.443267] R13: 56359c48e6f0 R14:  R15: 
[   95.443272]  
[   95.443274] Modules linked in: [...]
[   95.443385] Unloaded tainted modules: intel_uncore_frequency(E):1 
isst_if_common(E):1 skx_edac(E):1
[   95.443414] CR2: 0028

The bug can be reproduced with kselftests:

 cd linux/tools/testing/selftests
 make TARGETS='ftrace livepatch'
 (cd ftrace; ./ftracetest test.d/ftrace/fgraph-filter.tc)
 (cd livepatch; ./test-livepatch.sh)

The problem is that ftrace_graph_ret_addr() is supposed to operate on the
ret_stack of a selected task but wrongly accesses the ret_stack of the
current task. Specifically, the above NULL dereference occurs when
task->curr_ret_stack is non-zero, but current->ret_stack is NULL.

Correct ftrace_graph_ret_addr() to work with the right ret_stack.

Reported-by: Miroslav Benes 
Fixes: 7aa1eaef9f42 ("function_graph: Allow multiple users to attach to 
function graph")
Signed-off-by: Petr Pavlu 
---
 kernel/trace/fgraph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index fc205ad167a9..d1d5ea2d0a1b 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -902,7 +902,7 @@ unsigned long ftrace_graph_ret_addr(struct task_struct 
*task, int *idx,
 
i = *idx ? : task->curr_ret_stack;
while (i > 0) {
-   ret_stack = get_ret_stack(current, i, &i);
+   ret_stack = get_ret_stack(task, i, &i);
if (!ret_stack)
break;
/*

base-commit: 17712b7ea0756799635ba159cc773082230ed028
-- 
2.43.0




Re: [PATCH v3 1/4] x86/mce: Add wrapper for struct mce to export vendor specific info

2024-08-03 Thread kernel test robot
Hi Avadhut,

kernel test robot noticed the following build warnings:

[auto build test WARNING on dee974604f7870167161cbe51e8f3b9c2858de34]

url:
https://github.com/intel-lab-lkp/linux/commits/Avadhut-Naik/x86-mce-Add-wrapper-for-struct-mce-to-export-vendor-specific-info/20240801-192550
base:   dee974604f7870167161cbe51e8f3b9c2858de34
patch link:
https://lore.kernel.org/r/20240730185406.3709876-2-avadhut.naik%40amd.com
patch subject: [PATCH v3 1/4] x86/mce: Add wrapper for struct mce to export 
vendor specific info
config: x86_64-randconfig-071-20240803 
(https://download.01.org/0day-ci/archive/20240803/202408032312.5lkzepqe-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240803/202408032312.5lkzepqe-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202408032312.5lkzepqe-...@intel.com/

All warnings (new ones prefixed by >>):

>> vmlinux.o: warning: objtool: do_machine_check+0x6e: call to __asan_memset() 
>> leaves .noinstr.text section
   vmlinux.o: warning: objtool: intel_idle_ibrs+0x11: call to 
sched_smt_active() leaves .noinstr.text section


objdump-func vmlinux.o do_machine_check:
 25a0 :
 25a0:  f3 0f 1e fa endbr64
0004 25a4:  41 57   push   %r15
0006 25a6:  b9 10 00 00 00  mov$0x10,%ecx
000b 25ab:  31 f6   xor%esi,%esi
000d 25ad:  ba 80 00 00 00  mov$0x80,%edx
0012 25b2:  41 56   push   %r14
0014 25b4:  41 55   push   %r13
0016 25b6:  41 54   push   %r12
0018 25b8:  55  push   %rbp
0019 25b9:  48 89 fdmov%rdi,%rbp
001c 25bc:  53  push   %rbx
001d 25bd:  48 81 ec c8 00 00 00sub$0xc8,%rsp
0024 25c4:  65 48 8b 04 25 28 00 00 00  mov%gs:0x28,%rax
002d 25cd:  48 89 84 24 c0 00 00 00 mov%rax,0xc0(%rsp)
0035 25d5:  31 c0   xor%eax,%eax
0037 25d7:  48 8d 5c 24 30  lea0x30(%rsp),%rbx
003c 25dc:  c7 44 24 1c 00 00 00 00 movl   $0x0,0x1c(%rsp)
0044 25e4:  48 c7 84 24 b0 00 00 00 00 00 00 00 movq   $0x0,0xb0(%rsp)
0050 25f0:  48 89 dfmov%rbx,%rdi
0053 25f3:  48 c7 84 24 b8 00 00 00 00 00 00 00 movq   $0x0,0xb8(%rsp)
005f 25ff:  48 c7 44 24 20 00 00 00 00  movq   $0x0,0x20(%rsp)
0068 2608:  f3 48 abrep stos %rax,%es:(%rdi)
006b 260b:  48 89 dfmov%rbx,%rdi
006e 260e:  e8 00 00 00 00  call   2613  
260f: R_X86_64_PLT32__asan_memset-0x4
0073 2613:  f6 05 00 00 00 00 60testb  $0x60,0x0(%rip)# 261a 
2615: R_X86_64_PC32 mce_flags-0x5
007a 261a:  0f 85 de 04 00 00   jne2afe 
0080 2620:  f6 05 00 00 00 00 20testb  $0x20,0x0(%rip)# 2627 
2622: R_X86_64_PC32 mca_cfg-0x5
0087 2627:  0f 84 20 08 00 00   je 2e4d 
008d 262d:  f6 05 00 00 00 00 01testb  $0x1,0x0(%rip)# 2634 
 262f: R_X86_64_PC32 mce_flags-0x4
0094 2634:  0f 85 f1 04 00 00   jne2b2b 
009a 263a:  c7 44 24 1c 00 00 00 00 movl   $0x0,0x1c(%rsp)
00a2 2642:  ff 05 00 00 00 00   incl   0x0(%rip)# 2648 
  2644: R_X86_64_PC32 mce_exception_count-0x4
00a8 2648:  48 89 eemov%rbp,%rsi
00ab 264b:  48 89 dfmov%rbx,%rdi
00ae 264e:  e8 dd fe ff ff  call   2530 
00b3 2653:  0f 31   rdtsc
00b5 2655:  48 c1 e2 20 shl$0x20,%rdx
00b9 2659:  48 09 d0or %rdx,%rax
00bc 265c:  48 8b 54 24 30  mov0x30(%rsp),%rdx
00c1 2661:  48 89 44 24 58  mov%rax,0x58(%rsp)
00c6 2666:  48 89 05 00 00 00 00mov%rax,0x0(%rip)# 266d 
 2669: R_X86_64_PC32 .data+0xb2884
00cd 266d:  48 8b 44 24 60  mov0x60(%rsp),%rax
00d2 2672:  48 89 15 00 00 00 00mov%rdx,0x0(%rip)# 2679 
 2675: R_X86_64_PC32 .data+0xb285c
00d9 2679:  48 8b 54 24 38  mov0x38(%rsp),%rdx
00de 267e:  48 89 05 00 00 00 00mov%rax,0x0(%rip)# 2685 
 2681: R_X86_64_PC32 .data+0xb288c
00e5 2685:  48 8b 44 24 68  mov0x68(%rsp),%rax
00ea 268a:  48 89 15 00 00 00 00mov%rdx,0x0(%rip)# 2691 
 268d: R_X86_64_PC32 .data+0xb2864
00f1 2691:  48 8b 54 24 40  mov0x40(%rsp),%rdx
00f6 2696:  48 89 05 00 00 00 00mov%rax,0x0(%rip)# 269d 
 2699: R_X86_64_PC32 .data+0xb2894
00fd 269d:  48 8b 44 24 70  mov0x

Re: [PATCH net-next v12 04/14] mm: page_frag: add '_va' suffix to page_frag API

2024-08-03 Thread Yunsheng Lin

On 8/3/2024 1:00 AM, Alexander Duyck wrote:





As far as your API extension and naming maybe you should look like
something like bio_vec and borrow the naming from that since that is
essentially what you are passing back and forth is essentially that
instead of a page frag which is normally a virtual address.


I thought about adding something like bio_vec before, but I am not sure
what you have in mind is somthing like I considered before?
Let's say that we reuse bio_vec like something below for the new APIs:

struct bio_vec {
 struct page *bv_page;
 void*va;
 unsigned intbv_len;
 unsigned intbv_offset;
};


I wasn't suggesting changing the bio_vec. I was suggesting that be
what you pass as a pointer reference instead of the offset. Basically
your use case is mostly just for populating bio_vec style structures
anyway.


I wasn't trying/going to reuse/change bio_vec for page_frag, I was just
having a hard time coming with a good new name for it.
The best one I came up with is pfrag_vec, but I am not sure about the
'vec' as the "vec" portion of the name would suggest, iovec structures 
tend to come in arrays, mentioned in the below article:

https://lwn.net/Articles/625077/

Anther one is page_frag, which is currently in use.

Or any better one in your mind?




It seems we have the below options for the new API:

option 1, it seems like a better option from API naming point of view, but
it needs to return a bio_vec pointer to the caller, it seems we need to have
extra space for the pointer, I am not sure how we can avoid the memory waste
for sk_page_frag() case in patch 12:
struct bio_vec *page_frag_alloc_bio(struct page_frag_cache *nc,
 unsigned int fragsz, gfp_t gfp_mask);

option 2, it need both the caller and callee to have a its own local space
for 'struct bio_vec ', I am not sure if passing the content instead of
the pointer of a struct through the function returning is the common pattern
and if it has any performance impact yet:
struct bio_vec page_frag_alloc_bio(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask);

option 3, the caller passes the pointer of 'struct bio_vec ' to the callee,
and page_frag_alloc_bio() fills in the data, I am not sure what is the point
of indirect using 'struct bio_vec ' instead of passing 'va' & 'fragsz' &
'offset' through pointers directly:
bool page_frag_alloc_bio(struct page_frag_cache *nc,
  unsigned int fragsz, gfp_t gfp_mask, struct bio_vec 
*bio);

If one of the above option is something in your mind? Yes, please be more 
specific
about which one is the prefer option, and why it is the prefer option than the 
one
introduced in this patchset?

If no, please be more specific what that is in your mind?


Option 3 is more or less what I had in mind. Basically you would
return an int to indicate any errors and you would be populating a
bio_vec during your allocation. In addition you would use the bio_vec


Actually using this new bio_vec style structures does not seem to solve
the APIs naming issue this patch is trying to solve as my understanding,
as the new struct is only about passing one pointer or multi-pointers
from API naming perspective. It is part of the API naming, but not all
of it.


as a tracker of the actual fragsz so when you commit you are
committing with the fragsz as it was determined at the time of putting
the bio_vec together so you can theoretically catch things like if the
underlying offset had somehow changed from the time you setup the


I think we might need a stronger argument than the above to use the new
*vec thing other than the above debugging feature.

I looked throught the bio_vec related info, and come along somewhat not
really related, but really helpful "What’s all this get us" section:
https://docs.kernel.org/block/biovecs.html

So the question seems to be: what is this new struct for page_frag get
us?

Generally, I am argeed with the new struct thing if it does bring us
something other than the above debugging feature. Otherwise we should
avoid introducing a new thing which is hard to argue about its existent.


allocation. It would fit well into your probe routines since they are
all essentially passing the page, offset, and fragsz throughout the
code.


For the current probe routines, the 'va' need to be passed, do you
expect the 'va' to be passed by function return, double pointer, or
new the *_vec pointer?









Re: [PATCH net-next v12 04/14] mm: page_frag: add '_va' suffix to page_frag API

2024-08-03 Thread Sagi Grimberg

Regardless of the API discussion,

The nvme-tcp bits look straight-forward:
Acked-by: Sagi Grimberg