[PATCH] x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect

2024-01-02 Thread Jinghao Jia
[C1] R13:  R14: 888100d30200 R15: 

[7.330193][C1] FS:  () GS:88813bc8() knlGS:
[7.330632][C1] CS:  0010 DS:  ES:  CR0: 80050033
[7.331050][C1] CR2: 0002b4d8 CR3: 03028003 CR4: 00370ef0
[7.331454][C1] DR0:  DR1:  DR2: 
[7.331854][C1] DR3:  DR6: fffe0ff0 DR7: 0400
[7.332236][C1] Kernel panic - not syncing: Fatal exception in interrupt
[7.332730][C1] Kernel Offset: disabled
[7.333044][C1] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

The relevant assembly code is (from objdump, faulting address
highlighted):

8102ed9d:   41 ff d3  call   *%r11
8102eda0:   65 48 <8b> 05 30 c7 ff    mov    %gs:0x7effc730(%rip),%rax

The emulation incorrectly sets the return address to be 8102ed9d
+ 0x5 = 8102eda2, which is the 8b byte in the middle of the next
mov. This in turn causes incorrect subsequent instruction decoding and
eventually triggers the page fault above.

Instead of invoking int3_emulate_call, perform push and jmp emulation
directly in kprobe_emulate_call_indirect. At this point we can obtain
the instruction size from p->ainsn.size so that we can calculate the
correct return address.
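
To make the arithmetic concrete, below is a minimal standalone C sketch
(user-space illustration only, not the kernel code) using the addresses
from the oops above; INT3_INSN_SIZE and CALL_INSN_SIZE mirror the kernel
constants, and insn_size stands in for p->ainsn.size:

  #include <stdio.h>

  #define INT3_INSN_SIZE 1   /* the kprobe breakpoint is a one-byte int3 */
  #define CALL_INSN_SIZE 5   /* size of a rel32 call, NOT of "call *%r11" */

  int main(void)
  {
          unsigned long insn_addr = 0x8102ed9dUL;        /* probed "call *%r11" (41 ff d3) */
          unsigned long insn_size = 3;                   /* decoded length, i.e. p->ainsn.size */
          unsigned long ip = insn_addr + INT3_INSN_SIZE; /* regs->ip after the int3 trap */

          /* old behaviour via int3_emulate_call(): assumes a 5-byte call */
          printf("wrong return address: %#lx\n", ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
          /* fixed behaviour: use the decoded instruction size */
          printf("right return address: %#lx\n", ip - INT3_INSN_SIZE + insn_size);
          return 0;
  }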

Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for 
single-step")
Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index e8babebad7b8..a0ce46c0a2d8 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -576,7 +576,8 @@ static void kprobe_emulate_call_indirect(struct kprobe *p, 
struct pt_regs *regs)
 {
unsigned long offs = addrmode_regoffs[p->ainsn.indirect.reg];
 
-   int3_emulate_call(regs, regs_get_register(regs, offs));
+   int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + p->ainsn.size);
+   int3_emulate_jmp(regs, regs_get_register(regs, offs));
 }
 NOKPROBE_SYMBOL(kprobe_emulate_call_indirect);
 
-- 
2.43.0




[RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD

2024-01-26 Thread Jinghao Jia
Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
involved in LLVM-KCFI instrumentation. At the same time, attaching
kprobes on these instructions (particularly UDs) will pollute the stack
trace dumped in the kernel ring buffer, since the exception is triggered
in the copy buffer rather than the original location.

Check for INTs and UDs in can_probe and reject any kprobes trying to
attach to these instructions.
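
For reference, a standalone user-space sketch of the opcode mapping that
the new is_exception_insn() check is based on (illustration only; the
byte values come from the SDM opcode map, the helper name below is made
up for this example):

  #include <stdbool.h>
  #include <stdio.h>

  static bool is_exception_opcode(unsigned char b0, unsigned char b1)
  {
          if (b0 == 0x0f)         /* two-byte (0f-escaped) opcodes */
                  return b1 == 0xff || b1 == 0xb9 || b1 == 0x0b; /* UD0, UD1, UD2 */

          return b0 == 0xcc ||    /* INT3 */
                 b0 == 0xcd ||    /* INT n */
                 b0 == 0xce ||    /* INTO */
                 b0 == 0xf1;      /* INT1/ICEBP */
  }

  int main(void)
  {
          printf("ud2  (0f 0b): %d\n", is_exception_opcode(0x0f, 0x0b));
          printf("int3 (cc):    %d\n", is_exception_opcode(0xcc, 0x00));
          printf("nop  (90):    %d\n", is_exception_opcode(0x90, 0x00));
          return 0;
  }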

Suggested-by: Masami Hiramatsu (Google) 
Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/core.c | 33 ++---
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index e8babebad7b8..792b38d22126 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -252,6 +252,22 @@ unsigned long recover_probed_instruction(kprobe_opcode_t 
*buf, unsigned long add
return __recover_probed_insn(buf, addr);
 }
 
+static inline int is_exception_insn(struct insn *insn)
+{
+   if (insn->opcode.bytes[0] == 0x0f) {
+   /* UD0 / UD1 / UD2 */
+   return insn->opcode.bytes[1] == 0xff ||
+  insn->opcode.bytes[1] == 0xb9 ||
+  insn->opcode.bytes[1] == 0x0b;
+   } else {
+   /* INT3 / INT n / INTO / INT1 */
+   return insn->opcode.bytes[0] == 0xcc ||
+  insn->opcode.bytes[0] == 0xcd ||
+  insn->opcode.bytes[0] == 0xce ||
+  insn->opcode.bytes[0] == 0xf1;
+   }
+}
+
 /* Check if paddr is at an instruction boundary */
 static int can_probe(unsigned long paddr)
 {
@@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
 #endif
addr += insn.length;
}
+   __addr = recover_probed_instruction(buf, addr);
+   if (!__addr)
+   return 0;
+
+   if (insn_decode_kernel(&insn, (void *)__addr) < 0)
+   return 0;
+
+   if (is_exception_insn(&insn))
+   return 0;
+
if (IS_ENABLED(CONFIG_CFI_CLANG)) {
/*
 * The compiler generates the following instruction sequence
@@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
 * Also, these movl and addl are used for showing expected
 * type. So those must not be touched.
 */
-   __addr = recover_probed_instruction(buf, addr);
-   if (!__addr)
-   return 0;
-
-   if (insn_decode_kernel(&insn, (void *)__addr) < 0)
-   return 0;
-
if (insn.opcode.value == 0xBA)
offset = 12;
else if (insn.opcode.value == 0x3)
-- 
2.43.0




[RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5

2024-01-26 Thread Jinghao Jia
With the instruction decoder, we are now able to decode and recognize
instructions with opcode extensions. There are more instructions in
these groups that can be boosted:

Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
Group 4: INC, DEC (byte operation)
Group 5: INC, DEC (word/doubleword/quadword operation)

These instructions were not boosted previously because there are reserved
opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
unmapped. As a result, kprobes attached to them require two int3 traps,
since being non-boostable also prevents jump-optimization.

Some simple tests on QEMU show that after boosting and jump-optimization
a single kprobe on these instructions with an empty pre-handler runs 10x
faster (~1000 cycles vs. ~100 cycles).
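
The measurement setup is essentially a kprobe with an empty pre-handler.
A minimal module sketch of such a probe is below (the probed symbol is an
arbitrary example, not necessarily the one used for the numbers above):

  #include <linux/module.h>
  #include <linux/kprobes.h>

  static int empty_pre(struct kprobe *p, struct pt_regs *regs)
  {
          return 0;       /* do nothing; let the probed insn be boosted/optimized */
  }

  static struct kprobe kp = {
          .symbol_name    = "kernel_clone",       /* example target */
          .pre_handler    = empty_pre,
  };

  static int __init kp_init(void)
  {
          return register_kprobe(&kp);
  }

  static void __exit kp_exit(void)
  {
          unregister_kprobe(&kp);
  }

  module_init(kp_init);
  module_exit(kp_exit);
  MODULE_LICENSE("GPL");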

Since these instructions are mostly ALU operations and do not touch
special registers like RIP, let's boost them so that we get the
performance benefit.
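
For illustration, the ModR/M reg (nnn) field that selects the operation
within a group is just bits 5:3 of the ModR/M byte. The sketch below
mirrors the kernel's X86_MODRM_REG macro with a few worked byte values
(user-space illustration only):

  #include <stdio.h>

  #define X86_MODRM_REG(modrm) (((modrm) >> 3) & 0x7)

  int main(void)
  {
          /* ff e0 = jmp *%rax -> Grp5, nnn == 4 (already boostable) */
          /* ff c0 = inc %eax  -> Grp5, nnn == 0 (newly boostable)   */
          /* f7 f1 = div %ecx  -> Grp3, nnn == 6 (newly boostable)   */
          unsigned char modrm[] = { 0xe0, 0xc0, 0xf1 };

          for (int i = 0; i < 3; i++)
                  printf("ModR/M 0x%02x -> nnn = %d\n", modrm[i], X86_MODRM_REG(modrm[i]));
          return 0;
  }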

Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/core.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 792b38d22126..f847bd9cc91b 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -169,22 +169,31 @@ int can_boost(struct insn *insn, void *addr)
case 0x62:  /* bound */
case 0x70 ... 0x7f: /* Conditional jumps */
case 0x9a:  /* Call far */
-   case 0xc0 ... 0xc1: /* Grp2 */
case 0xcc ... 0xce: /* software exceptions */
-   case 0xd0 ... 0xd3: /* Grp2 */
case 0xd6:  /* (UD) */
case 0xd8 ... 0xdf: /* ESC */
case 0xe0 ... 0xe3: /* LOOP*, JCXZ */
case 0xe8 ... 0xe9: /* near Call, JMP */
case 0xeb:  /* Short JMP */
case 0xf0 ... 0xf4: /* LOCK/REP, HLT */
-   case 0xf6 ... 0xf7: /* Grp3 */
-   case 0xfe:  /* Grp4 */
/* ... are not boostable */
return 0;
+   case 0xc0 ... 0xc1: /* Grp2 */
+   case 0xd0 ... 0xd3: /* Grp2 */
+   /* ModR/M nnn == 110 is reserved */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) != 6;
+   case 0xf6 ... 0xf7: /* Grp3 */
+   /* ModR/M nnn == 001 is reserved */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) != 1;
+   case 0xfe:  /* Grp4 */
+   /* Only inc and dec are boostable */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 1;
case 0xff:  /* Grp5 */
-   /* Only indirect jmp is boostable */
-   return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
+   /* Only inc, dec, and indirect jmp are boostable */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 1 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
default:
return 1;
}
-- 
2.43.0




[RFC PATCH 0/2] x86/kprobes: add exception opcode detector and boost more opcodes

2024-01-26 Thread Jinghao Jia
Hi everyone,

This patch set makes the following 2 changes:

- It adds an exception opcode detector to prevent kprobing on INTs and UDs.
  These opcodes serve special purposes in the kernel and kprobing them
  will also cause the stack trace to be polluted by the copy buffer
  address. This is suggested by Masami.

- At the same time, this patch set also boosts more opcodes from the group
  2/3/4/5. The newly boosted opcodes are all arithmetic instructions with
  semantics that are easy to reason about, and therefore, they are able to
  be boosted and executed out-of-line. These instructions were not boosted
  previously because they use opcode extensions that are not handled by the
  kernel. But now with the instruction decoder they can be easily handled.
  Boosting (and further jump optimizing) these instructions leads to a 10x
  performance gain for a single probe on QEMU.

Jinghao Jia (2):
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: boost more instructions from grp2/3/4/5

 arch/x86/kernel/kprobes/core.c | 54 ++
 1 file changed, 41 insertions(+), 13 deletions(-)

--
2.43.0




Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD

2024-01-28 Thread Jinghao Jia


On 1/27/24 13:47, Xin Li wrote:
> On 1/26/2024 8:41 PM, Jinghao Jia wrote:
>> Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
>> special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
>> involved in LLVM-KCFI instrumentation. At the same time, attaching
>> kprobes on these instructions (particularly UDs) will pollute the stack
>> trace dumped in the kernel ring buffer, since the exception is triggered
>> in the copy buffer rather than the original location.
>>
>> Check for INTs and UDs in can_probe and reject any kprobes trying to
>> attach to these instructions.
>>
>> Suggested-by: Masami Hiramatsu (Google) 
>> Signed-off-by: Jinghao Jia 
>> ---
>>   arch/x86/kernel/kprobes/core.c | 33 ++---
>>   1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> index e8babebad7b8..792b38d22126 100644
>> --- a/arch/x86/kernel/kprobes/core.c
>> +++ b/arch/x86/kernel/kprobes/core.c
>> @@ -252,6 +252,22 @@ unsigned long 
>> recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
>>   return __recover_probed_insn(buf, addr);
>>   }
>>   +static inline int is_exception_insn(struct insn *insn)
> 
> s/int/bool
> 

Oh yes, the return type should be bool. Thanks for pointing out!

--Jinghao

>> +{
>> +    if (insn->opcode.bytes[0] == 0x0f) {
>> +    /* UD0 / UD1 / UD2 */
>> +    return insn->opcode.bytes[1] == 0xff ||
>> +   insn->opcode.bytes[1] == 0xb9 ||
>> +   insn->opcode.bytes[1] == 0x0b;
>> +    } else {
>> +    /* INT3 / INT n / INTO / INT1 */
>> +    return insn->opcode.bytes[0] == 0xcc ||
>> +   insn->opcode.bytes[0] == 0xcd ||
>> +   insn->opcode.bytes[0] == 0xce ||
>> +   insn->opcode.bytes[0] == 0xf1;
>> +    }
>> +}
>> +
>>   /* Check if paddr is at an instruction boundary */
>>   static int can_probe(unsigned long paddr)
>>   {
>> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>>   #endif
>>   addr += insn.length;
>>   }
>> +    __addr = recover_probed_instruction(buf, addr);
>> +    if (!__addr)
>> +    return 0;
>> +
>> +    if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> +    return 0;
>> +
>> +    if (is_exception_insn(&insn))
>> +    return 0;
>> +
>>   if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>>   /*
>>    * The compiler generates the following instruction sequence
>> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>>    * Also, these movl and addl are used for showing expected
>>    * type. So those must not be touched.
>>    */
>> -    __addr = recover_probed_instruction(buf, addr);
>> -    if (!__addr)
>> -    return 0;
>> -
>> -    if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> -    return 0;
>> -
>>   if (insn.opcode.value == 0xBA)
>>   offset = 12;
>>   else if (insn.opcode.value == 0x3)
> 


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD

2024-01-28 Thread Jinghao Jia


On 1/27/24 19:19, Masami Hiramatsu (Google) wrote:
> On Fri, 26 Jan 2024 22:41:23 -0600
> Jinghao Jia  wrote:
> 
>> Both INTs (INT n, INT1, INT3, INTO) and UDs (UD0, UD1, UD2) serve
>> special purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is
>> involved in LLVM-KCFI instrumentation. At the same time, attaching
>> kprobes on these instructions (particularly UDs) will pollute the stack
>> trace dumped in the kernel ring buffer, since the exception is triggered
>> in the copy buffer rather than the original location.
>>
>> Check for INTs and UDs in can_probe and reject any kprobes trying to
>> attach to these instructions.
>>
> 
> Thanks for implement this check!
> 

You are very welcome :)

> 
>> Suggested-by: Masami Hiramatsu (Google) 
>> Signed-off-by: Jinghao Jia 
>> ---
>>  arch/x86/kernel/kprobes/core.c | 33 ++---
>>  1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> index e8babebad7b8..792b38d22126 100644
>> --- a/arch/x86/kernel/kprobes/core.c
>> +++ b/arch/x86/kernel/kprobes/core.c
>> @@ -252,6 +252,22 @@ unsigned long 
>> recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
>>  return __recover_probed_insn(buf, addr);
>>  }
>>  
>> +static inline int is_exception_insn(struct insn *insn)
>> +{
>> +if (insn->opcode.bytes[0] == 0x0f) {
>> +/* UD0 / UD1 / UD2 */
>> +return insn->opcode.bytes[1] == 0xff ||
>> +   insn->opcode.bytes[1] == 0xb9 ||
>> +   insn->opcode.bytes[1] == 0x0b;
>> +} else {
> 
> If "else" block just return, you don't need this "else".
> 
> bool func()
> {
>   if (cond)
>   return ...
> 
>   return ...
> }
> 
> Is preferrable because this puts "return val" always at the end of non-void
> function.
> 

I will fix this in the v2.

>> +/* INT3 / INT n / INTO / INT1 */
>> +return insn->opcode.bytes[0] == 0xcc ||
>> +   insn->opcode.bytes[0] == 0xcd ||
>> +   insn->opcode.bytes[0] == 0xce ||
>> +   insn->opcode.bytes[0] == 0xf1;
>> +}
>> +}
>> +
>>  /* Check if paddr is at an instruction boundary */
>>  static int can_probe(unsigned long paddr)
>>  {
>> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>>  #endif
>>  addr += insn.length;
>>  }
>> +__addr = recover_probed_instruction(buf, addr);
>> +if (!__addr)
>> +return 0;
>> +
>> +if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> +return 0;
>> +
>> +if (is_exception_insn(&insn))
>> +return 0;
>> +
> 
> Please don't put this outside of decoding loop. You should put these in
> the loop which decodes the instruction from the beginning of the function.
> Since the x86 instrcution is variable length, can_probe() needs to check
> whether that the address is instruction boundary and decodable.
> 
> Thank you,

If my understanding is correct then this is trying to decode the kprobe
target instruction, given that it is after the main decoding loop.  Here I
hoisted the decoding logic out of the if(IS_ENABLED(CONFIG_CFI_CLANG))
block so that we do not need to decode the same instruction twice.  I left
the main decoding loop unchanged so it is still decoding the function from
the start and should handle instruction boundaries. Are there any caveats
that I missed?

--Jinghao

> 
>>  if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>>  /*
>>   * The compiler generates the following instruction sequence
>> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>>   * Also, these movl and addl are used for showing expected
>>   * type. So those must not be touched.
>>   */
>> -__addr = recover_probed_instruction(buf, addr);
>> -if (!__addr)
>> -return 0;
>> -
>> -if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>> -return 0;
>> -
>>  if (insn.opcode.value == 0xBA)
>>  offset = 12;
>>  else if (insn.opcode.value == 0x3)
>> -- 
>> 2.43.0
>>
> 
> 


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH 2/2] x86/kprobes: boost more instructions from grp2/3/4/5

2024-01-28 Thread Jinghao Jia


On 1/27/24 20:22, Masami Hiramatsu (Google) wrote:
> On Fri, 26 Jan 2024 22:41:24 -0600
> Jinghao Jia  wrote:
> 
>> With the instruction decoder, we are now able to decode and recognize
>> instructions with opcode extensions. There are more instructions in
>> these groups that can be boosted:
>>
>> Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
>> Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
>> Group 4: INC, DEC (byte operation)
>> Group 5: INC, DEC (word/doubleword/quadword operation)
>>
>> These instructions are not boosted previously because there are reserved
>> opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
>> unmapped. As a result, kprobes attached to them requires two int3 traps
>> as being non-boostable also prevents jump-optimization.
>>
>> Some simple tests on QEMU show that after boosting and jump-optimization
>> a single kprobe on these instructions with an empty pre-handler runs 10x
>> faster (~1000 cycles vs. ~100 cycles).
>>
>> Since these instructions are mostly ALU operations and do not touch
>> special registers like RIP, let's boost them so that we get the
>> performance benefit.
>>
> 
> As far as we check the ModR/M byte, I think we can safely run these
> instructions on trampoline buffer without adjusting results (this
> means it can be "boosted").
> I just have a minor comment, but basically this looks good to me.
> 
> Reviewed-by: Masami Hiramatsu (Google) 
> 
>> Signed-off-by: Jinghao Jia 
>> ---
>>  arch/x86/kernel/kprobes/core.c | 21 +++--
>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> index 792b38d22126..f847bd9cc91b 100644
>> --- a/arch/x86/kernel/kprobes/core.c
>> +++ b/arch/x86/kernel/kprobes/core.c
>> @@ -169,22 +169,31 @@ int can_boost(struct insn *insn, void *addr)
>>  case 0x62:  /* bound */
>>  case 0x70 ... 0x7f: /* Conditional jumps */
>>  case 0x9a:  /* Call far */
>> -case 0xc0 ... 0xc1: /* Grp2 */
>>  case 0xcc ... 0xce: /* software exceptions */
>> -case 0xd0 ... 0xd3: /* Grp2 */
>>  case 0xd6:  /* (UD) */
>>  case 0xd8 ... 0xdf: /* ESC */
>>  case 0xe0 ... 0xe3: /* LOOP*, JCXZ */
>>  case 0xe8 ... 0xe9: /* near Call, JMP */
>>  case 0xeb:  /* Short JMP */
>>  case 0xf0 ... 0xf4: /* LOCK/REP, HLT */
>> -case 0xf6 ... 0xf7: /* Grp3 */
>> -case 0xfe:  /* Grp4 */
>>  /* ... are not boostable */
>>  return 0;
>> +case 0xc0 ... 0xc1: /* Grp2 */
>> +case 0xd0 ... 0xd3: /* Grp2 */
>> +/* ModR/M nnn == 110 is reserved */
>> +return X86_MODRM_REG(insn->modrm.bytes[0]) != 6;
>> +case 0xf6 ... 0xf7: /* Grp3 */
>> +/* ModR/M nnn == 001 is reserved */
> 
>   /* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
> 

I will incorporate this into the v2. Since nnn == 001 is still considered
reserved by Intel, we still need to prevent it from being boosted, don't
we?

--Jinghao

>> +return X86_MODRM_REG(insn->modrm.bytes[0]) != 1;
>> +case 0xfe:  /* Grp4 */
>> +/* Only inc and dec are boostable */
>> +return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
>> +   X86_MODRM_REG(insn->modrm.bytes[0]) == 1;
>>  case 0xff:  /* Grp5 */
>> -/* Only indirect jmp is boostable */
>> -return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
>> +/* Only inc, dec, and indirect jmp are boostable */
>> +return X86_MODRM_REG(insn->modrm.bytes[0]) == 0 ||
>> +   X86_MODRM_REG(insn->modrm.bytes[0]) == 1 ||
>> +   X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
>>  default:
>>  return 1;
>>  }
>> -- 
>> 2.43.0
>>
> 
> Thamnk you,
> 


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [RFC PATCH 1/2] x86/kprobes: Prohibit kprobing on INT and UD

2024-01-29 Thread Jinghao Jia
On 1/29/24 19:44, Masami Hiramatsu (Google) wrote:
> On Sun, 28 Jan 2024 15:25:59 -0600
> Jinghao Jia  wrote:
> 
>>>>  /* Check if paddr is at an instruction boundary */
>>>>  static int can_probe(unsigned long paddr)
>>>>  {
>>>> @@ -294,6 +310,16 @@ static int can_probe(unsigned long paddr)
>>>>  #endif
>>>>addr += insn.length;
>>>>}
>>>> +  __addr = recover_probed_instruction(buf, addr);
>>>> +  if (!__addr)
>>>> +  return 0;
>>>> +
>>>> +  if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>>>> +  return 0;
>>>> +
>>>> +  if (is_exception_insn(&insn))
>>>> +  return 0;
>>>> +
>>>
>>> Please don't put this outside of decoding loop. You should put these in
>>> the loop which decodes the instruction from the beginning of the function.
>>> Since the x86 instrcution is variable length, can_probe() needs to check
>>> whether that the address is instruction boundary and decodable.
>>>
>>> Thank you,
>>
>> If my understanding is correct then this is trying to decode the kprobe
>> target instruction, given that it is after the main decoding loop.  Here I
>> hoisted the decoding logic out of the if(IS_ENABLED(CONFIG_CFI_CLANG))
>> block so that we do not need to decode the same instruction twice.  I left
>> the main decoding loop unchanged so it is still decoding the function from
>> the start and should handle instruction boundaries. Are there any caveats
>> that I missed?
> 
> Ah, sorry I misread the patch. You're correct!
> This is a good place to do that.
> 
> But hmm, I think we should add another patch to check the addr == paddr
> soon after the loop so that we will avoid decoding.
> 
> Thank you,
> 

Yes, that makes sense to me. At the same time, I'm also thinking about
changing the return type of can_probe() to bool, since we are just using
int as bool in this context.

--Jinghao

>>
>> --Jinghao
>>
>>>
>>>>if (IS_ENABLED(CONFIG_CFI_CLANG)) {
>>>>/*
>>>> * The compiler generates the following instruction sequence
>>>> @@ -308,13 +334,6 @@ static int can_probe(unsigned long paddr)
>>>> * Also, these movl and addl are used for showing expected
>>>> * type. So those must not be touched.
>>>> */
>>>> -  __addr = recover_probed_instruction(buf, addr);
>>>> -  if (!__addr)
>>>> -  return 0;
>>>> -
>>>> -  if (insn_decode_kernel(&insn, (void *)__addr) < 0)
>>>> -  return 0;
>>>> -
>>>>if (insn.opcode.value == 0xBA)
>>>>offset = 12;
>>>>else if (insn.opcode.value == 0x3)
>>>> -- 
>>>> 2.43.0
>>>>
>>>
>>>
> 
> 


OpenPGP_signature.asc
Description: OpenPGP digital signature


[PATCH v2 0/3] x86/kprobes: add exception opcode detector and boost more opcodes

2024-02-03 Thread Jinghao Jia
Hi everyone,

This patch set makes the following 3 changes:

- It refactors the can_probe and can_boost functions to make them return
  bool instead of int. Both functions are just using int as bool so let's
  make them return a real boolean value.

- It adds an exception opcode detector to prevent kprobing on INTs and UDs.
  These opcodes serve special purposes in the kernel and kprobing them
  will also cause the stack trace to be polluted by the copy buffer
  address. This is suggested by Masami.

- At the same time, this patch set also boosts more opcodes from the group
  2/3/4/5. The newly boosted opcodes are all arithmetic instructions with
  semantics that are easy to reason about, and therefore, they are able to
  be boosted and executed out-of-line. These instructions were not boosted
  previously because they use opcode extensions that are not handled by the
  kernel. But now with the instruction decoder they can be easily handled.
  Boosting (and further jump optimizing) these instructions leads to a 10x
  performance gain for a single probe on QEMU.

Changelog:
---
v1 -> v2
v1: 
https://lore.kernel.org/linux-trace-kernel/20240127044124.57594-1-jingh...@illinois.edu/

- Address feedback from Xin:
  - Change return type of is_exception_insn from int to bool.

- Address feedback from Masami:
  - Improve code style in is_exception_insn.
  - Move instruction boundary check of the target address (addr == paddr)
right after the decoding loop to avoid decoding if the target address
is not a valid instruction boundary.
  - Document instruction encoding differences between AMD and Intel for
instruction group 2 and 3 in can_boost.

- Add an extra patch to change the return type of can_probe and can_boost
  from int to bool based on v1 discussion.

- Improve code comments in general.

Jinghao Jia (3):
  x86/kprobes: Refactor can_{probe,boost} return type to bool
  x86/kprobes: Prohibit kprobing on INT and UD
  x86/kprobes: Boost more instructions from grp2/3/4/5

 arch/x86/kernel/kprobes/common.h |  2 +-
 arch/x86/kernel/kprobes/core.c   | 98 ++--
 2 files changed, 69 insertions(+), 31 deletions(-)

--
2.43.0




[PATCH v2 1/3] x86/kprobes: Refactor can_{probe,boost} return type to bool

2024-02-03 Thread Jinghao Jia
Both can_probe and can_boost have an int return type but use it as a
boolean in their context.

Refactor both functions to make them actually return boolean.

Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/common.h |  2 +-
 arch/x86/kernel/kprobes/core.c   | 33 +++-
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/kprobes/common.h b/arch/x86/kernel/kprobes/common.h
index c993521d4933..e772276f5aa9 100644
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -78,7 +78,7 @@
 #endif
 
 /* Ensure if the instruction can be boostable */
-extern int can_boost(struct insn *insn, void *orig_addr);
+extern bool can_boost(struct insn *insn, void *orig_addr);
 /* Recover instruction if given address is probed */
 extern unsigned long recover_probed_instruction(kprobe_opcode_t *buf,
 unsigned long addr);
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index e8babebad7b8..644d416441fb 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -137,14 +137,14 @@ NOKPROBE_SYMBOL(synthesize_relcall);
  * Returns non-zero if INSN is boostable.
  * RIP relative instructions are adjusted at copying time in 64 bits mode
  */
-int can_boost(struct insn *insn, void *addr)
+bool can_boost(struct insn *insn, void *addr)
 {
kprobe_opcode_t opcode;
insn_byte_t prefix;
int i;
 
if (search_exception_tables((unsigned long)addr))
-   return 0;   /* Page fault may occur on this address. */
+   return false;   /* Page fault may occur on this address. */
 
/* 2nd-byte opcode */
if (insn->opcode.nbytes == 2)
@@ -152,7 +152,7 @@ int can_boost(struct insn *insn, void *addr)
(unsigned long *)twobyte_is_boostable);
 
if (insn->opcode.nbytes != 1)
-   return 0;
+   return false;
 
for_each_insn_prefix(insn, i, prefix) {
insn_attr_t attr;
@@ -160,7 +160,7 @@ int can_boost(struct insn *insn, void *addr)
attr = inat_get_opcode_attribute(prefix);
/* Can't boost Address-size override prefix and CS override 
prefix */
if (prefix == 0x2e || inat_is_address_size_prefix(attr))
-   return 0;
+   return false;
}
 
opcode = insn->opcode.bytes[0];
@@ -181,12 +181,12 @@ int can_boost(struct insn *insn, void *addr)
case 0xf6 ... 0xf7: /* Grp3 */
case 0xfe:  /* Grp4 */
/* ... are not boostable */
-   return 0;
+   return false;
case 0xff:  /* Grp5 */
/* Only indirect jmp is boostable */
return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
default:
-   return 1;
+   return true;
}
 }
 
@@ -253,20 +253,18 @@ unsigned long recover_probed_instruction(kprobe_opcode_t 
*buf, unsigned long add
 }
 
 /* Check if paddr is at an instruction boundary */
-static int can_probe(unsigned long paddr)
+static bool can_probe(unsigned long paddr)
 {
unsigned long addr, __addr, offset = 0;
struct insn insn;
kprobe_opcode_t buf[MAX_INSN_SIZE];
 
if (!kallsyms_lookup_size_offset(paddr, NULL, &offset))
-   return 0;
+   return false;
 
/* Decode instructions */
addr = paddr - offset;
while (addr < paddr) {
-   int ret;
-
/*
 * Check if the instruction has been modified by another
 * kprobe, in which case we replace the breakpoint by the
@@ -277,11 +275,10 @@ static int can_probe(unsigned long paddr)
 */
__addr = recover_probed_instruction(buf, addr);
if (!__addr)
-   return 0;
+   return false;
 
-   ret = insn_decode_kernel(&insn, (void *)__addr);
-   if (ret < 0)
-   return 0;
+   if (insn_decode_kernel(&insn, (void *)__addr) < 0)
+   return false;
 
 #ifdef CONFIG_KGDB
/*
@@ -290,7 +287,7 @@ static int can_probe(unsigned long paddr)
 */
if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
kgdb_has_hit_break(addr))
-   return 0;
+   return false;
 #endif
addr += insn.length;
}
@@ -310,10 +307,10 @@ static int can_probe(unsigned long paddr)
 */
__addr = recover_probed_instruction(buf, addr);
if (!__addr)
-   return 0;
+   return false;
 
if (insn_decode_kernel(&insn, (vo

[PATCH v2 3/3] x86/kprobes: Boost more instructions from grp2/3/4/5

2024-02-03 Thread Jinghao Jia
With the instruction decoder, we are now able to decode and recognize
instructions with opcode extensions. There are more instructions in
these groups that can be boosted:

Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
Group 4: INC, DEC (byte operation)
Group 5: INC, DEC (word/doubleword/quadword operation)

These instructions were not boosted previously because there are reserved
opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
unmapped. As a result, kprobes attached to them require two int3 traps,
since being non-boostable also prevents jump-optimization.

Some simple tests on QEMU show that after boosting and jump-optimization
a single kprobe on these instructions with an empty pre-handler runs 10x
faster (~1000 cycles vs. ~100 cycles).

Since these instructions are mostly ALU operations and do not touch
special registers like RIP, let's boost them so that we get the
performance benefit.

Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/core.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 7a08d6a486c8..530f6d4b34f4 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -169,22 +169,33 @@ bool can_boost(struct insn *insn, void *addr)
case 0x62:  /* bound */
case 0x70 ... 0x7f: /* Conditional jumps */
case 0x9a:  /* Call far */
-   case 0xc0 ... 0xc1: /* Grp2 */
case 0xcc ... 0xce: /* software exceptions */
-   case 0xd0 ... 0xd3: /* Grp2 */
case 0xd6:  /* (UD) */
case 0xd8 ... 0xdf: /* ESC */
case 0xe0 ... 0xe3: /* LOOP*, JCXZ */
case 0xe8 ... 0xe9: /* near Call, JMP */
case 0xeb:  /* Short JMP */
case 0xf0 ... 0xf4: /* LOCK/REP, HLT */
-   case 0xf6 ... 0xf7: /* Grp3 */
-   case 0xfe:  /* Grp4 */
/* ... are not boostable */
return false;
+   case 0xc0 ... 0xc1: /* Grp2 */
+   case 0xd0 ... 0xd3: /* Grp2 */
+   /*
+* AMD uses nnn == 110 as SHL/SAL, but Intel makes it reserved.
+*/
+   return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b110;
+   case 0xf6 ... 0xf7: /* Grp3 */
+   /* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b001;
+   case 0xfe:  /* Grp4 */
+   /* Only INC and DEC are boostable */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001;
case 0xff:  /* Grp5 */
-   /* Only indirect jmp is boostable */
-   return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
+   /* Only INC, DEC, and indirect JMP are boostable */
+   return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001 ||
+  X86_MODRM_REG(insn->modrm.bytes[0]) == 0b100;
default:
return true;
}
-- 
2.43.0




[PATCH v2 2/3] x86/kprobes: Prohibit kprobing on INT and UD

2024-02-03 Thread Jinghao Jia
Both INT (INT n, INT1, INT3, INTO) and UD (UD0, UD1, UD2) serve special
purposes in the kernel, e.g., INT3 is used by KGDB and UD2 is involved
in LLVM-KCFI instrumentation. At the same time, attaching kprobes on
these instructions (particularly UD) will pollute the stack trace dumped
in the kernel ring buffer, since the exception is triggered in the copy
buffer rather than the original location.

Check for INT and UD in can_probe and reject any kprobes trying to
attach to these instructions.

Suggested-by: Masami Hiramatsu (Google) 
Signed-off-by: Jinghao Jia 
---
 arch/x86/kernel/kprobes/core.c | 48 +++---
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 644d416441fb..7a08d6a486c8 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -252,7 +252,28 @@ unsigned long recover_probed_instruction(kprobe_opcode_t 
*buf, unsigned long add
return __recover_probed_insn(buf, addr);
 }
 
-/* Check if paddr is at an instruction boundary */
+/* Check if insn is INT or UD */
+static inline bool is_exception_insn(struct insn *insn)
+{
+   /* UD uses 0f escape */
+   if (insn->opcode.bytes[0] == 0x0f) {
+   /* UD0 / UD1 / UD2 */
+   return insn->opcode.bytes[1] == 0xff ||
+  insn->opcode.bytes[1] == 0xb9 ||
+  insn->opcode.bytes[1] == 0x0b;
+   }
+
+   /* INT3 / INT n / INTO / INT1 */
+   return insn->opcode.bytes[0] == 0xcc ||
+  insn->opcode.bytes[0] == 0xcd ||
+  insn->opcode.bytes[0] == 0xce ||
+  insn->opcode.bytes[0] == 0xf1;
+}
+
+/*
+ * Check if paddr is at an instruction boundary and that instruction can
+ * be probed
+ */
 static bool can_probe(unsigned long paddr)
 {
unsigned long addr, __addr, offset = 0;
@@ -291,6 +312,22 @@ static bool can_probe(unsigned long paddr)
 #endif
addr += insn.length;
}
+
+   /* Check if paddr is at an instruction boundary */
+   if (addr != paddr)
+   return false;
+
+   __addr = recover_probed_instruction(buf, addr);
+   if (!__addr)
+   return false;
+
+   if (insn_decode_kernel(&insn, (void *)__addr) < 0)
+   return false;
+
+   /* INT and UD are special and should not be kprobed */
+   if (is_exception_insn(&insn))
+   return false;
+
if (IS_ENABLED(CONFIG_CFI_CLANG)) {
/*
 * The compiler generates the following instruction sequence
@@ -305,13 +342,6 @@ static bool can_probe(unsigned long paddr)
 * Also, these movl and addl are used for showing expected
 * type. So those must not be touched.
 */
-   __addr = recover_probed_instruction(buf, addr);
-   if (!__addr)
-   return false;
-
-   if (insn_decode_kernel(&insn, (void *)__addr) < 0)
-   return false;
-
if (insn.opcode.value == 0xBA)
offset = 12;
else if (insn.opcode.value == 0x3)
@@ -325,7 +355,7 @@ static bool can_probe(unsigned long paddr)
}
 
 out:
-   return (addr == paddr);
+   return true;
 }
 
 /* If x86 supports IBT (ENDBR) it must be skipped. */
-- 
2.43.0




Re: [PATCH v2 3/3] x86/kprobes: Boost more instructions from grp2/3/4/5

2024-02-04 Thread Jinghao Jia
On 2/4/24 06:09, Masami Hiramatsu (Google) wrote:
> On Sat,  3 Feb 2024 21:13:00 -0600
> Jinghao Jia  wrote:
> 
>> With the instruction decoder, we are now able to decode and recognize
>> instructions with opcode extensions. There are more instructions in
>> these groups that can be boosted:
>>
>> Group 2: ROL, ROR, RCL, RCR, SHL/SAL, SHR, SAR
>> Group 3: TEST, NOT, NEG, MUL, IMUL, DIV, IDIV
>> Group 4: INC, DEC (byte operation)
>> Group 5: INC, DEC (word/doubleword/quadword operation)
>>
>> These instructions are not boosted previously because there are reserved
>> opcodes within the groups, e.g., group 2 with ModR/M.nnn == 110 is
>> unmapped. As a result, kprobes attached to them requires two int3 traps
>> as being non-boostable also prevents jump-optimization.
>>
>> Some simple tests on QEMU show that after boosting and jump-optimization
>> a single kprobe on these instructions with an empty pre-handler runs 10x
>> faster (~1000 cycles vs. ~100 cycles).
>>
>> Since these instructions are mostly ALU operations and do not touch
>> special registers like RIP, let's boost them so that we get the
>> performance benefit.
>>
> 
> This looks good to me. And can you check how many instructions in the
> vmlinux will be covered by this change typically?
> 

I collected the stats from the LLVM CodeGen backend on kernel version 6.7.3
using Gentoo's dist-kernel config (with a mod2yesconfig to make modules
builtin) and here are the number of Grp 2/3/4/5 instructions that are newly
covered by this patch:

Kernel total # of insns:  28552017  (from objdump)
Grp2 insns:                 286249  (from LLVM)
Grp3 insns:                 286556  (from LLVM)
Grp4 insns:                   5832  (from LLVM)
Grp5 insns:                 146314  (from LLVM)

Note that using LLVM means we miss the stats from inline assembly and
assembly source files.

--Jinghao

> Thank you,
> 
>> Signed-off-by: Jinghao Jia 
>> ---
>>  arch/x86/kernel/kprobes/core.c | 23 +--
>>  1 file changed, 17 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>> index 7a08d6a486c8..530f6d4b34f4 100644
>> --- a/arch/x86/kernel/kprobes/core.c
>> +++ b/arch/x86/kernel/kprobes/core.c
>> @@ -169,22 +169,33 @@ bool can_boost(struct insn *insn, void *addr)
>>  case 0x62:  /* bound */
>>  case 0x70 ... 0x7f: /* Conditional jumps */
>>  case 0x9a:  /* Call far */
>> -case 0xc0 ... 0xc1: /* Grp2 */
>>  case 0xcc ... 0xce: /* software exceptions */
>> -case 0xd0 ... 0xd3: /* Grp2 */
>>  case 0xd6:  /* (UD) */
>>  case 0xd8 ... 0xdf: /* ESC */
>>  case 0xe0 ... 0xe3: /* LOOP*, JCXZ */
>>  case 0xe8 ... 0xe9: /* near Call, JMP */
>>  case 0xeb:  /* Short JMP */
>>  case 0xf0 ... 0xf4: /* LOCK/REP, HLT */
>> -case 0xf6 ... 0xf7: /* Grp3 */
>> -case 0xfe:  /* Grp4 */
>>  /* ... are not boostable */
>>  return false;
>> +case 0xc0 ... 0xc1: /* Grp2 */
>> +case 0xd0 ... 0xd3: /* Grp2 */
>> +/*
>> + * AMD uses nnn == 110 as SHL/SAL, but Intel makes it reserved.
>> + */
>> +return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b110;
>> +case 0xf6 ... 0xf7: /* Grp3 */
>> +/* AMD uses nnn == 001 as TEST, but Intel makes it reserved. */
>> +return X86_MODRM_REG(insn->modrm.bytes[0]) != 0b001;
>> +case 0xfe:  /* Grp4 */
>> +/* Only INC and DEC are boostable */
>> +return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
>> +   X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001;
>>  case 0xff:  /* Grp5 */
>> -/* Only indirect jmp is boostable */
>> -return X86_MODRM_REG(insn->modrm.bytes[0]) == 4;
>> +/* Only INC, DEC, and indirect JMP are boostable */
>> +return X86_MODRM_REG(insn->modrm.bytes[0]) == 0b000 ||
>> +   X86_MODRM_REG(insn->modrm.bytes[0]) == 0b001 ||
>> +   X86_MODRM_REG(insn->modrm.bytes[0]) == 0b100;
>>  default:
>>  return true;
>>  }
>> -- 
>> 2.43.0
>>
> 
> 


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address

2024-03-14 Thread Jinghao Jia
On 3/14/24 10:17, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) 
> 
> Read from an unsafe address with copy_from_kernel_nofault() in
> arch_adjust_kprobe_addr() because this function is used before checking
> the address is in text or not. Syzcaller bot found a bug and reported
> the case if user specifies inaccessible data area,
> arch_adjust_kprobe_addr() will cause a kernel panic.

IMHO there is a check on the address in kallsyms_lookup_size_offset to see
if it is a kernel text address before arch_adjust_kprobe_addr is invoked.

The call chain is:

register_kprobe()
  _kprobe_addr()
kallsyms_lookup_size_offset() <- check on addr is here
arch_adjust_kprobe_addr()

I wonder why this check was not able to capture the problem in this bug
report (I cannot reproduce it locally).

Thanks,
--Jinghao

> 
> Reported-by: Qiang Zhang 
> Closes: 
> https://urldefense.com/v3/__https://lore.kernel.org/all/cakhosas2rof6vqvbw_lg_j3qnku0canzr2qmy4et7r5lo8m...@mail.gmail.com/__;!!DZ3fjg!_C9Dn6-GBlkyS2z34bDUBsEXkTkgWp45MNrd4Rl5I5slz2A3SrurXOxKupsxLMqxC2BMiySCTfB2-5fPhkLP1g$
>  
> Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes")
> Signed-off-by: Masami Hiramatsu (Google) 
> ---
>  Changes in v2:
>   - Fix to return NULL if failed to access the address.
> ---
>  arch/x86/kernel/kprobes/core.c |   11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index a0ce46c0a2d8..95e4ebe5d514 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -335,7 +335,16 @@ static int can_probe(unsigned long paddr)
>  kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long 
> offset,
>bool *on_func_entry)
>  {
> - if (is_endbr(*(u32 *)addr)) {
> + u32 insn;
> +
> + /*
> +  * Since addr is not guaranteed to be safely accessed yet, use
> +  * copy_from_kernel_nofault() to get the instruction.
> +  */
> + if (copy_from_kernel_nofault(&insn, (void *)addr, sizeof(u32)))
> + return NULL;
> +
> + if (is_endbr(insn)) {
>   *on_func_entry = !offset || offset == 4;
>   if (*on_func_entry)
>   offset = 4;
> 


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address

2024-03-17 Thread Jinghao Jia


On 3/16/24 08:46, Masami Hiramatsu (Google) wrote:
> On Thu, 14 Mar 2024 18:56:35 -0500
> Jinghao Jia  wrote:
> 
>> On 3/14/24 10:17, Masami Hiramatsu (Google) wrote:
>>> From: Masami Hiramatsu (Google) 
>>>
>>> Read from an unsafe address with copy_from_kernel_nofault() in
>>> arch_adjust_kprobe_addr() because this function is used before checking
>>> the address is in text or not. Syzcaller bot found a bug and reported
>>> the case if user specifies inaccessible data area,
>>> arch_adjust_kprobe_addr() will cause a kernel panic.
>>
>> IMHO there is a check on the address in kallsyms_lookup_size_offset to see
>> if it is a kernel text address before arch_adjust_kprobe_addr is invoked.
> 
> Yeah, kallsyms does not ensure the page (especially data) exists.
> 
>>
>> The call chain is:
>>
>> register_kprobe()
>>   _kprobe_addr()
>> kallsyms_lookup_size_offset() <- check on addr is here
>> arch_adjust_kprobe_addr()
>>
>> I wonder why this check was not able to capture the problem in this bug
>> report (I cannot reproduce it locally).
> 
> I could reproduce it locally, it tried to access 'Y' data.
> (I attached my .config) And I ensured that this fixed the problem.
> 
> The reproduce test actually tried to access initdata area
> 
> 82fb5450 d __alt_reloc_selftest_addr
> 82fb5460 d int3_exception_nb.1
> 82fb5478 d tsc_early_khz
> 82fb547c d io_delay_override
> 82fb5480 d fxregs.0
> 82fb5680 d y<--- access this
> 82fb5688 d x
> 82fb56a0 d xsave_cpuid_features
> 82fb56c8 d l1d_flush_mitigation
> 
> `y` is too generic, so check `io_delay_override` which is on the
> same page.
> 
> $ git grep io_delay_override
> arch/x86/kernel/io_delay.c:static int __initdata io_delay_override;
> 
> As you can see, it is marked as `__initdata`, and the initdata has been
> freed before starting /init.
> 
> 
> [2.679161] Freeing unused kernel image (initmem) memory: 2888K
> [2.688731] Write protecting the kernel read-only data: 24576k
> [2.691802] Freeing unused kernel image (rodata/data gap) memory: 1436K
> [2.746994] x86/mm: Checked W+X mappings: passed, no W+X pages found.
> [2.748022] x86/mm: Checking user space page tables
> [2.789520] x86/mm: Checked W+X mappings: passed, no W+X pages found.
> [2.790527] Run /init as init process
> ----
> 
> So this has been caused because accessing freed initdata.

Thanks a lot for the explanation! I have confirmed the bug and tested the
patch with CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y (which explicitly marks
the init pages as not-present after boot).

Tested-by: Jinghao Jia 

--Jinghao

> 
> Thank you,
> 
>>
>> Thanks,
>> --Jinghao
>>
>>>
>>> Reported-by: Qiang Zhang 
>>> Closes: 
>>> https://urldefense.com/v3/__https://lore.kernel.org/all/cakhosas2rof6vqvbw_lg_j3qnku0canzr2qmy4et7r5lo8m...@mail.gmail.com/__;!!DZ3fjg!_C9Dn6-GBlkyS2z34bDUBsEXkTkgWp45MNrd4Rl5I5slz2A3SrurXOxKupsxLMqxC2BMiySCTfB2-5fPhkLP1g$
>>>  
>>> Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes")
>>> Signed-off-by: Masami Hiramatsu (Google) 
>>> ---
>>>  Changes in v2:
>>>   - Fix to return NULL if failed to access the address.
>>> ---
>>>  arch/x86/kernel/kprobes/core.c |   11 ++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
>>> index a0ce46c0a2d8..95e4ebe5d514 100644
>>> --- a/arch/x86/kernel/kprobes/core.c
>>> +++ b/arch/x86/kernel/kprobes/core.c
>>> @@ -335,7 +335,16 @@ static int can_probe(unsigned long paddr)
>>>  kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long 
>>> offset,
>>>  bool *on_func_entry)
>>>  {
>>> -   if (is_endbr(*(u32 *)addr)) {
>>> +   u32 insn;
>>> +
>>> +   /*
>>> +* Since addr is not guaranteed to be safely accessed yet, use
>>> +* copy_from_kernel_nofault() to get the instruction.
>>> +*/
>>> +   if (copy_from_kernel_nofault(&insn, (void *)addr, sizeof(u32)))
>>> +   return NULL;
>>> +
>>> +   if (is_endbr(insn)) {
>>> *on_func_entry = !offset || offset == 4;
>>> if (*on_func_entry)
>>> offset = 4;
>>>
> 
> 


OpenPGP_signature.asc
Description: OpenPGP digital signature


[PATCH] ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()

2024-11-10 Thread Jinghao Jia
Under certain kernel configurations when building with Clang/LLVM, the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning during build time:

  vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()

At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.

Digging deeper into both LLVM and the kernel code reveals this to be an
undefined behavior problem. ip_vs_protocol_init() uses an on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f891bb9f ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, causes the following
IR to be generated:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 {
%1 = alloca [64 x i8], align 16
...

  14:   ; preds = %11
%15 = getelementptr inbounds i8, ptr %1, i64 63
%16 = load i8, ptr %15, align 1
%17 = tail call i1 @llvm.is.constant.i8(i8 %16)
%18 = icmp eq i8 %16, 0
%19 = select i1 %17, i1 %18, i1 false
br i1 %19, label %20, label %23

  20:   ; preds = %14
%21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
...

  23:   ; preds = %14, %11, %20
    %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24
...
  }

The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:

  %13 = getelementptr inbounds i8, ptr %1, i64 63
  br i1 undef, label %14, label %17

This gives later passes (SCCP, in particular) to more DCE opportunities
by propagating the undef value further, and eventually removes
everything after the load on the uninitialized stack location:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 {
%1 = alloca [64 x i8], align 16
...

  12:   ; preds = %11
%13 = getelementptr inbounds i8, ptr %1, i64 63
unreachable
  }

In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.

Zero the on-stack buffer to avoid this possible UB.
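
For reference, the same bug pattern in a minimal user-space form (the
kernel case is subtler because protocols[0] was initialized while the
FORTIFY strnlen() check probes the last, still-uninitialized byte; this
sketch is an illustration only, not the kernel code):

  #include <string.h>
  #include <stdio.h>

  int main(void)
  {
          char protocols[64];     /* uninitialized: reading it in strcat() is UB */
          char fixed[64] = { 0 }; /* zero-initialized: well-defined */

          /* strcat(protocols, "TCP"); would scan indeterminate bytes for '\0' */
          strcat(fixed, "TCP");
          strcat(fixed, ", UDP");
          printf("%s\n", fixed);
          return 0;
  }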

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202402100205.pwxiz1zk-...@intel.com/
Co-developed-by: Ruowen Qin 
Signed-off-by: Ruowen Qin 
Signed-off-by: Jinghao Jia 
---
 net/netfilter/ipvs/ip_vs_proto.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index f100da4ba3bc..a9fd1d3fc2cb 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -340,7 +340,7 @@ void __net_exit ip_vs_protocol_net_cleanup(struct 
netns_ipvs *ipvs)
 
 int __init ip_vs_protocol_init(void)
 {
-   char protocols[64];
+   char protocols[64] = { 0 };
 #define REGISTER_PROTOCOL(p)   \
do {\
register_ip_vs_protocol(p); \
@@ -348,8 +348,6 @@ int __init ip_vs_protocol_init(void)
strcat(protocols, (p)->name);   \
} while (0)
 
-   protocols[0] = '\0';
-   protocols[2] = '\0';
 #ifdef CONFIG_IP_VS_PROTO_TCP
REGISTER_PROTOCOL(&ip_vs_protocol_tcp);
 #endif
-- 
2.47.0




Re: [PATCH] ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()

2024-11-18 Thread Jinghao Jia
Hi Julian,

Thanks for getting back to us!

On 11/18/24 6:41 AM, Julian Anastasov wrote:
> 
>   Hello,
> 
> On Mon, 11 Nov 2024, Jinghao Jia wrote:
> 
>> Under certain kernel configurations when building with Clang/LLVM, the
>> compiler does not generate a return or jump as the terminator
>> instruction for ip_vs_protocol_init(), triggering the following objtool
>> warning during build time:
>>
>>   vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next 
>> function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()
>>
>> At runtime, this either causes an oops when trying to load the ipvs
>> module or a boot-time panic if ipvs is built-in. This same issue has
>> been reported by the Intel kernel test robot previously.
>>
>> Digging deeper into both LLVM and the kernel code reveals this to be a
>> undefined behavior problem. ip_vs_protocol_init() uses a on-stack buffer
>> of 64 chars to store the registered protocol names and leaves it
>> uninitialized after definition. The function calls strnlen() when
>> concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
>> strnlen() performs an extra step to check whether the last byte of the
>> input char buffer is a null character (commit 3009f891bb9f ("fortify:
>> Allow strlen() and strnlen() to pass compile-time known lengths")).
>> This, together with possibly other configurations, cause the following
>> IR to be generated:
>>
>>   define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section 
>> ".init.text" align 16 !kcfi_type !29 {
>> %1 = alloca [64 x i8], align 16
>> ...
>>
>>   14:   ; preds = %11
>> %15 = getelementptr inbounds i8, ptr %1, i64 63
>> %16 = load i8, ptr %15, align 1
>> %17 = tail call i1 @llvm.is.constant.i8(i8 %16)
>> %18 = icmp eq i8 %16, 0
>> %19 = select i1 %17, i1 %18, i1 false
>> br i1 %19, label %20, label %23
>>
>>   20:   ; preds = %14
>> %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
>> ...
>>
>>   23:   ; preds = %14, %11, %20
>> %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 
>> noundef 64) #24
>> ...
>>   }
>>
>> The above code calculates the address of the last char in the buffer
>> (value %15) and then loads from it (value %16). Because the buffer is
>> never initialized, the LLVM GVN pass marks value %16 as undefined:
>>
>>   %13 = getelementptr inbounds i8, ptr %1, i64 63
>>   br i1 undef, label %14, label %17
>>
>> This gives later passes (SCCP, in particular) to more DCE opportunities

One small request: if you could help us remove the extra "to" in the above
sentence when committing this patch, it would be great.

>> by propagating the undef value further, and eventually removes
>> everything after the load on the uninitialized stack location:
>>
>>   define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section 
>> ".init.text" align 16 !kcfi_type !11 {
>> %1 = alloca [64 x i8], align 16
>> ...
>>
>>   12:   ; preds = %11
>> %13 = getelementptr inbounds i8, ptr %1, i64 63
>> unreachable
>>   }
>>
>> In this way, the generated native code will just fall through to the
>> next function, as LLVM does not generate any code for the unreachable IR
>> instruction and leaves the function without a terminator.
>>
>> Zero the on-stack buffer to avoid this possible UB.
>>
>> Reported-by: kernel test robot 
>> Closes: 
>> https://urldefense.com/v3/__https://lore.kernel.org/oe-kbuild-all/202402100205.pwxiz1zk-...@intel.com/__;!!DZ3fjg!823fsY09q3IcP8uThu-yUuuQaiwQOR7gZJhV9JNWdxzerlkYJ4JkZGYuq4iO1DKqaErCulk1CGir$
>>  
>> Co-developed-by: Ruowen Qin 
>> Signed-off-by: Ruowen Qin 
>> Signed-off-by: Jinghao Jia 
> 
>   Looks good to me, thanks! I assume it is for
> net-next/nf-next, right?

I am actually not familiar with the netfilter trees. IMHO this should also be
back-ported to the stable kernels -- I wonder if net-next/nf-next is a good
tree for this?

> 
> Acked-by: Julian Anastasov 
> 
>> ---
>>  net/netfilter/ipvs/ip_vs_proto.c | 4 +---
>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/net/netfilter/ipvs/ip_vs_proto.c 
>> b/net/netfilter/ipvs/ip_vs_proto.c
>> index f100da4ba3bc..a9f

[PATCH v2 net] ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()

2024-11-21 Thread Jinghao Jia
Under certain kernel configurations when building with Clang/LLVM, the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning during build time:

  vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()

At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.

Digging deeper into both LLVM and the kernel code reveals this to be an
undefined behavior problem. ip_vs_protocol_init() uses an on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f891bb9f ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, causes the following
IR to be generated:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 {
%1 = alloca [64 x i8], align 16
...

  14:   ; preds = %11
%15 = getelementptr inbounds i8, ptr %1, i64 63
%16 = load i8, ptr %15, align 1
%17 = tail call i1 @llvm.is.constant.i8(i8 %16)
%18 = icmp eq i8 %16, 0
%19 = select i1 %17, i1 %18, i1 false
br i1 %19, label %20, label %23

  20:   ; preds = %14
%21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
...

  23:   ; preds = %14, %11, %20
    %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24
...
  }

The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:

  %13 = getelementptr inbounds i8, ptr %1, i64 63
  br i1 undef, label %14, label %17

This gives later passes (SCCP, in particular) more DCE opportunities by
propagating the undef value further, and eventually removes everything
after the load on the uninitialized stack location:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 {
%1 = alloca [64 x i8], align 16
...

  12:   ; preds = %11
%13 = getelementptr inbounds i8, ptr %1, i64 63
unreachable
  }

In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.

Zero the on-stack buffer to avoid this possible UB.

Changelog:
---
v1 -> v2:
v1: https://lore.kernel.org/lkml/2024065105.82431-1-jingh...@illinois.edu/
* Fix small error in commit message
* Address Julian's feedback:
  * Make this patch target the net tree rather than net-next
  * Add a "Fixes" tag for the initial git commit

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202402100205.pwxiz1zk-...@intel.com/
Co-developed-by: Ruowen Qin 
Signed-off-by: Ruowen Qin 
Signed-off-by: Jinghao Jia 
---
 net/netfilter/ipvs/ip_vs_proto.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index f100da4ba3bc..a9fd1d3fc2cb 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -340,7 +340,7 @@ void __net_exit ip_vs_protocol_net_cleanup(struct 
netns_ipvs *ipvs)
 
 int __init ip_vs_protocol_init(void)
 {
-   char protocols[64];
+   char protocols[64] = { 0 };
 #define REGISTER_PROTOCOL(p)   \
do {\
register_ip_vs_protocol(p); \
@@ -348,8 +348,6 @@ int __init ip_vs_protocol_init(void)
strcat(protocols, (p)->name);   \
} while (0)
 
-   protocols[0] = '\0';
-   protocols[2] = '\0';
 #ifdef CONFIG_IP_VS_PROTO_TCP
REGISTER_PROTOCOL(&ip_vs_protocol_tcp);
 #endif
-- 
2.47.0




Re: [PATCH v2 net] ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()

2024-11-22 Thread Jinghao Jia
Hi Julian,

On 11/22/24 5:43 AM, Julian Anastasov wrote:
> 
>   Hello,
> 
> On Thu, 21 Nov 2024, Jinghao Jia wrote:
> 
>> Under certain kernel configurations when building with Clang/LLVM, the
>> compiler does not generate a return or jump as the terminator
>> instruction for ip_vs_protocol_init(), triggering the following objtool
>> warning during build time:
>>
>>   vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next 
>> function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()
>>
>> At runtime, this either causes an oops when trying to load the ipvs
>> module or a boot-time panic if ipvs is built-in. This same issue has
>> been reported by the Intel kernel test robot previously.
>>
>> Digging deeper into both LLVM and the kernel code reveals this to be an
>> undefined behavior problem. ip_vs_protocol_init() uses an on-stack buffer
>> of 64 chars to store the registered protocol names and leaves it
>> uninitialized after definition. The function calls strnlen() when
>> concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
>> strnlen() performs an extra step to check whether the last byte of the
>> input char buffer is a null character (commit 3009f891bb9f ("fortify:
>> Allow strlen() and strnlen() to pass compile-time known lengths")).
>> This, together with possibly other configurations, causes the following
>> IR to be generated:
>>
>>   define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section 
>> ".init.text" align 16 !kcfi_type !29 {
>> %1 = alloca [64 x i8], align 16
>> ...
>>
>>   14:   ; preds = %11
>> %15 = getelementptr inbounds i8, ptr %1, i64 63
>> %16 = load i8, ptr %15, align 1
>> %17 = tail call i1 @llvm.is.constant.i8(i8 %16)
>> %18 = icmp eq i8 %16, 0
>> %19 = select i1 %17, i1 %18, i1 false
>> br i1 %19, label %20, label %23
>>
>>   20:   ; preds = %14
>> %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
>> ...
>>
>>   23:   ; preds = %14, %11, %20
>> %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 
>> noundef 64) #24
>> ...
>>   }
>>
>> The above code calculates the address of the last char in the buffer
>> (value %15) and then loads from it (value %16). Because the buffer is
>> never initialized, the LLVM GVN pass marks value %16 as undefined:
>>
>>   %13 = getelementptr inbounds i8, ptr %1, i64 63
>>   br i1 undef, label %14, label %17
>>
>> This gives later passes (SCCP, in particular) more DCE opportunities by
>> propagating the undef value further, and eventually removes everything
>> after the load on the uninitialized stack location:
>>
>>   define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section 
>> ".init.text" align 16 !kcfi_type !11 {
>> %1 = alloca [64 x i8], align 16
>> ...
>>
>>   12:   ; preds = %11
>> %13 = getelementptr inbounds i8, ptr %1, i64 63
>> unreachable
>>   }
>>
>> In this way, the generated native code will just fall through to the
>> next function, as LLVM does not generate any code for the unreachable IR
>> instruction and leaves the function without a terminator.
>>
>> Zero the on-stack buffer to avoid this possible UB.
>>
>> Changelog:
>> ---
> 
>   You can not add --- before the following headers.
> 'git am file.patch' can show the result of applying the patch.

Oops my bad -- have never done a patch with changelogs inside the commit
message.
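
For my own reference (and to make sure v3 gets it right), a sketch of the
layout 'git am' expects -- the tags stay inside the commit message, and the
per-version changelog goes below the "---" cut line so it is dropped when
the patch is applied:

  <patch description>

  Fixes: <commit> ("<subject>")
  Signed-off-by: <name> <email>
  ---
  Changelog:
  v1 -> v2:
  * ...

   <diffstat>

  diff --git a/... b/...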

> 
>> v1 -> v2:
>> v1: https://lore.kernel.org/lkml/2024065105.82431-1-jingh...@illinois.edu/
>> * Fix small error in commit message
>> * Address Julian's feedback:
>>   * Make this patch target the net tree rather than net-next
>>   * Add a "Fixes" tag for the initial git commit
>>
>> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
>> Reported-by: kernel test robot 
>> Closes: 
>> https://lore.kernel.org/oe-kbuild-all/202402100205.pwxiz1zk-...@intel.com/
>> Co-developed-by: Ruowen Qi

[PATCH v3 net] ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()

2024-11-23 Thread Jinghao Jia
Under certain kernel configurations when building with Clang/LLVM, the
compiler does not generate a return or jump as the terminator
instruction for ip_vs_protocol_init(), triggering the following objtool
warning during build time:

  vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next 
function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6()

At runtime, this either causes an oops when trying to load the ipvs
module or a boot-time panic if ipvs is built-in. This same issue has
been reported by the Intel kernel test robot previously.

Digging deeper into both LLVM and the kernel code reveals this to be an
undefined behavior problem. ip_vs_protocol_init() uses an on-stack buffer
of 64 chars to store the registered protocol names and leaves it
uninitialized after definition. The function calls strnlen() when
concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE
strnlen() performs an extra step to check whether the last byte of the
input char buffer is a null character (commit 3009f891bb9f ("fortify:
Allow strlen() and strnlen() to pass compile-time known lengths")).
This, together with possibly other configurations, causes the following
IR to be generated:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section 
".init.text" align 16 !kcfi_type !29 {
%1 = alloca [64 x i8], align 16
...

  14:   ; preds = %11
%15 = getelementptr inbounds i8, ptr %1, i64 63
%16 = load i8, ptr %15, align 1
%17 = tail call i1 @llvm.is.constant.i8(i8 %16)
%18 = icmp eq i8 %16, 0
%19 = select i1 %17, i1 %18, i1 false
br i1 %19, label %20, label %23

  20:   ; preds = %14
%21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23
...

  23:   ; preds = %14, %11, %20
%24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 
noundef 64) #24
...
  }

The above code calculates the address of the last char in the buffer
(value %15) and then loads from it (value %16). Because the buffer is
never initialized, the LLVM GVN pass marks value %16 as undefined:

  %13 = getelementptr inbounds i8, ptr %1, i64 63
  br i1 undef, label %14, label %17

This gives later passes (SCCP, in particular) more DCE opportunities by
propagating the undef value further, and eventually removes everything
after the load on the uninitialized stack location:

  define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section 
".init.text" align 16 !kcfi_type !11 {
%1 = alloca [64 x i8], align 16
...

  12:   ; preds = %11
%13 = getelementptr inbounds i8, ptr %1, i64 63
unreachable
  }

In this way, the generated native code will just fall through to the
next function, as LLVM does not generate any code for the unreachable IR
instruction and leaves the function without a terminator.

Zero the on-stack buffer to avoid this possible UB.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202402100205.pwxiz1zk-...@intel.com/
Co-developed-by: Ruowen Qin 
Signed-off-by: Ruowen Qin 
Signed-off-by: Jinghao Jia 
---
Changelog:
v2 -> v3:
v2: https://lore.kernel.org/lkml/20241122045257.27452-1-jingh...@illinois.edu/
* Fix changelog format based on Julian's feedback

v1 -> v2:
v1: https://lore.kernel.org/lkml/2024065105.82431-1-jingh...@illinois.edu/
* Fix small error in commit message
* Address Julian's feedback:
  * Make this patch target the net tree rather than net-next
  * Add a "Fixes" tag for the initial git commit

 net/netfilter/ipvs/ip_vs_proto.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index f100da4ba3bc..a9fd1d3fc2cb 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -340,7 +340,7 @@ void __net_exit ip_vs_protocol_net_cleanup(struct 
netns_ipvs *ipvs)
 
 int __init ip_vs_protocol_init(void)
 {
-   char protocols[64];
+   char protocols[64] = { 0 };
 #define REGISTER_PROTOCOL(p)   \
do {\
register_ip_vs_protocol(p); \
@@ -348,8 +348,6 @@ int __init ip_vs_protocol_init(void)
strcat(protocols, (p)->name);   \
} while (0)
 
-   protocols[0] = '\0';
-   protocols[2] = '\0';
 #ifdef CONFIG_IP_VS_PROTO_TCP
REGISTER_PROTOCOL(&ip_vs_protocol_tcp);
 #endif
-- 
2.47.0