Re: [PATCH v2 0/9] slab: Introduce dedicated bucket allocator

2024-03-05 Thread GONG, Ruiqi



On 2024/03/05 18:10, Kees Cook wrote:
> Hi,
> 
> Repeating the commit logs for patch 4 here:
> 
> Dedicated caches are available for fixed size allocations via
> kmem_cache_alloc(), but for dynamically sized allocations there is only
> the global kmalloc API's set of buckets available. This means it isn't
> possible to separate specific sets of dynamically sized allocations into
> a separate collection of caches.
> 
> This leads to a use-after-free exploitation weakness in the Linux
> kernel since many heap memory spraying/grooming attacks depend on using
> userspace-controllable dynamically sized allocations to collide with
> fixed size allocations that end up in the same cache.
> 
> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> against these kinds of "type confusion" attacks, including for fixed
> same-size heap objects, we can create a complementary deterministic
> defense for dynamically sized allocations.
> 
> In order to isolate user-controllable sized allocations from system
> allocations, introduce kmem_buckets_create(), which behaves like
> kmem_cache_create(). (The next patch will introduce kmem_buckets_alloc(),
> which behaves like kmem_cache_alloc().)

So can I say the vision here is to make all kernel interfaces that
handle user space input use separate caches? That would effectively
create a "grey zone" between kernel space (trusted) and user space
(untrusted) memory. I've also thought that hardening this "border"
could be more efficient and targeted than a mitigation that applies
globally, e.g. CONFIG_RANDOM_KMALLOC_CACHES.
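
For illustration, a minimal sketch of how one such interface might use
the API described above. The kmem_buckets_create()/kmem_buckets_alloc()
parameter lists are assumed from later revisions of this series, and
the foo_* names are hypothetical:

/* Hypothetical driver using its own set of size buckets. */
static kmem_buckets *foo_ioctl_buckets;

static int __init foo_init(void)
{
	/* Separate collection of caches, apart from the global kmalloc ones */
	foo_ioctl_buckets = kmem_buckets_create("foo_ioctl", SLAB_ACCOUNT,
						0, 0, NULL);
	return foo_ioctl_buckets ? 0 : -ENOMEM;
}

static int foo_ioctl_copy(const void __user *src, size_t len)
{
	void *buf;

	/* User-controlled "len" no longer lands in the shared kmalloc caches */
	buf = kmem_buckets_alloc(foo_ioctl_buckets, len, GFP_KERNEL);
	if (!buf)
		return -ENOMEM;
	if (copy_from_user(buf, src, len)) {
		kfree(buf);
		return -EFAULT;
	}
	/* ... handle the request ... */
	kfree(buf);
	return 0;
}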




Re: [PATCH v2 0/9] slab: Introduce dedicated bucket allocator

2024-03-15 Thread GONG, Ruiqi



On 2024/03/08 4:31, Kees Cook wrote:
> On Wed, Mar 06, 2024 at 09:47:36AM +0800, GONG, Ruiqi wrote:
>>
>>
>> On 2024/03/05 18:10, Kees Cook wrote:
>>> Hi,
>>>
>>> Repeating the commit logs for patch 4 here:
>>>
>>> Dedicated caches are available for fixed size allocations via
>>> kmem_cache_alloc(), but for dynamically sized allocations there is only
>>> the global kmalloc API's set of buckets available. This means it isn't
>>> possible to separate specific sets of dynamically sized allocations into
>>> a separate collection of caches.
>>>
>>> This leads to a use-after-free exploitation weakness in the Linux
>>> kernel since many heap memory spraying/grooming attacks depend on using
>>> userspace-controllable dynamically sized allocations to collide with
>>> fixed size allocations that end up in the same cache.
>>>
>>> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
>>> against these kinds of "type confusion" attacks, including for fixed
>>> same-size heap objects, we can create a complementary deterministic
>>> defense for dynamically sized allocations.
>>>
>>> In order to isolate user-controllable sized allocations from system
>>> allocations, introduce kmem_buckets_create(), which behaves like
>>> kmem_cache_create(). (The next patch will introduce kmem_buckets_alloc(),
>>> which behaves like kmem_cache_alloc().)
>>
>> So can I say the vision here is to make all kernel interfaces that
>> handle user space input use separate caches? That would effectively
>> create a "grey zone" between kernel space (trusted) and user space
>> (untrusted) memory. I've also thought that hardening this "border"
>> could be more efficient and targeted than a mitigation that applies
>> globally, e.g. CONFIG_RANDOM_KMALLOC_CACHES.
> 
> I think it ends up having a similar effect, yes. The more copies that
> move to memdup_user(), the more coverage is created. The main point is to
> just not share caches between different kinds of allocations. The most
> abused version of this is the userspace size-controllable allocations,
> which this targets. 

I agree. Currently, if we want a stricter separation between memory
manageable from user space and other kernel-space memory, fixed size
allocations could technically be converted to dedicated caches (i.e.
kmem_cache_create()), but for dynamically sized allocations I can't
think of any existing solution. With the APIs provided by this patch
set, we have something that works.
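
For the fixed-size case, the kmem_cache_create() conversion could look
roughly like the following sketch (the struct layout and foo_* names
are hypothetical):

/* Hypothetical fixed-size, user-reachable object isolated into its
 * own cache instead of the shared kmalloc-<size> buckets. */
struct foo_req {
	u32 cmd;
	u32 len;
	u8  data[56];
};

static struct kmem_cache *foo_req_cache;

static int __init foo_req_cache_init(void)
{
	foo_req_cache = kmem_cache_create("foo_req", sizeof(struct foo_req),
					  0, SLAB_ACCOUNT, NULL);
	return foo_req_cache ? 0 : -ENOMEM;
}

static struct foo_req *foo_req_alloc(gfp_t gfp)
{
	return kmem_cache_alloc(foo_req_cache, gfp);
}

static void foo_req_free(struct foo_req *req)
{
	kmem_cache_free(foo_req_cache, req);
}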


> ... The existing caches (which could still be used for
> type confusion attacks when the sizes are sufficiently similar) have a
> good chance of being mitigated by CONFIG_RANDOM_KMALLOC_CACHES already,
> so this proposed change is just complementary, IMO.

Maybe in the future we could require that all user-kernel interfaces
that make use of slab caches use either kmem_cache_create() or
kmem_buckets_create()? ;)

> 
> -Kees
> 




[PATCH v2 1/2] slab: Adjust placement of __kvmalloc_node_noprof

2025-02-07 Thread GONG Ruiqi
Move __kvmalloc_node_noprof (and also kvfree* for consistency) into
mm/slub.c so that it can directly invoke __do_kmalloc_node, which is
needed for the next patch. Move kmalloc_gfp_adjust to slab.h since now
its two callers are in different .c files.

No functional changes intended.

Signed-off-by: GONG Ruiqi 
---
 include/linux/slab.h |  22 +
 mm/slub.c            |  90 ++
 mm/util.c            | 112 ---
 3 files changed, 112 insertions(+), 112 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 09eedaecf120..0bf4cbf306fe 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1101,4 +1101,26 @@ size_t kmalloc_size_roundup(size_t size);
 void __init kmem_cache_init_late(void);
 void __init kvfree_rcu_init(void);
 
+static inline gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
+{
+   /*
+* We want to attempt a large physically contiguous block first because
+* it is less likely to fragment multiple larger blocks and therefore
+* contribute to a long term fragmentation less than vmalloc fallback.
+* However make sure that larger requests are not too disruptive - no
+* OOM killer and no allocation failure warnings as we have a fallback.
+*/
+   if (size > PAGE_SIZE) {
+   flags |= __GFP_NOWARN;
+
+   if (!(flags & __GFP_RETRY_MAYFAIL))
+   flags |= __GFP_NORETRY;
+
+   /* nofail semantic is implemented by the vmalloc fallback */
+   flags &= ~__GFP_NOFAIL;
+   }
+
+   return flags;
+}
+
 #endif /* _LINUX_SLAB_H */
diff --git a/mm/slub.c b/mm/slub.c
index 1f50129dcfb3..0830894bb92c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4878,6 +4878,96 @@ void *krealloc_noprof(const void *p, size_t new_size, gfp_t flags)
 }
 EXPORT_SYMBOL(krealloc_noprof);
 
+/**
+ * __kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * failure, fall back to non-contiguous (vmalloc) allocation.
+ * @size: size of the request.
+ * @b: which set of kmalloc buckets to allocate from.
+ * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ *
+ * GFP_NOWAIT and GFP_ATOMIC are not supported, neither is the __GFP_NORETRY modifier.
+ * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
+ * preferable to the vmalloc fallback, due to visible performance drawbacks.
+ *
+ * Return: pointer to the allocated memory or %NULL in case of failure
+ */
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
+{
+   void *ret;
+
+   /*
+* It doesn't really make sense to fallback to vmalloc for sub page
+* requests
+*/
+   ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
+   kmalloc_gfp_adjust(flags, size),
+   node);
+   if (ret || size <= PAGE_SIZE)
+   return ret;
+
+   /* non-sleeping allocations are not supported by vmalloc */
+   if (!gfpflags_allow_blocking(flags))
+   return NULL;
+
+   /* Don't even allow crazy sizes */
+   if (unlikely(size > INT_MAX)) {
+   WARN_ON_ONCE(!(flags & __GFP_NOWARN));
+   return NULL;
+   }
+
+   /*
+* kvmalloc() can always use VM_ALLOW_HUGE_VMAP,
+* since the callers already cannot assume anything
+* about the resulting pointer, and cannot play
+* protection games.
+*/
+   return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
+   flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+   node, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(__kvmalloc_node_noprof);
+
+/**
+ * kvfree() - Free memory.
+ * @addr: Pointer to allocated memory.
+ *
+ * kvfree frees memory allocated by any of vmalloc(), kmalloc() or kvmalloc().
+ * It is slightly more efficient to use kfree() or vfree() if you are certain
+ * that you know which one to use.
+ *
+ * Context: Either preemptible task context or not-NMI interrupt.
+ */
+void kvfree(const void *addr)
+{
+   if (is_vmalloc_addr(addr))
+   vfree(addr);
+   else
+   kfree(addr);
+}
+EXPORT_SYMBOL(kvfree);
+
+/**
+ * kvfree_sensitive - Free a data object containing sensitive information.
+ * @addr: address of the data object to be freed.
+ * @len: length of the data object.
+ *
+ * Use the special memzero_explicit() function to clear the content of a
+ * kvmalloc'ed object containing sensitive data to make sure that the
+ * compiler won't optimize out the data clearing.
+ */
+void kvfree_sensitive(const void

[PATCH v2 2/2] slab: Achieve better kmalloc caches randomization in kvmalloc

2025-02-07 Thread GONG Ruiqi
As revealed by this writeup[1], due to the fact that __kmalloc_node (now
renamed to __kmalloc_node_noprof) is an exported symbol and will never
get inlined, using it in kvmalloc_node (now __kvmalloc_node_noprof)
would make the RET_IP inside always point to the same address:

upper_caller
  kvmalloc
    kvmalloc_node
      kvmalloc_node_noprof
        __kvmalloc_node_noprof            <-- all macros all the way down here
          __kmalloc_node_noprof
            __do_kmalloc_node(.., _RET_IP_)
              ...                         <-- _RET_IP_ points to

That literally means all kmalloc invoked via kvmalloc would use the same
seed for cache randomization (CONFIG_RANDOM_KMALLOC_CACHES), which makes
this hardening non-functional.
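
For context, a simplified sketch (not the exact kernel code) of how
CONFIG_RANDOM_KMALLOC_CACHES chooses among its kmalloc cache copies;
with a constant caller address, the hash always selects the same copy:

/* Simplified; the real selection lives in the slab allocator internals.
 * RANDOM_KMALLOC_CACHES_NR + 1 copies of each kmalloc cache exist, and
 * the boot-time random_kmalloc_seed is mixed with the caller address
 * to pick one of them. */
static inline unsigned int random_kmalloc_copy(unsigned long caller)
{
	return hash_64(caller ^ random_kmalloc_seed,
		       ilog2(RANDOM_KMALLOC_CACHES_NR + 1));
}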

The root cause of this problem, IMHO, is that RET_IP alone cannot
identify the actual allocation site when kmalloc is called inside
wrappers or helper functions. And I believe there could be similar
cases in other functions. Nevertheless, I haven't thought of any good
solution for this, so for now let's solve this specific case first.

For __kvmalloc_node_noprof, replace the call to __kmalloc_node_noprof
with a direct call to __do_kmalloc_node, so that RET_IP captures the
return address of kvmalloc and differentiates each kvmalloc invocation:

upper_caller
  kvmalloc
    kvmalloc_node
      kvmalloc_node_noprof
        __kvmalloc_node_noprof            <-- all macros all the way down here
          __do_kmalloc_node(.., _RET_IP_)
            ...                           <-- _RET_IP_ points to

Thanks to Tamás Koczka for the report and discussion!

Link: https://github.com/google/security-research/pull/83/files#diff-1604319b55a48c39a210ee52034ed7ff5b9cdc3d704d2d9e34eb230d19fae235R200 [1]
Reported-by: Tamás Koczka 
Signed-off-by: GONG Ruiqi 
---
 mm/slub.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 0830894bb92c..46e884b77dca 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4903,9 +4903,9 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 * It doesn't really make sense to fallback to vmalloc for sub page
 * requests
 */
-   ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
-   kmalloc_gfp_adjust(flags, size),
-   node);
+   ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
+   kmalloc_gfp_adjust(flags, size),
+   node, _RET_IP_);
if (ret || size <= PAGE_SIZE)
return ret;
 
-- 
2.25.1




[PATCH v2 0/2] Refine kmalloc caches randomization in kvmalloc

2025-02-07 Thread GONG Ruiqi
Hi,

v2: change the implementation as Vlastimil suggested
v1: https://lore.kernel.org/all/20250122074817.991060-1-gongrui...@huawei.com/

Tamás reported [1] that kmalloc cache randomization doesn't actually
work for kmalloc allocations made via kvmalloc. For more details, see
the commit log of patch 2.

The current solution requires a direct call from __kvmalloc_node_noprof
to __do_kmalloc_node, a static function in a different .c file.
Compared to v1, this version achieves this by simply moving
__kvmalloc_node_noprof to mm/slub.c, as suggested by Vlastimil [2].

Link: https://github.com/google/security-research/pull/83/files#diff-1604319b55a48c39a210ee52034ed7ff5b9cdc3d704d2d9e34eb230d19fae235R200 [1]
Link: https://lore.kernel.org/all/62044279-0c56-4185-97f7-7afac65ff...@suse.cz/ [2]

GONG Ruiqi (2):
  slab: Adjust placement of __kvmalloc_node_noprof
  slab: Achieve better kmalloc caches randomization in kvmalloc

 include/linux/slab.h |  22 +
 mm/slub.c            |  90 ++
 mm/util.c            | 112 ---
 3 files changed, 112 insertions(+), 112 deletions(-)

-- 
2.25.1




[PATCH] mm/slab: Achieve better kmalloc caches randomization in kvmalloc

2025-01-21 Thread GONG Ruiqi
As revealed by this writeup[1], due to the fact that __kmalloc_node (now
renamed to __kmalloc_node_noprof) is an exported symbol and will never
get inlined, using it in kvmalloc_node (now __kvmalloc_node_noprof)
would make the RET_IP inside always point to the same address:

upper_caller
  kvmalloc
    kvmalloc_node
      kvmalloc_node_noprof
        __kvmalloc_node_noprof            <-- all macros all the way down here
          __kmalloc_node_noprof
            __do_kmalloc_node(.., _RET_IP_)
              ...                         <-- _RET_IP_ points to

That literally means all kmalloc invoked via kvmalloc would use the same
seed for cache randomization (CONFIG_RANDOM_KMALLOC_CACHES), which makes
this hardening non-functional.

The root cause of this problem, IMHO, is that RET_IP alone cannot
identify the actual allocation site when kmalloc is called inside
wrappers or helper functions. And I believe there could be similar
cases in other functions. Nevertheless, I haven't thought of any good
solution for this, so for now let's solve this specific case first.

For __kvmalloc_node_noprof, replace __kmalloc_node_noprof with an inline
version, so that RET_IP captures the return address of kvmalloc and
differentiates each kvmalloc invocation:

upper_caller
  kvmalloc
    kvmalloc_node
      kvmalloc_node_noprof
        __kvmalloc_node_noprof            <-- all macros all the way down here
          __kmalloc_node_inline(.., _RET_IP_)
            ...                           <-- _RET_IP_ points to

Thanks to Tamás Koczka for the report and discussion!

Links:
[1] https://github.com/google/security-research/pull/83/files#diff-1604319b55a48c39a210ee52034ed7ff5b9cdc3d704d2d9e34eb230d19fae235R200

Signed-off-by: GONG Ruiqi 
---
 include/linux/slab.h | 3 +++
 mm/slub.c            | 7 +++
 mm/util.c            | 4 ++--
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 10a971c2bde3..e03ca4a95511 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -834,6 +834,9 @@ void *__kmalloc_large_noprof(size_t size, gfp_t flags)
 void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
__assume_page_alignment __alloc_size(1);
 
+void *__kmalloc_node_inline(size_t size, kmem_buckets *b, gfp_t flags,
+   int node, unsigned long caller);
+
 /**
  * kmalloc - allocate kernel memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slub.c b/mm/slub.c
index c2151c9fee22..ec75070345c6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4319,6 +4319,13 @@ void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flag
 }
 EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
 
+__always_inline void *__kmalloc_node_inline(size_t size, kmem_buckets *b,
+   gfp_t flags, int node,
+   unsigned long caller)
+{
+   return __do_kmalloc_node(size, b, flags, node, caller);
+}
+
 void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t gfpflags, size_t size)
 {
void *ret = slab_alloc_node(s, NULL, gfpflags, NUMA_NO_NODE,
diff --git a/mm/util.c b/mm/util.c
index 60aa40f612b8..3910d1d1f595 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -642,9 +642,9 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 * It doesn't really make sense to fallback to vmalloc for sub page
 * requests
 */
-   ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
+   ret = __kmalloc_node_inline(size, PASS_BUCKET_PARAM(b),
kmalloc_gfp_adjust(flags, size),
-   node);
+   node, _RET_IP_);
if (ret || size <= PAGE_SIZE)
return ret;
 
-- 
2.25.1




Re: [PATCH] mm/slab: Achieve better kmalloc caches randomization in kvmalloc

2025-01-25 Thread GONG Ruiqi



On 2025/01/24 23:19, Vlastimil Babka wrote:
> On 1/22/25 17:02, Christoph Lameter (Ampere) wrote:
>> On Wed, 22 Jan 2025, GONG Ruiqi wrote:
>>
>>>
>>> +void *__kmalloc_node_inline(size_t size, kmem_buckets *b, gfp_t flags,
>>> +   int node, unsigned long caller);
>>> +
>>
>>
>> Huh? Is this inline? Where is the body of the function?
>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index c2151c9fee22..ec75070345c6 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -4319,6 +4319,13 @@ void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flag
>>>  }
>>>  EXPORT_SYMBOL(__kmalloc_node_track_caller_noprof);
>>>
>>> +__always_inline void *__kmalloc_node_inline(size_t size, kmem_buckets *b,
>>> +   gfp_t flags, int node,
>>> +   unsigned long caller)
>>> +{
>>> +   return __do_kmalloc_node(size, b, flags, node, caller);
>>> +}
>>> +
>>
>> inline functions need to be defined in the header file AFAICT.
> 
> Yeah, this could possibly inline only with LTO (dunno if it does). But the
> real difference is passing __kvmalloc_node_noprof()'s _RET_IP_ as caller.
> 
> Maybe instead of this new wrapper we could just move
> __kvmalloc_node_noprof() to slub.c and access __do_kmalloc_node() directly.
> For consistency also kvfree() and whatever necessary dependencies. The
> placement in util.c is kinda weird anyway and IIRC we already moved
> krealloc() due to needing deeper involvement with slab internals. The
> vmalloc part of kvmalloc/kvfree is kinda a self-contained fallback that can
> be just called from slub.c as well as from util.c.

Thanks for the advice!

I will send a V2 based on moving __kvmalloc_node_noprof() and kvfree()
to slub.c as soon as possible.

BR,
Ruiqi



[PATCH v3 2/2] slab: Achieve better kmalloc caches randomization in kvmalloc

2025-02-12 Thread GONG Ruiqi
As revealed by this writeup[1], due to the fact that __kmalloc_node (now
renamed to __kmalloc_node_noprof) is an exported symbol and will never
get inlined, using it in kvmalloc_node (now __kvmalloc_node_noprof)
would make the RET_IP inside always point to the same address:

upper_caller
  kvmalloc
    kvmalloc_node
      kvmalloc_node_noprof
        __kvmalloc_node_noprof            <-- all macros all the way down here
          __kmalloc_node_noprof
            __do_kmalloc_node(.., _RET_IP_)
              ...                         <-- _RET_IP_ points to

That literally means all kmalloc invoked via kvmalloc would use the same
seed for cache randomization (CONFIG_RANDOM_KMALLOC_CACHES), which makes
this hardening non-functional.

The root cause of this problem, IMHO, is that RET_IP alone cannot
identify the actual allocation site when kmalloc is called inside
non-inlined wrappers or helper functions. And I believe there could be
similar cases in other functions. Nevertheless, I haven't thought of
any good solution for this, so for now let's solve this specific case
first.
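
As a stand-alone illustration (plain userspace C, not kernel code) of
the difference: an out-of-line helper that samples its own return
address sees the same value no matter which call site reached it,
while a caller address passed down explicitly stays distinct per call
site. The helper names are invented for the demo:

/* Build: gcc -O2 -o retip retip.c */
#include <stdio.h>

#define RET_IP ((unsigned long)__builtin_return_address(0))

/* Plays the role of __kmalloc_node_noprof(): samples RET_IP itself,
 * so it always reports a location inside kvmalloc_like(). */
__attribute__((noinline)) static void helper_samples_ret_ip(void)
{
	printf("helper-sampled caller:    %#lx\n", RET_IP);
}

/* Plays the role of __do_kmalloc_node(): receives the caller value. */
__attribute__((noinline)) static void helper_takes_caller(unsigned long caller)
{
	printf("explicitly passed caller: %#lx\n", caller);
}

/* Plays the role of __kvmalloc_node_noprof(). */
__attribute__((noinline)) static void kvmalloc_like(void)
{
	helper_samples_ret_ip();	/* old scheme: constant value */
	helper_takes_caller(RET_IP);	/* patched scheme: real call site */
}

int main(void)
{
	kvmalloc_like();	/* call site A */
	kvmalloc_like();	/* call site B */
	return 0;
}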

For __kvmalloc_node_noprof, replace the call to __kmalloc_node_noprof
with a direct call to __do_kmalloc_node, so that RET_IP captures the
return address of kvmalloc and differentiates each kvmalloc invocation:

upper_caller
  kvmalloc
    kvmalloc_node
      kvmalloc_node_noprof
        __kvmalloc_node_noprof            <-- all macros all the way down here
          __do_kmalloc_node(.., _RET_IP_)
            ...                           <-- _RET_IP_ points to

Thanks to Tamás Koczka for the report and discussion!

Link: https://github.com/google/security-research/blob/908d59b573960dc0b90adda6f16f7017aca08609/pocs/linux/kernelctf/CVE-2024-27397_mitigation/docs/exploit.md?plain=1#L259 [1]
Reported-by: Tamás Koczka 
Signed-off-by: GONG Ruiqi 
---
 mm/slub.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index abc982d68feb..1f7d1d260eeb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4925,9 +4925,9 @@ void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
 * It doesn't really make sense to fallback to vmalloc for sub page
 * requests
 */
-   ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
-   kmalloc_gfp_adjust(flags, size),
-   node);
+   ret = __do_kmalloc_node(size, PASS_BUCKET_PARAM(b),
+   kmalloc_gfp_adjust(flags, size),
+   node, _RET_IP_);
if (ret || size <= PAGE_SIZE)
return ret;
 
-- 
2.25.1




[PATCH v3 1/2] slab: Adjust placement of __kvmalloc_node_noprof

2025-02-12 Thread GONG Ruiqi
Move __kvmalloc_node_noprof (as well as kvfree*, kvrealloc_noprof and
kmalloc_gfp_adjust for consistency) into mm/slub.c so that it can
directly invoke __do_kmalloc_node, which is needed for the next patch.

No functional changes intended.

Signed-off-by: GONG Ruiqi 
---
 mm/slub.c | 162 ++
 mm/util.c | 162 --
 2 files changed, 162 insertions(+), 162 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1f50129dcfb3..abc982d68feb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4878,6 +4878,168 @@ void *krealloc_noprof(const void *p, size_t new_size, gfp_t flags)
 }
 EXPORT_SYMBOL(krealloc_noprof);
 
+static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
+{
+   /*
+* We want to attempt a large physically contiguous block first because
+* it is less likely to fragment multiple larger blocks and therefore
+* contribute to a long term fragmentation less than vmalloc fallback.
+* However make sure that larger requests are not too disruptive - no
+* OOM killer and no allocation failure warnings as we have a fallback.
+*/
+   if (size > PAGE_SIZE) {
+   flags |= __GFP_NOWARN;
+
+   if (!(flags & __GFP_RETRY_MAYFAIL))
+   flags |= __GFP_NORETRY;
+
+   /* nofail semantic is implemented by the vmalloc fallback */
+   flags &= ~__GFP_NOFAIL;
+   }
+
+   return flags;
+}
+
+/**
+ * __kvmalloc_node - attempt to allocate physically contiguous memory, but upon
+ * failure, fall back to non-contiguous (vmalloc) allocation.
+ * @size: size of the request.
+ * @b: which set of kmalloc buckets to allocate from.
+ * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
+ * @node: numa node to allocate from
+ *
+ * Uses kmalloc to get the memory but if the allocation fails then falls back
+ * to the vmalloc allocator. Use kvfree for freeing the memory.
+ *
+ * GFP_NOWAIT and GFP_ATOMIC are not supported, neither is the __GFP_NORETRY modifier.
+ * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
+ * preferable to the vmalloc fallback, due to visible performance drawbacks.
+ *
+ * Return: pointer to the allocated memory or %NULL in case of failure
+ */
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
+{
+   void *ret;
+
+   /*
+* It doesn't really make sense to fallback to vmalloc for sub page
+* requests
+*/
+   ret = __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, b),
+   kmalloc_gfp_adjust(flags, size),
+   node);
+   if (ret || size <= PAGE_SIZE)
+   return ret;
+
+   /* non-sleeping allocations are not supported by vmalloc */
+   if (!gfpflags_allow_blocking(flags))
+   return NULL;
+
+   /* Don't even allow crazy sizes */
+   if (unlikely(size > INT_MAX)) {
+   WARN_ON_ONCE(!(flags & __GFP_NOWARN));
+   return NULL;
+   }
+
+   /*
+* kvmalloc() can always use VM_ALLOW_HUGE_VMAP,
+* since the callers already cannot assume anything
+* about the resulting pointer, and cannot play
+* protection games.
+*/
+   return __vmalloc_node_range_noprof(size, 1, VMALLOC_START, VMALLOC_END,
+   flags, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+   node, __builtin_return_address(0));
+}
+EXPORT_SYMBOL(__kvmalloc_node_noprof);
+
+/**
+ * kvfree() - Free memory.
+ * @addr: Pointer to allocated memory.
+ *
+ * kvfree frees memory allocated by any of vmalloc(), kmalloc() or kvmalloc().
+ * It is slightly more efficient to use kfree() or vfree() if you are certain
+ * that you know which one to use.
+ *
+ * Context: Either preemptible task context or not-NMI interrupt.
+ */
+void kvfree(const void *addr)
+{
+   if (is_vmalloc_addr(addr))
+   vfree(addr);
+   else
+   kfree(addr);
+}
+EXPORT_SYMBOL(kvfree);
+
+/**
+ * kvfree_sensitive - Free a data object containing sensitive information.
+ * @addr: address of the data object to be freed.
+ * @len: length of the data object.
+ *
+ * Use the special memzero_explicit() function to clear the content of a
+ * kvmalloc'ed object containing sensitive data to make sure that the
+ * compiler won't optimize out the data clearing.
+ */
+void kvfree_sensitive(const void *addr, size_t len)
+{
+   if (likely(!ZERO_OR_NULL_PTR(addr))) {
+   memzero_explicit((void *)addr, len);
+   kvfree(addr);
+   }
+}
+EXPORT_SYMBOL(kvfree_sensitive);
+
+/**
+ * kvrealloc - reallocate memory; contents remain unchanged
+ * @p: object to reallocate memory for
+ * @size: the size to reallocate
+ * @flags: the flags for the page level allocator
+ *
+ * If

[PATCH v3 0/2] Refine kmalloc caches randomization in kvmalloc

2025-02-12 Thread GONG Ruiqi
Hi,

v3:
  - move all the way from kmalloc_gfp_adjust to kvrealloc_noprof into
mm/slub.c
  - some rewording for commit logs
v2: https://lore.kernel.org/all/20250208014723.1514049-1-gongrui...@huawei.com/
  - change the implementation as Vlastimil suggested
v1: https://lore.kernel.org/all/20250122074817.991060-1-gongrui...@huawei.com/

Tamás reported [1] that kmalloc cache randomization doesn't actually
work for kmalloc allocations made via kvmalloc. For more details, see
the commit log of patch 2.

The current solution requires a direct call from __kvmalloc_node_noprof
to __do_kmalloc_node, a static function in a different .c file. As
suggested by Vlastimil [2], this is achieved by simply moving
__kvmalloc_node_noprof from mm/util.c to mm/slub.c, together with some
other functions of the same family.

Link: https://github.com/google/security-research/blob/908d59b573960dc0b90adda6f16f7017aca08609/pocs/linux/kernelctf/CVE-2024-27397_mitigation/docs/exploit.md?plain=1#L259 [1]
Link: https://lore.kernel.org/all/62044279-0c56-4185-97f7-7afac65ff...@suse.cz/ [2]

GONG Ruiqi (2):
  slab: Adjust placement of __kvmalloc_node_noprof
  slab: Achieve better kmalloc caches randomization in kvmalloc

 mm/slub.c | 162 ++
 mm/util.c | 162 --
 2 files changed, 162 insertions(+), 162 deletions(-)

-- 
2.25.1