On Thu, Apr 30, 2026 at 03:03PM +0200, Vlastimil Babka (SUSE) wrote:
> On 4/24/26 15:24, Marco Elver wrote:
>
> > @@ -938,7 +968,7 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
> >   * Try really hard to succeed the allocation but fail
> >   * eventually.
> >   */
> > -static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t flags)
> > +static __always_inline __alloc_size(1) void *_kmalloc_noprof(size_t size, gfp_t flags, kmalloc_token_t token)
> >  {
> >  	if (__builtin_constant_p(size) && size) {
> >  		unsigned int index;
> > @@ -948,14 +978,16 @@ static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t f
> > 
> >  		index = kmalloc_index(size);
> >  		return __kmalloc_cache_noprof(
> > -				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
> > +				kmalloc_caches[kmalloc_type(flags, token)][index],
>
> While reviewing this, it occurred to me we might have been using _RET_IP_
> here in a suboptimal way ever since this was introduced. Since this is all
> inlined, shouldn't we have been using _THIS_IP_ to really randomize by the
> kmalloc() callsite, and not its parent?
>
> And after this patch, we get the token passed to _kmalloc_noprof()...
>
> >  				flags, size);
> >  	}
> > -	return __kmalloc_noprof(size, flags);
> > +	return __kmalloc_noprof(PASS_KMALLOC_PARAMS(size, NULL, token), flags);
>
> ... and also used here for the non-constant-size case, where previously
> __kmalloc_noprof() (not an inline function) would correctly use _RET_IP_
> on its own ...
>
> >  }
> > +#define kmalloc_noprof(...) _kmalloc_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
>
> ... and the token comes from here. With random partitioning that's
> #define __kmalloc_token(...) ((kmalloc_token_t){ .v = _RET_IP_ })
>
> so AFAIK that makes the situation worse, as now the cases without a
> constant size also start randomizing by the parent callsite and not by
> the kmalloc() callsite.
>
> But there are many users of __kmalloc_token() and maybe some are correct
> in using _RET_IP_ - I haven't checked. Maybe we'll need two variants, or
> to further change things around.
Good catch. I don't think we need multiple variants (otherwise the TYPED
variant would be broken, too) - all this macro magic moves token generation
into the callers themselves, so it's no longer subject to inlining at all.
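To make it concrete, here's roughly what a call site looks like after
expansion with the fix below (a simplified sketch that ignores the
alloc_hooks() wrapper layer; p and n are just placeholder names):

/* Caller writes: */
p = kmalloc(n, GFP_KERNEL);

/*
 * kmalloc_noprof() is now a macro, so the token is materialized in
 * the caller itself; _THIS_IP_ takes the address of a label emitted
 * right here, at the kmalloc() callsite:
 */
p = _kmalloc_noprof(n, GFP_KERNEL, (kmalloc_token_t){ .v = _THIS_IP_ });

/*
 * With _RET_IP_ we would instead get __builtin_return_address(0) of
 * the function containing this callsite, i.e. its parent - exactly
 * the problem you describe above.
 */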
I think this is all we need:
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -503,7 +503,7 @@ int kmem_cache_shrink(struct kmem_cache *s);
 typedef struct { unsigned long v; } kmalloc_token_t;
 #ifdef CONFIG_KMALLOC_PARTITION_RANDOM
 extern unsigned long random_kmalloc_seed;
-#define __kmalloc_token(...) ((kmalloc_token_t){ .v = _RET_IP_ })
+#define __kmalloc_token(...) ((kmalloc_token_t){ .v = _THIS_IP_ })
 #elif defined(CONFIG_KMALLOC_PARTITION_TYPED)
 #define __kmalloc_token(...) ((kmalloc_token_t){ .v = __builtin_infer_alloc_token(__VA_ARGS__) })
 #endif
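For reference, the two helpers are defined in
include/linux/instruction_pointer.h as:

#define _RET_IP_	(unsigned long)__builtin_return_address(0)
#define _THIS_IP_	({ __label__ __here; __here: (unsigned long)&&__here; })

So _THIS_IP_ evaluates to an address inside the function the macro
expands in, while _RET_IP_ yields that function's return address -
one level further up than we want here once everything is inlined.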
Plus a paragraph in the commit message. Let me add that.