On Wed, May 06, 2026 at 03:03:27PM +0100, Marco Elver wrote:
> On Mon, 4 May 2026 at 23:23, Marco Elver <[email protected]> wrote:
> > On Thu, Apr 30, 2026 at 03:03PM +0200, Vlastimil Babka (SUSE) wrote:
> > > On 4/24/26 15:24, Marco Elver wrote:
> > > > @@ -948,14 +978,16 @@ static __always_inline __alloc_size(1) void
> > > > *kmalloc_noprof(size_t size, gfp_t f
> > > >
> > > > index = kmalloc_index(size);
> > > > return __kmalloc_cache_noprof(
> > > > - kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
> > > > + kmalloc_caches[kmalloc_type(flags, token)][index],
> > >
> > > While reviewing this, it occurred to me we might have been using _RET_IP_
> > > here in a suboptimal way ever since this was introduced. Since this is all
> > > inlined, shouldn't we have been using _THIS_IP_ to really randomize using
> > > the kmalloc() callsite, and not its parent?
> > >
> > > And after this patch, we get the token passed to _kmalloc_noprof()...
> > >
> > > > flags, size);
> > > > }
> > > > - return __kmalloc_noprof(size, flags);
> > > > + return __kmalloc_noprof(PASS_KMALLOC_PARAMS(size, NULL, token), flags);
> > >
> > > ... and also used here for the non-constant-size case, where previously
> > > __kmalloc_noprof() (not an inline function) would correctly use _RET_IP_
> > > on its own ...
> > >
> > > > }
> > > > +#define kmalloc_noprof(...) _kmalloc_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
> > >
> > > ... and the token comes from here. With random partitioning that's
> > > #define __kmalloc_token(...) ((kmalloc_token_t){ .v = _RET_IP_ })
> > >
> > > so that AFAIK makes the situation worse as now the cases without constant
> > > size also start randomizing by the parent callsite and not the kmalloc
> > > callsite.
> > >
> > > But there are many users of __kmalloc_token() and maybe some are correct in
> > > using _RET_IP_, I haven't checked. Maybe we'll need two variants, or to
> > > change things around further.
> >
> > Good catch. I don't think we need multiple variants (otherwise the TYPED
> > variant would be broken) - we're moving token generation to the callers
> > (not even inlined anymore) with all this macro magic.
> >
> > I think this is all we need:
> >
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -503,7 +503,7 @@ int kmem_cache_shrink(struct kmem_cache *s);
> > typedef struct { unsigned long v; } kmalloc_token_t;
> > #ifdef CONFIG_KMALLOC_PARTITION_RANDOM
> > extern unsigned long random_kmalloc_seed;
> > -#define __kmalloc_token(...) ((kmalloc_token_t){ .v = _RET_IP_ })
> > +#define __kmalloc_token(...) ((kmalloc_token_t){ .v = _THIS_IP_ })
> > #elif defined(CONFIG_KMALLOC_PARTITION_TYPED)
> > > #define __kmalloc_token(...) ((kmalloc_token_t){ .v = __builtin_infer_alloc_token(__VA_ARGS__) })
> > #endif
> >
> > Plus a paragraph in the commit message. Let me add that.
Err, my first reaction was "yes, this is the way to go!", and then...
> Bah, this is why it doesn't work:
>
> >> drivers/gpu/drm/msm/msm_gpu.c:272:4: error: cannot jump from this indirect goto statement to one of its possible targets
> 272 | drm_exec_retry_on_contention(&exec);
> | ^
> include/drm/drm_exec.h:123:4: note: expanded from macro 'drm_exec_retry_on_contention'
> 123 | goto *__drm_exec_retry_ptr; \
> | ^
> drivers/gpu/drm/msm/msm_gpu.c:304:16: note: possible target of indirect goto statement
> 304 | state->bos = kcalloc(submit->nr_bos,
> | ^
> include/linux/slab.h:1173:34: note: expanded from macro 'kcalloc'
> 1173 | #define kcalloc(n, size, flags) kmalloc_array(n, size, (flags) | __GFP_ZERO)
> | ^
> include/linux/slab.h:1133:42: note: expanded from macro 'kmalloc_array'
> 1133 | #define kmalloc_array(...) alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
> | ^
> include/linux/slab.h:1132:71: note: expanded from macro 'kmalloc_array_noprof'
> 1132 | #define kmalloc_array_noprof(...) _kmalloc_array_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
> | ^
> include/linux/slab.h:506:55: note: expanded from macro '__kmalloc_token'
> 506 | #define __kmalloc_token(...) ((kmalloc_token_t){ .v = _THIS_IP_ })
> | ^
> include/linux/instruction_pointer.h:10:41: note: expanded from macro '_THIS_IP_'
> 10 | #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
> | ^
> drivers/gpu/drm/msm/msm_gpu.c:304:16: note: jump enters a statement expression
>
> Apparently using _THIS_IP_ creates a possible indirect jump target,
Didn't even realize people use indirect gotos, heh :)
> but because it's in a statement expression, it's invalid, so the
> compiler complains. This is obviously nonsense, because the actual
> indirect jump in this gpu driver code would never jump to the
> _THIS_IP_ __here label, but that's what it is.
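Yeah, I guess it's tricky for the compiler: with an indirect goto it
can't know where the jump will land, so it has to treat every
address-taken label in the function as a possible target, including
the one inside the statement expression, which isn't a valid place to
jump into...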
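
For anyone else who hasn't run into these, here is a minimal userspace
sketch of the indirect-goto pattern. count_retries() is a made-up
function that only mimics what drm_exec does with
__drm_exec_retry_ptr; it is not the real drm_exec code:

```c
#include <assert.h>

/*
 * Hypothetical example: retry a loop body via a GNU C computed goto,
 * similar in spirit to drm_exec_retry_on_contention().
 */
int count_retries(int max)
{
	/* Labels-as-values: take the address of a local label. */
	void *retry_ptr = &&retry;
	int n = 0;

	assert(max >= 0);

retry:
	if (n < max) {
		n++;
		/* Indirect jump; the target is only known at run time. */
		goto *retry_ptr;
	}

	/*
	 * Going by the build error quoted above, expanding
	 *   ({ __label__ __here; __here: (unsigned long)&&__here; })
	 * anywhere in this function would add a second address-taken
	 * label, one that lives inside a statement expression, and the
	 * "goto *retry_ptr" above would then be rejected.
	 */
	return n;
}
```

As written this builds fine, since the only address-taken label is at
function scope.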
> Given this pre-existing issue, we probably need to continue using
> _RET_IP_, as before.
Agreed!
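
Just to spell out what staying with _RET_IP_ means, a small userspace
sketch. fake_token_t and fake_kmalloc_token() are made-up stand-ins
for illustration, not the real slab.h definitions:

```c
#include <assert.h>

/* Userspace copy of the kernel's _RET_IP_ definition. */
#define _RET_IP_ ((unsigned long)__builtin_return_address(0))

/* Hypothetical stand-ins for kmalloc_token_t and __kmalloc_token(). */
typedef struct { unsigned long v; } fake_token_t;
#define fake_kmalloc_token() ((fake_token_t){ .v = _RET_IP_ })

/*
 * Two "allocation sites" in one function. The macro expands in place,
 * so both tokens are this function's own return address, i.e. the
 * parent callsite, and they compare equal.
 */
__attribute__((noinline)) int two_sites_share_token(void)
{
	fake_token_t first = fake_kmalloc_token();
	fake_token_t second = fake_kmalloc_token();

	/* A real return address is never zero here. */
	assert(first.v != 0);

	return first.v == second.v;
}
```

So distinct kmalloc() calls within one function are not told apart by
the token, which is the limitation Vlastimil pointed out.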
--
Cheers,
Harry / Hyeonggon