2025-04-23T17:00:29-07:00, Deepak Gupta <de...@rivosinc.com>: > On Thu, Apr 10, 2025 at 01:04:39PM +0200, Radim Krčmář wrote: >>2025-03-14T14:39:24-07:00, Deepak Gupta <de...@rivosinc.com>: >>> diff --git a/arch/riscv/include/asm/thread_info.h >>> b/arch/riscv/include/asm/thread_info.h >>> @@ -62,6 +62,9 @@ struct thread_info { >>> long user_sp; /* User stack pointer */ >>> int cpu; >>> unsigned long syscall_work; /* SYSCALL_WORK_ flags */ >>> +#ifdef CONFIG_RISCV_USER_CFI >>> + struct cfi_status user_cfi_state; >>> +#endif >> >>I don't think it makes sense to put all the data in thread_info. >>kernel_ssp and user_ssp is more than enough and the rest can comfortably >>live elsewhere in task_struct. >> >>thread_info is supposed to be as small as possible -- just spanning >>multiple cache-lines could be noticeable. > > I can change it to only include only `user_ssp`, base and size.
No need for base and size either -- we don't touch that in the common exception code. > But before we go there, see below: > > $ pahole -C thread_info kbuild/vmlinux > struct thread_info { > long unsigned int flags; /* 0 8 */ > int preempt_count; /* 8 4 */ > > /* XXX 4 bytes hole, try to pack */ > > long int kernel_sp; /* 16 8 */ > long int user_sp; /* 24 8 */ > int cpu; /* 32 4 */ > > /* XXX 4 bytes hole, try to pack */ > > long unsigned int syscall_work; /* 40 8 */ > struct cfi_status user_cfi_state; /* 48 32 */ > /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */ > long unsigned int a0; /* 80 8 */ > long unsigned int a1; /* 88 8 */ > long unsigned int a2; /* 96 8 */ > > /* size: 104, cachelines: 2, members: 10 */ > /* sum members: 96, holes: 2, sum holes: 8 */ > /* last cacheline: 40 bytes */ > }; > > If we were to remove entire `cfi_status`, it would still be 72 bytes (88 bytes > if shadow call stack were enabled) and already spans across two cachelines. It has only 64 bytes of data without shadow call stack, but it wasted 8 bytes on the holes. a2 is somewhat an outlier that is not used most exception paths and excluding it makes everything fit nicely even now. > if shadow call stack were enabled) and already spans across two cachelines. I > did see the comment above that it should fit inside a cacheline. Although I > assumed its stale comment given that it already spans across cacheline and I > didn't see any special mention in commit messages of changes which grew this > structure above one cacheline. So I assumed this was a stale comment. > > On the other hand, whenever enable/lock bits are checked, there is a high > likelyhood that user_ssp and other fields are going to be accessed and > thus it actually might be helpful to have it all in one cacheline during > runtime. Yes, although accessing enable/lock bits will be relatively rare. It seems better to have the overhead during thread setup, rather than on every trap. > So I am not sure if its helpful sticking to the comment which already is > stale. We could fix the holes and also use sp instead of a0 in the new_vmalloc_check, so everything would fit better. We are really close to fitting into a single cache-line, so I'd prefer if shadow stack only filled thread_info with data that is used very often in the exception handling code. I think we could do without user_sp in thread_info as well, so there are other packing options. Btw. could ssp be added to pt_regs? Thanks.