On Mon, May 18, 2020 at 02:23:47PM +0100, Will Deacon wrote:
> On Mon, May 18, 2020 at 01:12:10PM +0100, Mark Rutland wrote:
> > On Fri, May 15, 2020 at 06:27:54PM +0100, Will Deacon wrote:
> > > There is nothing architecture-specific about scs_overflow_check() as
> > > it's just a trivial wrapper around scs_corrupted().
> > >
> > > For parity with task_stack_end_corrupted(), rename scs_corrupted() to
> > > task_scs_end_corrupted() and call it from schedule_debug() when
> > > CONFIG_SCHED_STACK_END_CHECK is enabled. Finally, remove the unused
> > > scs_overflow_check() function entirely.
> > >
> > > This has absolutely no impact on architectures that do not support SCS
> > > (currently arm64 only).
> > >
> > > Signed-off-by: Will Deacon <[email protected]>
> >
> > Pulling this out of arch code seems sane to me, and the arch-specific
> > changes look sound. However, I have a concern with the changes within the
> > scheduler context-switch.
> >
> > > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > > index a35d3318492c..56be4cbf771f 100644
> > > --- a/arch/arm64/kernel/process.c
> > > +++ b/arch/arm64/kernel/process.c
> > > @@ -52,7 +52,6 @@
> > >  #include <asm/mmu_context.h>
> > >  #include <asm/processor.h>
> > >  #include <asm/pointer_auth.h>
> > > -#include <asm/scs.h>
> > >  #include <asm/stacktrace.h>
> > >
> > >  #if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
> > > @@ -516,7 +515,6 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> > >  	entry_task_switch(next);
> > >  	uao_thread_switch(next);
> > >  	ssbs_thread_switch(next);
> > > -	scs_overflow_check(next);
> >
> > Prior to this patch, we'd never switch to a task whose SCS had already
> > been corrupted.
> >
> > With this patch, we only check that when switching away from a task, and
> > only when CONFIG_SCHED_STACK_END_CHECK is selected, which at first
> > glance seems to weaken that.
>
> Yes, ignoring vmap'd stacks, this patch brings the SCS checking in-line with
> the main stack checking when CONFIG_SCHED_STACK_END_CHECK=y.
>
> > Arguably:
> >
> > * If the next task's SCS was corrupted by that task while it was
> >   running, we had already lost at that point.
>
> With this change, we'll at least catch this one sooner, and that might be
> useful if a bug has caused us to overflow the SCS but not the main stack.
Sure, but only if CONFIG_SCHED_STACK_END_CHECK is selected.

> > * If the next task's SCS was corrupted by another task, then that could
> >   also happen immediately after the check (though timing to avoid the
> >   check but affect the process could be harder).
>
> We're only checking the magic end value, so the cross-task case is basically
> if you overrun your own SCS as above, but then continue to overrun entire
> SCSs for other tasks as well. It's probably not very useful in that case.
>
> > ... and a VMAP'd SCS would be much nicer in this regard.
> >
> > Do we think this is weakening the check, or do we think it wasn't all
> > that helpful to begin with?
>
> I see it as a debug check to catch SCS overflow, rather than a hardening
> feature, and I agree that using something like vmap stack for the SCS would
> be better because we could have a guard page instead.

Fair enough. Could we put something into the commit message that more
explicitly calls this out as a debug check rather than a hardening feature?
I agree that under that model this patch looks fine, and with something to
that effect:

Reviewed-by: Mark Rutland <[email protected]>

Mark.

> This is something I would like to revisit, but we need more
> information from Sami about why Android rejected the larger allocation
> size, since I don't think there's an awful lot of point merging this
> series if Android doesn't pick it up.

Indeed. I'd certainly prefer the robustness of a VMAP'd SCS if we can do
that.

Mark.
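
For anyone skimming the thread, the "magic end value" check discussed above
amounts to verifying a canary stored in the last slot of each task's shadow
call stack whenever schedule_debug() runs. Below is a minimal sketch of that
shape, reusing the names mentioned in the thread (task_scs_end_corrupted(),
schedule_debug()); the helper layout, the canary value, SCS_SIZE, task_scs()
and the scs_check_prev() wrapper are illustrative assumptions rather than the
code in this series:

	#include <linux/compiler.h>	/* READ_ONCE_NOCHECK() */
	#include <linux/kernel.h>	/* panic() */
	#include <linux/sched.h>	/* struct task_struct */
	#include <linux/sizes.h>	/* SZ_1K */

	#define SCS_SIZE	SZ_1K			/* assumed shadow call stack size */
	#define SCS_END_MAGIC	0xdeadbeefcafed00dUL	/* illustrative canary value */

	/* task_scs() is assumed to return the base of @tsk's shadow call stack. */

	static inline unsigned long *__scs_magic(void *scs_base)
	{
		/* The canary occupies the last slot of the shadow call stack. */
		return (unsigned long *)(scs_base + SCS_SIZE) - 1;
	}

	static inline bool task_scs_end_corrupted(struct task_struct *tsk)
	{
		/* The SCS grows upwards, so an overflow clobbers the canary first. */
		return READ_ONCE_NOCHECK(*__scs_magic(task_scs(tsk))) != SCS_END_MAGIC;
	}

	/*
	 * Hypothetical wrapper for what the schedule_debug() call site boils
	 * down to when CONFIG_SCHED_STACK_END_CHECK=y: the check runs when
	 * switching away from @prev, not when switching to @next.
	 */
	static inline void scs_check_prev(struct task_struct *prev)
	{
		if (task_scs_end_corrupted(prev))
			panic("corrupted shadow call stack detected inside scheduler\n");
	}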
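
And on the vmap'd SCS idea: the attraction is that a vmalloc-backed allocation
is followed by an unmapped guard page, so an overflow faults immediately
instead of relying on a canary check at the next schedule. A rough sketch of
the allocation side, reusing SCS_SIZE from above; scs_alloc_vmapped() is a
made-up name and the flags are assumptions, not what any posted series does:

	#include <linux/vmalloc.h>

	/*
	 * Hypothetical vmalloc-backed allocation: the vmalloc area leaves a
	 * guard page after the mapping, so running off the end of the shadow
	 * call stack faults instead of silently corrupting memory.
	 */
	static void *scs_alloc_vmapped(int node)
	{
		return __vmalloc_node_range(SCS_SIZE, 1, VMALLOC_START, VMALLOC_END,
					    GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
					    0, node, __builtin_return_address(0));
	}

The trade-off is page-granular allocations rather than the current 1K ones,
which is presumably the allocation-size concern raised for Android above.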

