On Thu, 4 Mar 2021 at 19:02, Mark Rutland <mark.rutl...@arm.com> wrote:
> On Thu, Mar 04, 2021 at 06:25:33PM +0100, Marco Elver wrote:
> > On Thu, Mar 04, 2021 at 04:59PM +0000, Mark Rutland wrote:
> > > On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > > > On Thu, 4 Mar 2021 at 15:57, Mark Rutland <mark.rutl...@arm.com> wrote:
> > > > > [adding Mark Brown]
> > > > >
> > > > > The bigger problem here is that skipping is dodgy to begin with, and
> > > > > this is still liable to break in some cases. One big concern is that
> > > > > (especially with LTO) we cannot guarantee the compiler will not inline
> > > > > or outline functions, causing the skip value to be too large or too
> > > > > small. That's liable to happen to callers, and in theory (though
> > > > > unlikely in practice), portions of arch_stack_walk() or
> > > > > stack_trace_save() could get outlined too.
> > > > >
> > > > > Unless we can get some strong guarantees from compiler folk such that we
> > > > > can guarantee a specific function acts as a boundary for unwinding (and
> > > > > doesn't itself get split, etc), the only reliable way I can think to
> > > > > solve this requires an assembly trampoline. Whatever we do is liable to
> > > > > need some invasive rework.
> > > >
> > > > Will LTO and friends respect 'noinline'?
> > >
> > > I hope so (and suspect we'd have more problems otherwise), but I don't
> > > know whether they actually do.
> > >
> > > I suspect even with 'noinline' the compiler is permitted to outline
> > > portions of a function if it wanted to (and IIUC it could still make
> > > specialized copies in the absence of 'noclone').
> > >
> > > > One thing I also noticed is that tail calls would also cause the stack
> > > > trace to appear somewhat incomplete (for some of my tests I've
> > > > disabled tail call optimizations).
> > >
> > > I assume you mean for a chain A->B->C where B tail-calls C, you get a
> > > trace A->C? ...
> > > or is A going missing too?
> >
> > Correct, it's just the A->C outcome.
>
> I'd assumed that those cases were benign, e.g. for livepatching what
> matters is what can be returned to, so B disappearing from the trace
> isn't a problem there.
>
> Is the concern debuggability, or is there a functional issue you have in
> mind?

For me, it's just been debuggability, and reliable test cases.

> > > > Is there a way to also mark a function non-tail-callable?
> > >
> > > I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> > > on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> > > function-local optimization options), but I don't expect there's any way
> > > to mark a callee as not being tail-callable.
> >
> > I don't think this is reliable. It'd be
> > __attribute__((optimize("-fno-optimize-sibling-calls"))), but doesn't
> > work if applied to the function we do not want to tail-call-optimize,
> > but would have to be applied to the function that does the tail-calling.
>
> Yup; that's what I meant when I said you could do that on the caller but
> not the callee.
>
> I don't follow why you'd want to put this on the callee, though, so I
> think I'm missing something. Considering a set of functions in different
> compilation units:
>
> A->B->C->D->E->F->G->H->I->J->K

I was having this problem with KCSAN, where the compiler would
tail-call-optimize __tsan_X instrumentation. This would mean that
KCSAN runtime functions ended up in the trace, but the function where
the access happened would not. However, I don't care about the runtime
functions, and instead want to see the function where the access
happened. In that case, I'd like to just mark __tsan_X and any other
KCSAN instrumentation functions as do-not-tail-call-optimize, which
would solve the problem.

The solution today is that when you compile a kernel with KCSAN, every
instrumented TU is compiled with -fno-optimize-sibling-calls. The
better solution would be to just mark KCSAN runtime functions somehow,
but permit tail calling other things. Although, I probably still want
to see the full trace, and would decide that having
-fno-optimize-sibling-calls is a small price to pay in a
debug-only kernel to get complete traces.

> ...
> if K were marked in this way, and J was compiled with visibility of
> this, J would stick around, but J's callers might not, and so a
> trace might see:
>
> A->J->K
>
> ... do you just care about the final caller, i.e. you just need
> certainty that J will be in the trace?

Yes. But maybe it's a special problem that only sanitizers have.

> If so, we can somewhat bodge that by having K have an __always_inline
> wrapper which has a barrier() or similar after the real call to K, so
> the call couldn't be TCO'd.
>
> Otherwise I'd expect we'd probably need to disable TCO generally.

Thanks,
-- Marco
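
For reference, the wrapper bodge suggested above could look roughly like
the userspace sketch below. This is only an illustration under
assumptions: k_real(), k(), j() and self_check() are made-up names,
compiler_barrier() and ALWAYS_INLINE are local stand-ins for the
kernel's barrier() and __always_inline, and nothing here is taken from
the KCSAN sources. Whether a given compiler really keeps the call out of
tail position still needs checking in the generated code.

```c
#include <assert.h>

/* Userspace stand-ins for the kernel's barrier() and __always_inline. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")
#define ALWAYS_INLINE static inline __attribute__((always_inline))
#define NOINLINE __attribute__((noinline))

/* Counts calls so the compiler cannot treat k_real() as side-effect free. */
static volatile int k_calls;

/* Hypothetical "K": the function we do not want to be tail-called. */
NOINLINE int k_real(int x)
{
	k_calls++;
	return x + 1;
}

/*
 * Wrapper that is inlined into every caller. The barrier sits after the
 * call, so the call to k_real() is no longer the last operation in the
 * caller and should not be emitted as a sibling call.
 */
ALWAYS_INLINE int k(int x)
{
	int ret = k_real(x);

	compiler_barrier();
	return ret;
}

/* Hypothetical "J": without the wrapper, this return could become a jump. */
NOINLINE int j(int x)
{
	return k(x);
}

/* The wrapper must not change K's result. */
void self_check(void)
{
	assert(j(41) == 42);
}
```

Comparing the disassembly of j() with and without the barrier (for
example at -O2) is one way to see whether the call to k_real() is
emitted as a call or as a jump. As discussed above, this only keeps the
immediate caller J in the trace; J's own callers can still be
tail-called away.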