On Tue, Nov 28, 2017 at 11:09 PM, Ingo Molnar <mi...@kernel.org> wrote: > > * Jarkko Nikula <jarkko.nik...@linux.intel.com> wrote: > >> Hi >> >> Suspend-to-ram and resume stopped working on v4.15-rc1 and I bisected it to >> commit ca37e57bbe0c ("x86/entry/64: Add missing irqflags tracing to >> native_load_gs_index()"). >> >> I noticed it on Intel Kabylake (core) and Apollolake (atom) based prototype >> machines. Symptoms are that machine appears to enter into suspend but >> resumes instantly and hangs. Unfortunately no logs. >> >> If I revert ca37e57bbe0c on v4.15-rc1 it works as expected. > > Hm, that commit looks broken with irq-tracing enabled. > Does the patch below fix it? > > In fact the exception handler itself appears to have broken GS handling as > well - > I suspect it never triggers in practice, because it was broken forever. > > Andy, do you concur?
No. > > On a related note, we should definitely extend the 'intended GS state' > annotation > comments I did in this patch to all SWAPGS instances - this way code review > has a > much higher chance of finding discrepancies between intent and actual code. Agreed. I'll send a patch. > --- a/arch/x86/entry/entry_64.S > +++ b/arch/x86/entry/entry_64.S > @@ -945,16 +945,16 @@ idtentry simd_coprocessor_error > do_simd_coprocessor_error has_error_code=0 > */ > ENTRY(native_load_gs_index) > FRAME_BEGIN > + SWAPGS /* switch from user GS to > kernel GS */ No, we start with kernel GS. It was correct before. > pushfq > DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI) > TRACE_IRQS_OFF > - SWAPGS > .Lgs_change: > movl %edi, %gs > 2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE > - SWAPGS > TRACE_IRQS_FLAGS (%rsp) > popfq > + SWAPGS /* switch from kernel GS to > user GS */ > FRAME_END > ret > ENDPROC(native_load_gs_index) > @@ -964,7 +964,7 @@ EXPORT_SYMBOL(native_load_gs_index) > .section .fixup, "ax" > /* running with kernelgs */ > bad_gs: > - SWAPGS /* switch back to user gs */ > + SWAPGS /* switch back to user GS, to > modify GS */ > .macro ZAP_GS > /* This can't be a string because the preprocessor needs to see it. */ > movl $__USER_DS, %eax > @@ -973,6 +973,7 @@ EXPORT_SYMBOL(native_load_gs_index) > ALTERNATIVE "", "ZAP_GS", X86_BUG_NULL_SEG > xorl %eax, %eax > movl %eax, %gs > + SWAPGS /* switch to kernel GS again > before continuing */ Which we don't want to do because the landing site expects user GS. I suspect we're hitting an entirely different bug, that we're blowing up if we WARN too early in resume.