control: reassign -1 src:linux control: retitle -1 linux: cleared PKRU register after signal in amd64 32-bit compat mode control: close -1 4.10-1~exp1 control: affects -1 glibc
Hi, On 2019-03-28 22:05, Aurelien Jarno wrote: > Hi Florian, > > On 2019-03-27 23:59, Florian Weimer wrote: > > retitle 924891 glibc: misc/tst-pkey fails due to cleared PKRU register > > after signal in amd64 32-bit compat mode > > thanks > > > > * Lucas Nussbaum: > > > > > On 27/03/19 at 08:48 +0100, Florian Weimer wrote: > > >> > If that's useful, I can easily provide access to an AWS VM to debug > > >> > this > > >> > issue. > > >> > > >> Oh, that would be quite helpful indeed. > > > > > > Can you send your SSH key? (I thought there was a way to get the SSH key > > > for a DD, but I cannot find it anymore) > > > > > > Then you will be able to ssh to root@18.184.55.40. > > > There's sbuild and schroot setup on the VM. > > > > > > When you are done, please 'poweroff' the machine, which will terminate > > > it. > > > > The issue reproduces outside the chroot, with the stretch userland. > > > > What happens is that once we get out of the SIGUSR1 signal handler, > > the PKRU register has value zero. This happens around this code in > > the test: > > > > /* Check that in a signal handler, there is no access. */ > > xsignal (SIGUSR1, &sigusr1_handler); > > xraise (SIGUSR1); > > xsignal (SIGUSR1, SIG_DFL); > > TEST_COMPARE (sigusr1_handler_ran, 1); > > > > I checked the following (via a breakpoint in pkey_get; I don't think > > GDB can read the PKRU register directly): Inside the SIGUSR1 signal > > handler, PKRU has value 0x55555554, as expected for this kernel, but > > after the return, we get zero. This is the first time a signal is > > delivered on the main thread, so it's consistent with fairly broken > > signal handling as far as the PKRU register is concerned. I guess > > clearing PKRU in this way might even constitute a minor security bug > > (because the zero value means no restrictions). > > Thanks a lot for investigating and for all the details. > > > This commit looks highly relevant: > > > > commit a4455082dc6f0b5d51a23523f77600e8ede47c79 > > Author: Dave Hansen <dave.han...@linux.intel.com> > > Date: Wed Jun 8 10:25:33 2016 -0700 > > > > x86/signals: Add missing signal_compat code for x86 features > > > > The 32-bit siginfo is a different binary format than the 64-bit > > one. So, when running 32-bit binaries on 64-bit kernels, we have > > to convert the kernel's 64-bit version to a 32-bit version that > > userspace can grok. > > > > If the siginfo_t layout is incorrect (with regards to what the > > hardware writes), I expect that we might end up copying back the wrong > > PKRU value. > > This commit is actually already in the 4.9 kernel. > > > I'm not sure what to do here. This really looks like a kernel bug. > > Maybe we should just verify that this is fixed in the buster kernel > > and move on? > > I agree. I have been able to find a machine where I can temporarily run > a VM. I have found that the problem has been solved between kernel > 4.10-rc6 and 4.10, more precisely between the following debian packages: > - linux-image-4.10.0-rc6-amd64-unsigned version 4.10~rc6-1~exp1 > - linux-image-4.10.0-trunk-amd64-unsigned version 4.10-1~exp1 > > I gave a quick look at the commit logs, and I haven't identified a > commit. I'll look again and try to identify the commit fixing the issue > so that it can be backported in the stretch kernel. I'll then reassign > the bug there. I have identified that the issue has been fixed by the following commit that went into kernel 4.10-rc7: commit dffba9a31c7769be3231c420d4b364c92ba3f1ac Author: Yu-cheng Yu <yu-cheng...@intel.com> Date: Mon Jan 23 14:54:44 2017 -0800 x86/fpu/xstate: Fix xcomp_bv in XSAVES header The compacted-format XSAVES area is determined at boot time and never changed after. The field xsave.header.xcomp_bv indicates which components are in the fixed XSAVES format. In fpstate_init() we did not set xcomp_bv to reflect the XSAVES format since at the time there is no valid data. However, after we do copy_init_fpstate_to_fpregs() in fpu__clear(), as in commit: b22cbe404a9c x86/fpu: Fix invalid FPU ptrace state after execve() and when __fpu_restore_sig() does fpu__restore() for a COMPAT-mode app, a #GP occurs. This can be easily triggered by doing valgrind on a COMPAT-mode "Hello World," as reported by Joakim Tjernlund and others: https://bugzilla.kernel.org/show_bug.cgi?id=190061 Fix it by setting xcomp_bv correctly. This patch also moves the xcomp_bv initialization to the proper place, which was in copyin_to_xsaves() as of: 4c833368f0bf x86/fpu: Set the xcomp_bv when we fake up a XSAVES area which fixed the bug too, but it's more efficient and cleaner to initialize things once per boot, not for every signal handling operation. Reported-by: Kevin Hao <haoke...@gmail.com> Reported-by: Joakim Tjernlund <joakim.tjernl...@infinera.com> Signed-off-by: Yu-cheng Yu <yu-cheng...@intel.com> Cc: Andy Lutomirski <l...@kernel.org> Cc: Borislav Petkov <b...@suse.de> Cc: Dave Hansen <dave.han...@linux.intel.com> Cc: Fenghua Yu <fenghua...@intel.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Ravi V. Shankar <ravi.v.shan...@intel.com> Cc: Thomas Gleixner <t...@linutronix.de> Cc: haoke...@gmail.com Link: http://lkml.kernel.org/r/1485212084-4418-1-git-send-email-yu-cheng...@intel.com [ Combined it with 4c833368f0bf. ] Signed-off-by: Ingo Molnar <mi...@kernel.org> This actually makes sense given that on x86 the protection keys register is saved and restored using XSAVE. I am therefore reassigning the bug to the kernel, and closing it given it has been fixed a lot of time ago. -- Aurelien Jarno GPG: 4096R/1DDD8C9B aurel...@aurel32.net http://www.aurel32.net