RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-18 Thread Luck, Tony
> Your test case is presumably doing something that involves setting > undocumented registers* to program the CPU or memory controller to > generate a machine check on access to some address. Presumably this > is done by broadcasting an SMI and programming the registers in SMM. Good theory - but

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-18 Thread Andy Lutomirski
On Tue, Nov 18, 2014 at 10:30 AM, Luck, Tony wrote: >>> The lost cpu is *really* lost. Warm reset doesn't fix the machine, I >>> usually >>> have to do a full power cycle. > >> How is it even possible that I did that with a few lines of asm? > > Probably not directly your fault - some casca

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-18 Thread Luck, Tony
>> The lost cpu is *really* lost. Warm reset doesn't fix the machine, I usually >> have to do a full power cycle. > How is it even possible that I did that with a few lines of asm? Probably not directly your fault - some cascade of errors may have occurred. > Could this be a hardware bug?

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-18 Thread Borislav Petkov
On Mon, Nov 17, 2014 at 12:05:59PM -0800, Andy Lutomirski wrote: > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/paranoid > > I'm not quite ready to send v3. I want to do two things first: > > 1. Consider disabling the stack switch for double_fault. Sounds conservativel

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Andy Lutomirski
On Mon, Nov 17, 2014 at 4:22 PM, Luck, Tony wrote: >> It could also be interesting to tweak mce_panic to not actually panic >> the machine but to try to return and stop the test instead. Then real >> debugging could be possible :) > > The lost cpu is *really* lost. Warm reset doesn't fix the mac

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Luck, Tony
> It could also be interesting to tweak mce_panic to not actually panic > the machine but to try to return and stop the test instead. Then real > debugging could be possible :) The lost cpu is *really* lost. Warm reset doesn't fix the machine, I usually have to do a full power cycle. -Tony

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Andy Lutomirski
On Mon, Nov 17, 2014 at 3:16 PM, Luck, Tony wrote: >> I still wonder whether the timeout code is the real culprit. My patch >> will slow down entry into do_machine_check by tens of cycles, several >> cachelines, and possibly a couple of TLB misses. Given that the >> timing seemed marginal to me,

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Luck, Tony
> I still wonder whether the timeout code is the real culprit. My patch > will slow down entry into do_machine_check by tens of cycles, several > cachelines, and possibly a couple of TLB misses. Given that the > timing seemed marginal to me, it's possible (albeit not that likely) > that it pushed

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Andy Lutomirski
On Mon, Nov 17, 2014 at 1:55 PM, Luck, Tony wrote: >>> However, I'd like to be very sure this thing doesn't introduce any >>> regressions to the MCA code. So even if Tony's testing passes, I'd like >>> to be very conservative here and stress it more than usual. Because once >>> this thing hits ups

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Luck, Tony
>> However, I'd like to be very sure this thing doesn't introduce any >> regressions to the MCA code. So even if Tony's testing passes, I'd like >> to be very conservative here and stress it more than usual. Because once >> this thing hits upstream and stuff starts breaking, it'll be a serious >> P

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Andy Lutomirski
On Mon, Nov 17, 2014 at 12:03 PM, Borislav Petkov wrote: > On Mon, Nov 17, 2014 at 11:57:22AM -0800, Andy Lutomirski wrote: >> Would it be worth making a decision on task_work_add vs. stack >> switching first? > > Probably a prudent thing to do in order to save unnecessary cycles :-) > >> Stack sw

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Borislav Petkov
On Mon, Nov 17, 2014 at 11:57:22AM -0800, Andy Lutomirski wrote: > Would it be worth making a decision on task_work_add vs. stack > switching first? Probably a prudent thing to do in order to save unnecessary cycles :-) > Stack switching pros: all this lockless allocation stuff is completely > un

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Andy Lutomirski
On Mon, Nov 17, 2014 at 10:50 AM, Borislav Petkov wrote: > On Fri, Nov 14, 2014 at 09:56:38PM +, Luck, Tony wrote: >> ... >> But I think that means we need more than one of these structures ... >> we may not be done with one before a new machine check occurs. So >> we'd have to make an NMI-saf

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-17 Thread Borislav Petkov
On Fri, Nov 14, 2014 at 09:56:38PM +, Luck, Tony wrote: > ... > But I think that means we need more than one of these structures ... > we may not be done with one before a new machine check occurs. So > we'd have to make an NMI-safe allocator to grab one for use inside > do_machine_check() Wel

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Andy Lutomirski
On Fri, Nov 14, 2014 at 1:56 PM, Luck, Tony wrote: >>> Right, I can do it in the meantime and we can always experiment more >>> later. Getting rid of _TIF_MCE_NOTIFY is a good thing already. >> >> Yep, it looks pretty simple - not tested yet, it builds though. > > It seems pretty solid under test

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
>> Right, I can do it in the meantime and we can always experiment more >> later. Getting rid of _TIF_MCE_NOTIFY is a good thing already. > > Yep, it looks pretty simple - not tested yet, it builds though. It seems pretty solid under test so far. Can we make it pass the address/flag to mce_notify

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
> So far, the only thing I've come up with is that do_machine_check > seems to be missing exception_enter or the equivalent. Do you have > CONFIG_CONTEXT_TRACKING on and/or full nohz enabled? I don't think > that this explains my bug, though. Yes to both: $ grep CONTEXT_TRACK .config CONFIG_CON

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Andy Lutomirski
On Fri, Nov 14, 2014 at 9:49 AM, Luck, Tony wrote: >> Can you also try rebasing onto what will probably be v3? >> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tag/?id=paranoid-stack-v2.9 > > Built that - with none of my other changes ... i.e. still use TIF_NOTIFY_MCE > etc. No p

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Borislav Petkov
On Fri, Nov 14, 2014 at 09:26:26AM -0800, Andy Lutomirski wrote: > I was hoping for an actual worked-out example of what the parameters > should be :) Sorry, I haven't played with this myself either - haven't had a box with EINJ yet. Maybe Tony has something. -- Regards/Gruss, Boris. Sent f

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
>> It adds debugging for inappropriate reschedules from the wrong stack. >> Setting CONFIG_DEBUG_ATOMIC_SLEEP might also be a good idea. > > Will add that for next build/test Didn't see anything new. System died at 1108 recoveries with the "Timeout synchronization ..." panic -Tony

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Luck, Tony
> Can you also try rebasing onto what will probably be v3? > > https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tag/?id=paranoid-stack-v2.9 Built that - with none of my other changes ... i.e. still use TIF_NOTIFY_MCE etc. No printk() in the MCE context. System ran 736 injection/consum

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Andy Lutomirski
On Fri, Nov 14, 2014 at 9:24 AM, Borislav Petkov wrote: > On Fri, Nov 14, 2014 at 09:18:51AM -0800, Andy Lutomirski wrote: >> Grr. Do you or Tony have any pointers for how to test this myself? I >> don't know enough about the acpi error injection thing, which I assume >> is what Tony is using. > >

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Borislav Petkov
On Fri, Nov 14, 2014 at 09:18:51AM -0800, Andy Lutomirski wrote: > Grr. Do you or Tony have any pointers for how to test this myself? I > don't know enough about the acpi error injection thing, which I assume > is what Tony is using. Maybe that would help: Documentation/acpi/apei/einj.txt provid

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Andy Lutomirski
On Nov 14, 2014 2:34 AM, "Borislav Petkov" wrote: > > On Wed, Nov 12, 2014 at 07:03:21PM -0800, Andy Lutomirski wrote: > > printk seems to work just fine in do_machine_check. > > That must be pure luck. Has anything changed which I missed to make > printk NMI-safe? Heh. Probably not. Now I wond

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-14 Thread Borislav Petkov
On Wed, Nov 12, 2014 at 07:03:21PM -0800, Andy Lutomirski wrote: > printk seems to work just fine in do_machine_check. That must be pure luck. Has anything changed which I missed to make printk NMI-safe? -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine.

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Andy Lutomirski
On Thu, Nov 13, 2014 at 5:20 PM, Luck, Tony wrote: > "worst == > MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)" > > This can't happen. We can only declare AR severity for a user mode fault. I believe you, and I see that in the code, but the code is mightily twisted. Anyway, my v3 will also

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Luck, Tony
"worst == MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)" This can't happen. We can only declare AR severity for a user mode fault. Sent from my iPhone > On Nov 13, 2014, at 16:50, Andy Lutomirski wrote: > > worst == > MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Andy Lutomirski
On Thu, Nov 13, 2014 at 3:13 PM, Andy Lutomirski wrote: > On Thu, Nov 13, 2014 at 2:47 PM, Andy Lutomirski wrote: >> On Thu, Nov 13, 2014 at 2:33 PM, Luck, Tony wrote: Are you sure that this works in an unmodified kernel >>> >>> Unmodified kernel has run tens of thousands of >>> injection/

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Andy Lutomirski
On Thu, Nov 13, 2014 at 2:47 PM, Andy Lutomirski wrote: > On Thu, Nov 13, 2014 at 2:33 PM, Luck, Tony wrote: >>> Are you sure that this works in an unmodified kernel >> >> Unmodified kernel has run tens of thousands of >> injection/consumption/recovery cycles. >> >> I did get a crash with the en

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Andy Lutomirski
On Thu, Nov 13, 2014 at 2:33 PM, Luck, Tony wrote: >> Are you sure that this works in an unmodified kernel > > Unmodified kernel has run tens of thousands of injection/consumption/recovery > cycles. > > I did get a crash with the entry/exit traces you asked for. Last 2 lines > of console lo

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Luck, Tony
> Are you sure that this works in an unmodified kernel Unmodified kernel has run tens of thousands of injection/consumption/recovery cycles. I did get a crash with the entry/exit traces you asked for. Last 2 lines of console log attached. There are a couple of OOPs before things fall apar

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Andy Lutomirski
On Thu, Nov 13, 2014 at 2:23 PM, Andy Lutomirski wrote: > On Thu, Nov 13, 2014 at 10:43 AM, Luck, Tony wrote: >>> printk seems to work just fine in do_machine_check. Any chance you >>> can instrument, for each cpu, all entries to do_machine_check, all >>> calls to do_machine_check, all returns,

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Andy Lutomirski
On Thu, Nov 13, 2014 at 10:43 AM, Luck, Tony wrote: >> printk seems to work just fine in do_machine_check. Any chance you >> can instrument, for each cpu, all entries to do_machine_check, all >> calls to do_machine_check, all returns, and everything that tries to >> do memory_failure? > > I first

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Borislav Petkov
On Thu, Nov 13, 2014 at 11:59:37AM +0100, Borislav Petkov wrote: > I've been thinking about it recently too - adding MCA functionality to > qemu/kvm could be very useful, especially the thresholding stuff, for > testing RAS kernel code. Btw, qemu monitor has a mce injection command with which I wa

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Luck, Tony
> printk seems to work just fine in do_machine_check. Any chance you > can instrument, for each cpu, all entries to do_machine_check, all > calls to do_machine_check, all returns, and everything that tries to > do memory_failure? I first added a printk() just for the cpu that calls do_machine_che

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Borislav Petkov
On Wed, Nov 12, 2014 at 05:22:25PM +0100, Borislav Petkov wrote: > > Less intrusive is certainly true. > > Right, I can do it in the meantime and we can always experiment more > later. Getting rid of _TIF_MCE_NOTIFY is a good thing already. Yep, it looks pretty simple - not tested yet, it builds

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-13 Thread Borislav Petkov
On Thu, Nov 13, 2014 at 12:31:30AM +, Luck, Tony wrote: > > Is this something I can try under KVM? > > I don't know if KVM has a way to simulate a machine check event. I've been thinking about it recently too - adding MCA functionality to qemu/kvm could be very useful, especially the threshol

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Andy Lutomirski
On Wed, Nov 12, 2014 at 4:31 PM, Luck, Tony wrote: >> v2's not going to make a difference unless you're using uprobes at the >> same time. > > Not (knowingly) using uprobes. System is installed with a RHEL7 userspace ... > but is essentially > idle except for my test program. > >> In the interest

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Andy Lutomirski
On Wed, Nov 12, 2014 at 4:31 PM, Luck, Tony wrote: >> v2's not going to make a difference unless you're using uprobes at the >> same time. > > Not (knowingly) using uprobes. System is installed with a RHEL7 userspace ... > but is essentially > idle except for my test program. > >> In the interest

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Luck, Tony
> v2's not going to make a difference unless you're using uprobes at the > same time. Not (knowingly) using uprobes. System is installed with a RHEL7 userspace ... but is essentially idle except for my test program. > In the interest of my sanity, can you add something like > BUG_ON(!user_mode_v

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Andy Lutomirski
On Wed, Nov 12, 2014 at 3:41 PM, Luck, Tony wrote: >> v2 coming soon with these changes and some additional comment cleanups. > v2's not going to make a difference unless you're using uprobes at the same time. > So v1 + do_machine_check change is not surviving some real testing. I'm > injectin

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Luck, Tony
> v2 coming soon with these changes and some additional comment cleanups. So v1 + do_machine_check change is not surviving some real testing. I'm injecting and consuming errors sequentially with a small delay in between - so no fancy corner cases with multiple errors being processed ... we get

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Andy Lutomirski
On Wed, Nov 12, 2014 at 2:00 PM, Oleg Nesterov wrote: > Andy, > > As I said many times I do not understand asm ;) so most probably I missed > something but let me ask anyway. You must be the most competent non-asm-speaking asm reviewer in the world :) > > On 11/11, Andy Lutomirski wrote: >> >> -

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Oleg Nesterov
Andy, As I said many times I do not understand asm ;) so most probably I missed something but let me ask anyway. On 11/11, Andy Lutomirski wrote: > > --- a/arch/x86/kernel/entry_64.S > +++ b/arch/x86/kernel/entry_64.S > @@ -1064,6 +1064,9 @@ ENTRY(\sym) > CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Borislav Petkov
On Wed, Nov 12, 2014 at 05:17:55PM +, Luck, Tony wrote: > > Not that easy for testing the #MC path - there we have to inject real > > MCEs and then noodle through the memory_failure() code. I'd be very much > > interested to see what would happen if two MCEs happen back-to-back with > > your ch

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Luck, Tony
> Not that easy for testing the #MC path - there we have to inject real > MCEs and then noodle through the memory_failure() code. I'd be very much > interested to see what would happen if two MCEs happen back-to-back with > your change, the second one being raised when we're on the kernel stack > a

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Borislav Petkov
On Wed, Nov 12, 2014 at 07:48:15AM -0800, Andy Lutomirski wrote: > I only switch stacks on entry from userspace, and the kernel stack is > completely empty if that happens. Ok, fair enough. There's still the argument that something might've corrupted the kernel stack memory while the MCE_STACK is
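The "only switch stacks on entry from userspace" rule quoted above depends on telling user entries from kernel entries. A minimal user-space sketch of that test follows. It assumes the usual x86 convention that the low two bits of the saved CS selector hold the privilege level (0 for kernel, 3 for user). The helper name `entered_from_user` is hypothetical and is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low two bits of a segment selector carry the requested privilege level. */
#define SEGMENT_RPL_MASK 0x3u
#define USER_RPL         0x3u

/*
 * Hypothetical helper: true if the saved CS selector says the exception
 * interrupted ring 3 code. Only in that case would the entry code move
 * off the IST stack onto the normal kernel stack.
 */
static bool entered_from_user(uint16_t saved_cs)
{
    return (saved_cs & SEGMENT_RPL_MASK) == USER_RPL;
}
```

With the conventional x86_64 Linux selectors (0x33 for user code, 0x10 for kernel code) the helper returns true and false respectively.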

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Andy Lutomirski
On Nov 12, 2014 2:30 AM, "Borislav Petkov" wrote: > > On Tue, Nov 11, 2014 at 06:06:48PM -0800, Tony Luck wrote: > > > Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they > > > are quite easy to spot. > > > > Bother ... except for the SRAO cases where *everyone* is an innocent > > by

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-12 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 06:06:48PM -0800, Tony Luck wrote: > > Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they > > are quite easy to spot. > > Bother ... except for the SRAO cases where *everyone* is an innocent > bystander - but someone should go look for the error and queue up

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Tony Luck
> Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they > are quite easy to spot. Bother ... except for the SRAO cases where *everyone* is an innocent bystander - but someone should go look for the error and queue up a page offline event. Perhaps for this we'd do the self-ipi trick an
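The RIPV=1, EIPV=0 test mentioned above can be sketched in C. This is an illustration only, not the kernel's code: the bit positions follow the architectural layout of IA32_MCG_STATUS (RIPV is bit 0, EIPV is bit 1, MCIP is bit 2), and the helper name `looks_like_bystander` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit layout of the IA32_MCG_STATUS MSR. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/*
 * Hypothetical helper: a CPU that can resume where it was interrupted
 * (RIPV=1) and whose saved IP is not tied to the error (EIPV=0) looks
 * like an innocent bystander of a broadcast machine check.
 */
static bool looks_like_bystander(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_RIPV) && !(mcg_status & MCG_STATUS_EIPV);
}
```

As the message notes, this test alone does not cover the SRAO case, where every CPU passes it and someone still has to find the error and queue the page offline.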

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Andy Lutomirski
On Tue, Nov 11, 2014 at 5:06 PM, Luck, Tony wrote: >> I've thought about one sneaky option. If we can reliably determine >> that we're an innocent bystander of a broadcast #MC, can we send an >> IPI-to-self and return without clearing MCIP? Then we get another >> interrupt as soon as interrupts

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Luck, Tony
> I've thought about one sneaky option. If we can reliably determine > that we're an innocent bystander of a broadcast #MC, can we send an > IPI-to-self and return without clearing MCIP? Then we get another > interrupt as soon as interrupts are enabled, and we can clear MCIP at > a time when we'r

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Andy Lutomirski
On Tue, Nov 11, 2014 at 4:22 PM, Luck, Tony wrote: > Andy said: >> Yeah. But if you haven't cleared MCIP, you go boom, which is the same >> with pretty much any approach. > > The current code has an ugly hole at the moment. End of do_machine_check() > clears MCG_STATUS. At that point we are sti

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Luck, Tony
Andy said: > Yeah. But if you haven't cleared MCIP, you go boom, which is the same > with pretty much any approach. The current code has an ugly hole at the moment. End of do_machine_check() clears MCG_STATUS. At that point we are still running on the magic stack for machine check exceptions ..

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Andy Lutomirski
On Tue, Nov 11, 2014 at 3:09 PM, Borislav Petkov wrote: > On Tue, Nov 11, 2014 at 02:40:12PM -0800, Andy Lutomirski wrote: >> I wonder what the IRET is for. There had better not be another magic >> IRET unmask thing. I'm guessing that the actual semantics are that >> nothing whatsoever can mask

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 02:40:12PM -0800, Andy Lutomirski wrote: > I wonder what the IRET is for. There had better not be another magic > IRET unmask thing. I'm guessing that the actual semantics are that > nothing whatsoever can mask #MC, but that a second #MC when MCIP is > still set is a shutd

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Andy Lutomirski
On Tue, Nov 11, 2014 at 2:33 PM, Borislav Petkov wrote: > On Tue, Nov 11, 2014 at 02:12:18PM -0800, Andy Lutomirski wrote: >> I don't see why it would be any more likely for the normal kernel >> stack to be corrupted due to a hardware issue that interrupted ring 3 >> code than that the IST stack i

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 02:12:18PM -0800, Andy Lutomirski wrote: > I don't see why it would be any more likely for the normal kernel > stack to be corrupted due to a hardware issue that interrupted ring 3 > code than that the IST stack is corrupted. The IST stack is, well, used solely for

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Andy Lutomirski
On Tue, Nov 11, 2014 at 2:00 PM, Luck, Tony wrote: > So here is the flow: > > 1) A machine check happens - it is (currently) broadcast to all logical cpus > on all sockets > > 2) First cpu to execute "order = atomic_inc_return(&mce_callin);" in > mce_start() gets to be the "monarch" and directs

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Andy Lutomirski
On Tue, Nov 11, 2014 at 1:36 PM, Borislav Petkov wrote: > A very big hmmm... > > On Tue, Nov 11, 2014 at 12:56:52PM -0800, Andy Lutomirski wrote: >> This causes all non-NMI kernel entries from userspace to run on the >> normal kernel stack. > > So one of the reasons #MC has its own stack is becaus

RE: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Luck, Tony
So here is the flow: 1) A machine check happens - it is (currently) broadcast to all logical cpus on all sockets 2) First cpu to execute "order = atomic_inc_return(&mce_callin);" in mce_start() gets to be the "monarch" and directs things during the handler. 3) Every cpu gets to scan all the ma
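Step 2 of the flow above, electing a "monarch" by atomic increment, can be sketched in user-space C. This is a minimal illustration rather than the kernel's mce_start(): C11 atomics stand in for the kernel's atomic_inc_return(), and the names `callin`, `join_rendezvous` and `is_monarch` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared arrival counter; stands in for the kernel's mce_callin. */
static atomic_int callin;

/*
 * Hypothetical helper: each CPU calls this once on entering the handler.
 * Returns the 1-based arrival order, mirroring atomic_inc_return(),
 * which yields the value after the increment.
 */
static int join_rendezvous(void)
{
    /* atomic_fetch_add returns the old value, so add 1 for the new one. */
    return atomic_fetch_add(&callin, 1) + 1;
}

/* The first CPU to arrive (order 1) directs the others as the monarch. */
static bool is_monarch(int order)
{
    return order == 1;
}
```

Because the increment is atomic, exactly one caller sees order 1 no matter how many CPUs enter at the same moment, so no lock is needed to pick the monarch.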

Re: [RFC PATCH] x86, entry: Switch stacks on a paranoid entry from userspace

2014-11-11 Thread Borislav Petkov
A very big hmmm... On Tue, Nov 11, 2014 at 12:56:52PM -0800, Andy Lutomirski wrote: > This causes all non-NMI kernel entries from userspace to run on the > normal kernel stack. So one of the reasons #MC has its own stack is because we need a known-good stack in such situations. What if the normal