> Your test case is presumably doing something that involves setting
> undocumented registers* to program the CPU or memory controller to
> generate a machine check on access to some address. Presumably this
> is done by broadcasting an SMI and programming the registers in SMM.
Good theory - but
On Tue, Nov 18, 2014 at 10:30 AM, Luck, Tony wrote:
>>> The lost cpu is *really* lost. Warm reset doesn't fix the machine, I
>>> usually have to do a full power cycle.
>
>> How is it even possible that I did that with a few lines of asm?
>
> Probably not directly your fault - some casca
>> The lost cpu is *really* lost. Warm reset doesn't fix the machine, I usually
>> have to do a full power cycle.
> How is it even possible that I did that with a few lines of asm?
Probably not directly your fault - some cascade of errors may have
occurred.
> Could this be a hardware bug?
On Mon, Nov 17, 2014 at 12:05:59PM -0800, Andy Lutomirski wrote:
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=x86/paranoid
>
> I'm not quite ready to send v3. I want to do two things first:
>
> 1. Consider disabling the stack switch for double_fault.
Sounds conservativel
On Mon, Nov 17, 2014 at 4:22 PM, Luck, Tony wrote:
>> It could also be interesting to tweak mce_panic to not actually panic
>> the machine but to try to return and stop the test instead. Then real
>> debugging could be possible :)
>
> The lost cpu is *really* lost. Warm reset doesn't fix the mac
> It could also be interesting to tweak mce_panic to not actually panic
> the machine but to try to return and stop the test instead. Then real
> debugging could be possible :)
The lost cpu is *really* lost. Warm reset doesn't fix the machine, I usually
have to do a full power cycle.
-Tony
On Mon, Nov 17, 2014 at 3:16 PM, Luck, Tony wrote:
>> I still wonder whether the timeout code is the real culprit. My patch
>> will slow down entry into do_machine_check by tens of cycles, several
>> cachelines, and possibly a couple of TLB misses. Given that the
>> timing seemed marginal to me,
> I still wonder whether the timeout code is the real culprit. My patch
> will slow down entry into do_machine_check by tens of cycles, several
> cachelines, and possibly a couple of TLB misses. Given that the
> timing seemed marginal to me, it's possible (albeit not that likely)
> that it pushed
On Mon, Nov 17, 2014 at 1:55 PM, Luck, Tony wrote:
>>> However, I'd like to be very sure this thing doesn't introduce any
>>> regressions to the MCA code. So even if Tony's testing passes, I'd like
>>> to be very conservative here and stress it more than usual. Because once
>>> this thing hits ups
>> However, I'd like to be very sure this thing doesn't introduce any
>> regressions to the MCA code. So even if Tony's testing passes, I'd like
>> to be very conservative here and stress it more than usual. Because once
>> this thing hits upstream and stuff starts breaking, it'll be a serious
>> P
On Mon, Nov 17, 2014 at 12:03 PM, Borislav Petkov wrote:
> On Mon, Nov 17, 2014 at 11:57:22AM -0800, Andy Lutomirski wrote:
>> Would it be worth making a decision on task_work_add vs. stack
>> switching first?
>
> Probably a prudent thing to do in order to save unnecessary cycles :-)
>
>> Stack sw
On Mon, Nov 17, 2014 at 11:57:22AM -0800, Andy Lutomirski wrote:
> Would it be worth making a decision on task_work_add vs. stack
> switching first?
Probably a prudent thing to do in order to save unnecessary cycles :-)
> Stack switching pros: all this lockless allocation stuff is completely
> un
On Mon, Nov 17, 2014 at 10:50 AM, Borislav Petkov wrote:
> On Fri, Nov 14, 2014 at 09:56:38PM +, Luck, Tony wrote:
>> ...
>> But I think that means we need more than one of these structures ...
>> we may not be done with one before a new machine check occurs. So
>> we'd have to make an NMI-saf
On Fri, Nov 14, 2014 at 09:56:38PM +, Luck, Tony wrote:
> ...
> But I think that means we need more than one of these structures ...
> we may not be done with one before a new machine check occurs. So
> we'd have to make an NMI-safe allocator to grab one for use inside
> do_machine_check()
Wel
On Fri, Nov 14, 2014 at 1:56 PM, Luck, Tony wrote:
>>> Right, I can do it in the meantime and we can always experiment more
>>> later. Getting rid of _TIF_MCE_NOTIFY is a good thing already.
>>
>> Yep, it looks pretty simple - not tested yet, it builds though.
>
> It seems pretty solid under test
>> Right, I can do it in the meantime and we can always experiment more
>> later. Getting rid of _TIF_MCE_NOTIFY is a good thing already.
>
> Yep, it looks pretty simple - not tested yet, it builds though.
It seems pretty solid under test so far.
Can we make it pass the address/flag to mce_notify
> So far, the only thing I've come up with is that do_machine_check
> seems to be missing exception_enter or the equivalent. Do you have
> CONFIG_CONTEXT_TRACKING on and/or full nohz enabled? I don't think
> that this explains my bug, though.
Yes to both:
$ grep CONTEXT_TRACK .config
CONFIG_CON
On Fri, Nov 14, 2014 at 9:49 AM, Luck, Tony wrote:
>> Can you also try rebasing onto what will probably be v3?
>>
>> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tag/?id=paranoid-stack-v2.9
>
> Built that - with none of my other changes ... i.e. still use TIF_NOTIFY_MCE
> etc. No p
On Fri, Nov 14, 2014 at 09:26:26AM -0800, Andy Lutomirski wrote:
> I was hoping for an actual worked-out example of what the parameters
> should be :)
Sorry, I haven't played with this myself either - haven't had a box with
EINJ yet. Maybe Tony has something.
--
Regards/Gruss,
Boris.
Sent f
>> It adds debugging for inappropriate reschedules from the wrong stack.
>> Setting CONFIG_DEBUG_ATOMIC_SLEEP might also be a good idea.
>
> Will add that for next build/test
Didn't see anything new. System died at 1108 recoveries with the
"Timeout synchronization ..." panic
-Tony
> Can you also try rebasing onto what will probably be v3?
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tag/?id=paranoid-stack-v2.9
Built that - with none of my other changes ... i.e. still use TIF_NOTIFY_MCE
etc. No printk() in the MCE context.
System ran 736 injection/consum
On Fri, Nov 14, 2014 at 9:24 AM, Borislav Petkov wrote:
> On Fri, Nov 14, 2014 at 09:18:51AM -0800, Andy Lutomirski wrote:
>> Grr. Do you or Tony have any pointers for how to test this myself? I
>> don't know enough about the acpi error injection thing, which I assume
>> is what Tony is using.
>
>
On Fri, Nov 14, 2014 at 09:18:51AM -0800, Andy Lutomirski wrote:
> Grr. Do you or Tony have any pointers for how to test this myself? I
> don't know enough about the acpi error injection thing, which I assume
> is what Tony is using.
Maybe that would help:
Documentation/acpi/apei/einj.txt
provid
On Nov 14, 2014 2:34 AM, "Borislav Petkov" wrote:
>
> On Wed, Nov 12, 2014 at 07:03:21PM -0800, Andy Lutomirski wrote:
> > printk seems to work just fine in do_machine_check.
>
> That must be pure luck. Has anything changed which I missed to make
> printk NMI-safe?
Heh. Probably not. Now I wond
On Wed, Nov 12, 2014 at 07:03:21PM -0800, Andy Lutomirski wrote:
> printk seems to work just fine in do_machine_check.
That must be pure luck. Has anything changed which I missed to make
printk NMI-safe?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
On Thu, Nov 13, 2014 at 5:20 PM, Luck, Tony wrote:
> "worst ==
> MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)"
>
> This can't happen. We can only declare AR severity for a user mode fault.
I believe you, and I see that in the code, but the code is mightily twisted.
Anyway, my v3 will also
"worst ==
MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)"
This can't happen. We can only declare AR severity for a user mode fault.
Sent from my iPhone
> On Nov 13, 2014, at 16:50, Andy Lutomirski wrote:
>
> worst ==
> MCE_AR_SEVERITY but regs->cs == 0 (i.e. in kernel)
On Thu, Nov 13, 2014 at 3:13 PM, Andy Lutomirski wrote:
> On Thu, Nov 13, 2014 at 2:47 PM, Andy Lutomirski wrote:
>> On Thu, Nov 13, 2014 at 2:33 PM, Luck, Tony wrote:
>>>> Are you sure that this works in an unmodified kernel
>>>
>>> Unmodified kernel has run tens of thousands of
>>> injection/
On Thu, Nov 13, 2014 at 2:47 PM, Andy Lutomirski wrote:
> On Thu, Nov 13, 2014 at 2:33 PM, Luck, Tony wrote:
>>> Are you sure that this works in an unmodified kernel
>>
>> Unmodified kernel has run tens of thousands of
>> injection/consumption/recovery cycles.
>>
>> I did get a crash with the en
On Thu, Nov 13, 2014 at 2:33 PM, Luck, Tony wrote:
>> Are you sure that this works in an unmodified kernel
>
> Unmodified kernel has run tens of thousands of injection/consumption/recovery
> cycles.
>
> I did get a crash with the entry/exit traces you asked for. Last 2 lines
> of console lo
> Are you sure that this works in an unmodified kernel
Unmodified kernel has run tens of thousands of injection/consumption/recovery
cycles.
I did get a crash with the entry/exit traces you asked for. Last 2 lines
of console log attached. There are a couple of OOPs before things fall apar
On Thu, Nov 13, 2014 at 2:23 PM, Andy Lutomirski wrote:
> On Thu, Nov 13, 2014 at 10:43 AM, Luck, Tony wrote:
>>> printk seems to work just fine in do_machine_check. Any chance you
>>> can instrument, for each cpu, all entries to do_machine_check, all
>>> calls to do_machine_check, all returns,
On Thu, Nov 13, 2014 at 10:43 AM, Luck, Tony wrote:
>> printk seems to work just fine in do_machine_check. Any chance you
>> can instrument, for each cpu, all entries to do_machine_check, all
>> calls to do_machine_check, all returns, and everything that tries to
>> do memory_failure?
>
> I first
On Thu, Nov 13, 2014 at 11:59:37AM +0100, Borislav Petkov wrote:
> I've been thinking about it recently too - adding MCA functionality to
> qemu/kvm could be very useful, especially the thresholding stuff, for
> testing RAS kernel code.
Btw, qemu monitor has a mce injection command with which I wa
> printk seems to work just fine in do_machine_check. Any chance you
> can instrument, for each cpu, all entries to do_machine_check, all
> calls to do_machine_check, all returns, and everything that tries to
> do memory_failure?
I first added a printk() just for the cpu that calls do_machine_che
On Wed, Nov 12, 2014 at 05:22:25PM +0100, Borislav Petkov wrote:
> > Less intrusive is certainly true.
>
> Right, I can do it in the meantime and we can always experiment more
> later. Getting rid of _TIF_MCE_NOTIFY is a good thing already.
Yep, it looks pretty simple - not tested yet, it builds
On Thu, Nov 13, 2014 at 12:31:30AM +, Luck, Tony wrote:
> > Is this something I can try under KVM?
>
> I don't know if KVM has a way to simulate a machine check event.
I've been thinking about it recently too - adding MCA functionality to
qemu/kvm could be very useful, especially the threshol
On Wed, Nov 12, 2014 at 4:31 PM, Luck, Tony wrote:
>> v2's not going to make a difference unless you're using uprobes at the
>> same time.
>
> Not (knowingly) using uprobes. System is installed with a RHEL7 userspace ...
> but is essentially idle except for my test program.
>
>> In the interest
> v2's not going to make a difference unless you're using uprobes at the
> same time.
Not (knowingly) using uprobes. System is installed with a RHEL7 userspace ...
but is essentially idle except for my test program.
> In the interest of my sanity, can you add something like
> BUG_ON(!user_mode_v
On Wed, Nov 12, 2014 at 3:41 PM, Luck, Tony wrote:
>> v2 coming soon with these changes and some additional comment cleanups.
>
v2's not going to make a difference unless you're using uprobes at the
same time.
> So v1 + do_machine_check change is not surviving some real testing. I'm
> injectin
> v2 coming soon with these changes and some additional comment cleanups.
So v1 + do_machine_check change is not surviving some real testing. I'm
injecting and consuming errors sequentially with a small delay in between -
so no fancy corner cases with multiple errors being processed ... we get
On Wed, Nov 12, 2014 at 2:00 PM, Oleg Nesterov wrote:
> Andy,
>
> As I said many times I do not understand asm ;) so most probably I missed
> something but let me ask anyway.
You must be the most competent non-asm-speaking asm reviewer in the world :)
>
> On 11/11, Andy Lutomirski wrote:
>>
>> -
Andy,
As I said many times I do not understand asm ;) so most probably I missed
something but let me ask anyway.
On 11/11, Andy Lutomirski wrote:
>
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -1064,6 +1064,9 @@ ENTRY(\sym)
> CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
On Wed, Nov 12, 2014 at 05:17:55PM +, Luck, Tony wrote:
> > Not that easy for testing the #MC path - there we have to inject real
> > MCEs and then noodle through the memory_failure() code. I'd be very much
> > interested to see what would happen if two MCEs happen back-to-back with
> > your ch
> Not that easy for testing the #MC path - there we have to inject real
> MCEs and then noodle through the memory_failure() code. I'd be very much
> interested to see what would happen if two MCEs happen back-to-back with
> your change, the second one being raised when we're on the kernel stack
> a
On Wed, Nov 12, 2014 at 07:48:15AM -0800, Andy Lutomirski wrote:
> I only switch stacks on entry from userspace, and the kernel stack is
> completely empty if that happens.
Ok, fair enough. There's still the argument that something might've
corrupted the kernel stack memory while the MCE_STACK is
On Nov 12, 2014 2:30 AM, "Borislav Petkov" wrote:
>
> On Tue, Nov 11, 2014 at 06:06:48PM -0800, Tony Luck wrote:
> > > Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they
> > > are quite easy to spot.
> >
> > Bother ... except for the SRAO cases where *everyone* is an innocent
> > by
On Tue, Nov 11, 2014 at 06:06:48PM -0800, Tony Luck wrote:
> > Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they
> > are quite easy to spot.
>
> Bother ... except for the SRAO cases where *everyone* is an innocent
> bystander - but someone should go look for the error and queue up
> Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they
> are quite easy to spot.
Bother ... except for the SRAO cases where *everyone* is an innocent
bystander - but someone should go look for the error and queue up
a page offline event. Perhaps for this we'd do the self-ipi trick an
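
The RIPV=1, EIPV=0 test quoted above can be sketched in C as follows. This is a minimal illustration, not kernel code: the helper name is_innocent_bystander() is hypothetical, and the bit positions (RIPV bit 0, EIPV bit 1, MCIP bit 2) are the architectural IA32_MCG_STATUS layout. As the reply points out, this test alone does not cover SRAO, where every cpu looks like a bystander.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in the IA32_MCG_STATUS MSR. */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

static_assert(MCG_STATUS_MCIP == 4, "MCIP is bit 2");

/*
 * Hypothetical helper: true when the cpu can restart (RIPV=1) and the
 * saved instruction pointer is not tied to the error (EIPV=0).
 */
static bool is_innocent_bystander(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_RIPV) &&
           !(mcg_status & MCG_STATUS_EIPV);
}
```

A caller would pass in the value read from MCG_STATUS; the MCIP bit is ignored by the check.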
On Tue, Nov 11, 2014 at 5:06 PM, Luck, Tony wrote:
>> I've thought about one sneaky option. If we can reliably determine
>> that we're an innocent bystander of a broadcast #MC, can we send an
>> IPI-to-self and return without clearing MCIP? Then we get another
>> interrupt as soon as interrupts
> I've thought about one sneaky option. If we can reliably determine
> that we're an innocent bystander of a broadcast #MC, can we send an
> IPI-to-self and return without clearing MCIP? Then we get another
> interrupt as soon as interrupts are enabled, and we can clear MCIP at
> a time when we'r
On Tue, Nov 11, 2014 at 4:22 PM, Luck, Tony wrote:
> Andy said:
>> Yeah. But if you haven't cleared MCIP, you go boom, which is the same
>> with pretty much any approach.
>
> The current code has an ugly hole at the moment. End of do_machine_check()
> clears MCG_STATUS. At that point we are sti
Andy said:
> Yeah. But if you haven't cleared MCIP, you go boom, which is the same
> with pretty much any approach.
The current code has an ugly hole at the moment. End of do_machine_check()
clears MCG_STATUS. At that point we are still running on the magic stack for
machine check exceptions ..
On Tue, Nov 11, 2014 at 3:09 PM, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 02:40:12PM -0800, Andy Lutomirski wrote:
>> I wonder what the IRET is for. There had better not be another magic
>> IRET unmask thing. I'm guessing that the actual semantics are that
>> nothing whatsoever can mask
On Tue, Nov 11, 2014 at 02:40:12PM -0800, Andy Lutomirski wrote:
> I wonder what the IRET is for. There had better not be another magic
> IRET unmask thing. I'm guessing that the actual semantics are that
> nothing whatsoever can mask #MC, but that a second #MC when MCIP is
> still set is a shutd
On Tue, Nov 11, 2014 at 2:33 PM, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 02:12:18PM -0800, Andy Lutomirski wrote:
>> I don't see why it would be any more likely for the normal kernel
>> stack to be corrupted due to a hardware issue that interrupted ring 3
>> code than that the IST stack i
On Tue, Nov 11, 2014 at 02:12:18PM -0800, Andy Lutomirski wrote:
> I don't see why it would be any more likely for the normal kernel
> stack to be corrupted due to a hardware issue that interrupted ring 3
> code than that the IST stack is corrupted.
The IST stack is, well, used solely for
On Tue, Nov 11, 2014 at 2:00 PM, Luck, Tony wrote:
> So here is the flow:
>
> 1) A machine check happens - it is (currently) broadcast to all logical cpus
> on all sockets
>
> 2) First cpu to execute "order = atomic_inc_return(&mce_callin);" in
> mce_start() gets to be the "monarch" and directs
On Tue, Nov 11, 2014 at 1:36 PM, Borislav Petkov wrote:
> A very big hmmm...
>
> On Tue, Nov 11, 2014 at 12:56:52PM -0800, Andy Lutomirski wrote:
>> This causes all non-NMI kernel entries from userspace to run on the
>> normal kernel stack.
>
> So one of the reasons #MC has its own stack is becaus
So here is the flow:
1) A machine check happens - it is (currently) broadcast to all logical cpus on
all sockets
2) First cpu to execute "order = atomic_inc_return(&mce_callin);" in
mce_start() gets to be the "monarch" and directs things during the handler.
3) Every cpu gets to scan all the ma
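
Step 2 of the flow above (the first cpu to bump the counter becomes the "monarch") can be sketched in userspace C11 as follows. This is only an illustration under assumptions: callin_count, callin() and is_monarch() are hypothetical names standing in for mce_callin and the logic in mce_start(), and C11 atomic_fetch_add() plus one is used to mimic the kernel's atomic_inc_return().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's mce_callin counter. */
static atomic_int callin_count;

/*
 * Record this cpu's arrival and return its order (1 = first to arrive).
 * atomic_fetch_add() returns the old value, so add 1 to get the
 * "increment and return new value" behaviour of atomic_inc_return().
 */
static int callin(void)
{
    int order = atomic_fetch_add(&callin_count, 1) + 1;

    assert(order >= 1);
    return order;
}

/* The cpu that arrived first directs the others during the handler. */
static bool is_monarch(int order)
{
    return order == 1;
}
```

Because the increment is atomic, exactly one caller can observe order == 1, however many cpus enter at once.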
A very big hmmm...
On Tue, Nov 11, 2014 at 12:56:52PM -0800, Andy Lutomirski wrote:
> This causes all non-NMI kernel entries from userspace to run on the
> normal kernel stack.
So one of the reasons #MC has its own stack is because we need a
known-good stack in such situations. What if the normal