On Mon, Nov 17, 2014 at 4:22 PM, Luck, Tony <tony.l...@intel.com> wrote:
>> It could also be interesting to tweak mce_panic to not actually panic
>> the machine but to try to return and stop the test instead.  Then real
>> debugging could be possible :)
>
> The lost cpu is *really* lost.  Warm reset doesn't fix the machine, I usually
> have to do a full power cycle.

How is it even possible that I did that with a few lines of asm?

Could this be a hardware bug?  Is there some condition that causes #MC
delivery to wedge hard enough that even INIT/RESET stops working?  Or
possibly some CPU got stuck in SMM -- I have no idea what warm reset
does these days.

My initial attempts to test machine_check in KVM using IPIs are having
some issues, probably because I'm not acking the interrupt.  I can
trigger it once, but then it stops working.

Here's the patch to improve the timeout messages, but given the degree
of wedgedness, I can guess what it'll say:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/paranoid&id=e5cbd9d141bde651ecb20f0b65ad13bcef2468d0

--Andy

>
> -Tony



-- 
Andy Lutomirski
AMA Capital Management, LLC
