> Not that easy for testing the #MC path - there we have to inject real
> MCEs and then noodle through the memory_failure() code. I'd be very much
> interested to see what would happen if two MCEs happen back-to-back with
> your change, the second one being raised when we're on the kernel stack
> and in memory_failure()...

If the second one hits before we clear MCG_STATUS, then the processor resets.

If the second one is caused by the recovery thread somewhere in
memory_failure(), then Andy won't switch stacks - but we will declare
this a fatal error and panic (we have no recovery from machine checks
in the kernel).

Otherwise the memory_failure() thread is the innocent bystander. If the
affected thread decides to do recovery, then the first thread will be
allowed to return and continue.
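
To spell out the three cases above, here is a toy userspace sketch. It is
not kernel code: the enum, the function name and both parameters are made
up for illustration, and it only restates the outcomes described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three outcomes described in the text. */
enum second_mce_outcome {
	OUTCOME_CPU_RESET,	/* second #MC before MCG_STATUS is cleared */
	OUTCOME_PANIC,		/* second #MC caused by the recovery thread */
	OUTCOME_BYSTANDER,	/* recovery thread was merely interrupted */
};

/*
 * Illustrative only: classify what happens to a second machine check
 * that arrives while the first is still being handled.
 */
static enum second_mce_outcome
classify_second_mce(bool mcg_status_still_set, bool caused_by_recovery_thread)
{
	/* The hardware case comes first: software never gets a say. */
	if (mcg_status_still_set)
		return OUTCOME_CPU_RESET;

	/* No recovery from machine checks taken in kernel context. */
	if (caused_by_recovery_thread)
		return OUTCOME_PANIC;

	/* Some other thread was affected; the first one can continue. */
	return OUTCOME_BYSTANDER;
}
```

The ordering of the checks matters: the reset case is decided by the
processor before any of our code runs, so it has to win.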

I might worry a bit if the second error is another thread hitting the
*same* page which hasn't finished processing yet ... then the second
will chase along behind the first trying to fix the same problem.  I
*think* the first will complete and the second will just end up here:

        if (TestSetPageHWPoison(p)) {
                printk(KERN_ERR "MCE %#lx: already hardware poisoned\n", pfn);
                return 0;
        }

which is really early in memory_failure().
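
A minimal userspace model of why the second caller bails out, assuming
the check behaves as an atomic test-and-set on the poison bit. The
struct and the fake_* names are hypothetical stand-ins, not the kernel's
definitions, and everything after the early check is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; models only the HWPoison bit. */
struct fake_page {
	atomic_flag hwpoison;
};

/* Models TestSetPageHWPoison(): set the bit, return its previous value. */
static int fake_test_set_hwpoison(struct fake_page *p)
{
	return atomic_flag_test_and_set(&p->hwpoison);
}

/*
 * Models only the early check in memory_failure().
 * Returns 1 if this caller gets to do the recovery, 0 if it bailed out
 * because someone else already poisoned the page.
 */
static int fake_memory_failure(struct fake_page *p, unsigned long pfn)
{
	assert(p != NULL);

	if (fake_test_set_hwpoison(p)) {
		printf("MCE %#lx: already hardware poisoned\n", pfn);
		return 0;
	}

	/* ... the real function would carry on with recovery here ... */
	return 1;
}
```

Calling fake_memory_failure() twice on the same fake_page returns 1 the
first time and 0 the second time, whichever thread gets there first.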

-Tony