> Not that easy for testing the #MC path - there we have to inject real
> MCEs and then noodle through the memory_failure() code. I'd be very much
> interested to see what would happen if two MCEs happen back-to-back with
> your change, the second one being raised when we're on the kernel stack
> and in memory_failure()...

If the second one hits before we clear MCG_STATUS, then the processor resets.

If the second one is caused by the recovery thread somewhere in memory_failure(), then Andy won't switch stacks - but we will declare this a fatal error and panic (we have no recovery from machine checks in the kernel).

Otherwise the memory_failure() thread is the innocent bystander. If the affected thread decides to do recovery, then the first thread will be allowed to return and continue.

I might worry a bit if the second error is another thread hitting the *same* page which hasn't finished processing yet ... then the second will chase along behind the first trying to fix the same problem. I *think* the first will complete and the second will just end up here:

	if (TestSetPageHWPoison(p)) {
		printk(KERN_ERR "MCE %#lx: already hardware poisoned\n", pfn);
		return 0;
	}

which is really early in memory_failure().

-Tony
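
For readers unfamiliar with the test-and-set idiom in the snippet above, here is a minimal userspace sketch of why the second thread bails out early. It is not kernel code: struct demo_page, DEMO_HWPOISON, demo_test_set_hwpoison() and demo_memory_failure() are hypothetical names invented for illustration, and C11 atomic_fetch_or() stands in for the kernel's atomic bit operation. The return values (1 for the caller that claims the page, 0 for the latecomer) are also a choice made for this sketch, not the kernel's convention.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a page's flag word; not the kernel's struct page. */
struct demo_page {
    atomic_ulong flags;
};

#define DEMO_HWPOISON (1UL << 0)   /* hypothetical flag bit */

/* Atomically set the poison bit and report whether it was already set. */
static bool demo_test_set_hwpoison(struct demo_page *p)
{
    unsigned long old = atomic_fetch_or(&p->flags, DEMO_HWPOISON);
    return (old & DEMO_HWPOISON) != 0;
}

/*
 * Sketch of the early exit: returns 1 if this caller claimed the page
 * and would go on to do the recovery work, 0 if another caller had
 * already marked it poisoned.
 */
static int demo_memory_failure(struct demo_page *p, unsigned long pfn)
{
    if (demo_test_set_hwpoison(p)) {
        fprintf(stderr, "MCE %#lx: already hardware poisoned\n", pfn);
        return 0;
    }
    /* ... recovery work for the first caller would happen here ... */
    return 1;
}
```

Because the set and the test happen in one atomic step, exactly one of two racing callers sees the bit clear, so only one of them proceeds past the check.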