On 2022/10/21 PM12:41, Luck, Tony wrote:
>>> When we do return to user mode the task is going to be busy servicing
>>> a SIGBUS ... so shouldn't try to touch the poison page before the
>>> memory_failure() called by the worker thread cleans things up.
>>
>> What about an RT process on a busy system?
>> The worker threads are pretty low priority.
>
> Most tasks don't have a SIGBUS handler ... so they just die without
> possibility of accessing poison
>
> If this task DOES have a SIGBUS handler, and that for some bizarre reason
> just does a "return" so the task jumps back to the instruction that cause
> the COW then there is a 63/64 likelihood that it is touching a different
> cache line from the poisoned one.
>
> In the 1/64 case ... its probably a simple store (since there was a COW,
> we know it was trying to modify the page) ... so won't generate another
> machine check (those only happen for reads).
>
> But maybe it is some RMW instruction ... then, if all the above options
> didn't happen ... we could get another machine check from the same
> address. But then we just follow the usual recovery path.
>
> -Tony
Let's assume the instruction that caused the COW is in the 63/64 case, i.e.,
it is writing to a different cache line from the poisoned one. But the new_page
allocated in the COW is dropped, right? So the task might page fault again?
Best Regards,
Shuai