Hi Finn,

On 26.04.2023 at 14:02, Finn Thain wrote:
On Wed, 26 Apr 2023, Michael Schmitz wrote:

Thanks - we had seen evidence that a bus error generated mid-instruction
did leave the USP at the address where the bus fault happened (neither
where it was before the instruction started, nor where it would have
been once the instruction completed), and that the operation did not
complete normally after the bus error (at least, the value/address seen
in the exception frame was not stored).

I'm afraid I still don't fully understand how and why the user stack
(rather than the supervisor stack) gets used for processing the exception
frame.

The kernel stack would not be accessible to the signal handler which must run in process context (i.e. user space).

The exception frame is copied to the signal frame for informational purposes only (such as examination of processor state when the signal was taken - not too useful for SIGCHLD but could be used to interpret SIGSEGV).


Finn had also demonstrated that skipping signal delivery on bus errors
abolishes the stack corruption.  Your patch achieves the same objective
in a different way, so I'm sure this will work as well.

I had thought the 030 could resume the interrupted instruction using the
information from the exception frame - and that does appear to work in
all other cases except where signal delivery gets in the way, and it
also works if the exception frame is moved a little bit further down the
stack. So our treatment of the bus error exception frame during signal
delivery appears to be incorrect.

It seems I got confused about user and kernel stack there myself, and managed to confuse almost everyone else about this bug. Apologies for the incessant noise.

What matters for the return from exception is an intact frame on the kernel stack. Anything we do on the user stack (mucking around with the offset the sigframe is set up at, copying siginfo, ucontext or sigcontext plus exception frame extra) does not change the kernel stack one whit.

The mangle_kernel_stack stuff is needed because sys_sigreturn will place another exception frame on the kernel stack (a four word frame) that needs to be replaced by the bus error exception frame (or any other frame that caused the kernel mode entry prior to signal delivery) before finally returning from the bus error exception.

Only at that time will the movel instruction that took the bus fault resume (and complete its writes correctly, I hope).
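
To put rough numbers on the frame replacement described above, here is a small stand-alone sketch of how much larger the 68020/030 frames are than the four-word frame. The sizes are the ones I recall from the MC68030 User's Manual, so please check them against the manual; the table and function names are made up for this mail and are not the kernel's, and other CPU models (040/060) use formats that are not modelled here.

```c
#include <stdint.h>

/* Total stack frame size in bytes, indexed by the format nibble.
 * Only 68020/030 formats are listed; 0 means "not modelled here". */
static const uint8_t frame_bytes[16] = {
    [0x0] = 8,   /* four-word normal frame */
    [0x2] = 12,  /* six-word frame */
    [0x9] = 20,  /* coprocessor mid-instruction frame, 10 words */
    [0xA] = 32,  /* short bus fault frame, 16 words */
    [0xB] = 92,  /* long bus fault frame, 46 words */
};

/* The format/vector word holds the format in bits 15-12 ... */
unsigned frame_format(uint16_t format_vector_word)
{
    return format_vector_word >> 12;
}

/* ... and the vector offset in bits 11-0 (0x008 is bus error). */
unsigned frame_vector_offset(uint16_t format_vector_word)
{
    return format_vector_word & 0x0FFF;
}

/* Bytes beyond the four-word frame that would have to be put back
 * when a larger frame replaces it; -1 for formats not in the table. */
int frame_extra_bytes(unsigned format)
{
    if (format > 0xF || frame_bytes[format] == 0)
        return -1;
    return frame_bytes[format] - frame_bytes[0x0];
}
```

With these figures, a format/vector word of 0xB008 (long bus fault frame, bus error vector) gives frame_extra_bytes(frame_format(0xB008)) == 84, i.e. 84 bytes in addition to the four-word frame.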

Our problem may be that, if we take the signal too late and our main process inspects a stack that has only been partially saved (because the bus error processing is still in flight), we appear to be in trouble. After completing sys_sigreturn, everything will be OK.

I can see this causing the stack error in the test case. I'm not sure it also applies to the dash case ...

Wouldn't that depend on the exception frame format? Perhaps it is unsafe
to treat any format 0xB exception frame in the way we do. If so, what do
we do about address error exceptions, which are to produce SIGBUS? The
Programmer's Reference Manual says "a long bus fault stack frame may be
generated" in this case.

We don't handle access errors (beyond terminating the offending process).

I hope this makes a little more sense now...

Cheers,

        Michael
