Re: reliable reproducer, was Re: core dump analysis

2023-04-24 Thread Michael Schmitz
Hi Andreas, Am 22.04.2023 um 22:12 schrieb Andreas Schwab: On Apr 22 2023, Michael Schmitz wrote: This is the definition from the kernel's include/uapi/asm-generic/ucontext.h: That's not actually used by m68k, it uses arch/m68k/include/asm/ucontext.h, which confusingly isn't an uapi header.

Re: reliable reproducer, was Re: core dump analysis

2023-04-24 Thread Finn Thain
On Mon, 24 Apr 2023, Michael Schmitz wrote: > > I don't understand these results. If usp was really overwritten, the > > program would have crashed early, no? > > I think we're still at the point where rec() is called recursively, > before any returns. Right. I wasn't thinking. I'll try to co

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Michael Schmitz
Hi Finn, Am 24.04.2023 um 15:51 schrieb Michael Schmitz: Hi Andreas, On 24/04/23 09:48, Andreas Schwab wrote: On Apr 24 2023, Michael Schmitz wrote: Not sure what third argument you referred to in another mail. See struct sigframe and struct rt_sigframe. The non-rt signal handler gets sign

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Michael Schmitz
Hi Andreas, On 24/04/23 09:48, Andreas Schwab wrote: On Apr 24 2023, Michael Schmitz wrote: Not sure what third argument you referred to in another mail. See struct sigframe and struct rt_sigframe. The non-rt signal handler gets signal number, vector number and sigcontext*. The rt signal ha

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Andreas Schwab
On Apr 24 2023, Michael Schmitz wrote: > Not sure what third argument you referred to in another mail. See struct sigframe and struct rt_sigframe. The non-rt signal handler gets signal number, vector number and sigcontext*. The rt signal handler gets signal number, siginfo* and ucontext*.

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Michael Schmitz
Hi Finn, On 23/04/23 21:23, Finn Thain wrote: On Sun, 23 Apr 2023, Michael Schmitz wrote: Am 23.04.2023 um 13:41 schrieb Michael Schmitz: Though the question remains - is this expected behaviour for programs that do deep recursion on the stack while taking signals (and the reason for the opti

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Michael Schmitz
Hi Andreas, On 23/04/23 20:23, Andreas Schwab wrote: On Apr 23 2023, Michael Schmitz wrote: Wasn't too hard actually. The signo parameter passed to the handler turns out to be passed by reference, and signo is located 4 bytes into the kernel sigframe. That's not "passed by reference". Functi

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Finn Thain
On Sun, 23 Apr 2023, Michael Schmitz wrote: > Am 23.04.2023 um 13:41 schrieb Michael Schmitz: > > Though the question remains - is this expected behaviour for programs > that do deep recursion on the stack while taking signals (and the reason > for the option to run signal handlers on an altern

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Andreas Schwab
On Apr 23 2023, Michael Schmitz wrote: > Wasn't too hard actually. The signo parameter passed to the handler turns > out to be passed by reference, and signo is located 4 bytes into the > kernel sigframe. That's not "passed by reference". Function arguments are always passed on the stack.

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Michael Schmitz
Hi Finn, Andreas, Am 23.04.2023 um 13:41 schrieb Michael Schmitz: Hi Andreas, Am 23.04.2023 um 08:46 schrieb Andreas Schwab: On Apr 23 2023, Michael Schmitz wrote: I'll see whether the signal context is available on the stack even if the handler isn't passed that parameter. The signal cont

Re: reliable reproducer, was Re: core dump analysis

2023-04-23 Thread Andreas Schwab
On Apr 23 2023, Michael Schmitz wrote: > Hi Andreas, > > Am 23.04.2023 um 08:46 schrieb Andreas Schwab: >> On Apr 23 2023, Michael Schmitz wrote: >> >>> I'll see whether the signal context is available on the stack even if the >>> handler isn't passed that parameter. >> >> The signal context is al

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Michael Schmitz
Hi Andreas, Am 23.04.2023 um 08:46 schrieb Andreas Schwab: On Apr 23 2023, Michael Schmitz wrote: I'll see whether the signal context is available on the stack even if the handler isn't passed that parameter. The signal context is always on the stack, and used by the (rt_)sigreturn syscall.

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Andreas Schwab
On Apr 23 2023, Michael Schmitz wrote: > I'll see whether the signal context is available on the stack even if the > handler isn't passed that parameter. The signal context is always on the stack, and used by the (rt_)sigreturn syscall. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerpri

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Michael Schmitz
Hi Andreas, Am 23.04.2023 um 06:38 schrieb Andreas Schwab: On Apr 23 2023, Michael Schmitz wrote: Now I wonder who adds sigmask ... and whether that's also ending up on the user stack. The kernel only writes the first 64 bits of the signal mask, as it does for all signal mask related syscall

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Andreas Schwab
On Apr 23 2023, Michael Schmitz wrote: > Now I wonder who adds sigmask ... and whether that's also ending up on the > user stack. The kernel only writes the first 64 bits of the signal mask, as it does for all signal mask related syscalls. The kernel version of the context ends after that; since
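The size mismatch can be checked from userland: glibc's `sigset_t` is 128 bytes (room for 1024 signals), while Linux has at most 64 signal numbers, so only the first 8 bytes of a saved mask carry information. A small sketch; `KERNEL_SIGMASK_BYTES` and the helper names are assumptions for illustration, and the statement that the m68k kernel context ends right after its mask is taken from Andreas's mail, not verified against the headers here.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <ucontext.h>

/* Size of the signal mask the kernel writes into the frame, per the thread:
 * 64 bits, one per Linux signal number 1..64. */
#define KERNEL_SIGMASK_BYTES 8u

/* Trailing bytes of libc's (much larger) sigset_t that the kernel never
 * fills in when it saves the mask in uc_sigmask. */
size_t unwritten_sigmask_bytes(void)
{
    return sizeof(sigset_t) - KERNEL_SIGMASK_BYTES;
}

/* Offset of the first byte after uc_sigmask in libc's ucontext_t. If the
 * kernel's ucontext ends after its 8-byte mask (as stated for m68k), libc
 * fields past offsetof(uc_sigmask) + 8 have no kernel counterpart. */
size_t libc_sigmask_end(void)
{
    return offsetof(ucontext_t, uc_sigmask) + sizeof(sigset_t);
}
```

Every real signal number, up to `SIGRTMAX`, fits in those 64 bits, which is why writing only the first 8 bytes loses nothing.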

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Michael Schmitz
Hi Andreas, Am 22.04.2023 um 22:12 schrieb Andreas Schwab: On Apr 22 2023, Michael Schmitz wrote: This is the definition from the kernel's include/uapi/asm-generic/ucontext.h: That's not actually used by m68k, it uses arch/m68k/include/asm/ucontext.h, which confusingly isn't an uapi header.

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Andreas Schwab
On Apr 22 2023, Michael Schmitz wrote: > This is the definition from the kernel's > include/uapi/asm-generic/ucontext.h: That's not actually used by m68k, it uses arch/m68k/include/asm/ucontext.h, which confusingly isn't an uapi header. > And this is /usr/include/sys/ucontext.h: > > /* Userlevel

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Michael Schmitz
Hi Finn, Am 22.04.2023 um 19:54 schrieb Michael Schmitz: Hi Finn, Am 21.04.2023 um 21:18 schrieb Michael Schmitz: Hi Finn, Am 21.04.2023 um 20:30 schrieb Finn Thain: On Fri, 21 Apr 2023, Michael Schmitz wrote: How often did a page fault happen when executing moveml, in other programs?

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Andreas Schwab
On Apr 22 2023, Michael Schmitz wrote: > Took a little while to figure out that the ucontext format changed in the > decade or two since my userland's libc headers were generated. In which way did it change?

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Michael Schmitz
Hi Andreas, Am 22.04.2023 um 20:07 schrieb Andreas Schwab: On Apr 22 2023, Michael Schmitz wrote: Took a little while to figure out that the ucontext format changed in the decade or two since my userland's libc headers were generated. In which way did it change? This is the definition from

Re: reliable reproducer, was Re: core dump analysis

2023-04-22 Thread Michael Schmitz
Hi Finn, Am 21.04.2023 um 21:18 schrieb Michael Schmitz: Hi Finn, Am 21.04.2023 um 20:30 schrieb Finn Thain: On Fri, 21 Apr 2023, Michael Schmitz wrote: How often did a page fault happen when executing moveml, in other programs? The printk() I placed in bus_error030() was conditional on

Re: reliable reproducer, was Re: core dump analysis

2023-04-21 Thread Michael Schmitz
Hi Finn, Am 21.04.2023 um 20:30 schrieb Finn Thain: On Fri, 21 Apr 2023, Michael Schmitz wrote: How often did a page fault happen when executing moveml, in other programs? The printk() I placed in bus_error030() was conditional on the short word at the instruction pointer. It didn't consid

Re: reliable reproducer, was Re: core dump analysis

2023-04-21 Thread Finn Thain
On Fri, 21 Apr 2023, Michael Schmitz wrote: > > How often did a page fault happen when executing moveml, in other > programs? > The printk() I placed in bus_error030() was conditional on the short word at the instruction pointer. It didn't consider all forms of movem, just 0x48e7 which is th
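Because the printk only matched 0x48e7 (`movem.l <list>,-(sp)`), a broader check would have to match every MOVEM encoding. Here is a sketch of such a classifier, based on the 68000-family MOVEM opcode layout `0100 1d00 1s mmm rrr`. `is_movem()` and `is_movem_to_memory()` are hypothetical helpers, not kernel functions. The check only excludes the data- and address-register modes (which also rules out EXT.W/EXT.L, since they share the same fixed bits) rather than validating every mode/direction combination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MOVEM opcode word: 0100 1d00 1s mmm rrr
 *   d (bit 10)  direction: 0 = registers to memory, 1 = memory to registers
 *   s (bit 6)   size: 0 = word, 1 = long
 *   mmm/rrr     effective address mode/register
 * The fixed bits are selected by MOVEM_MASK and must equal MOVEM_MATCH. */
#define MOVEM_MASK  0xFB80u
#define MOVEM_MATCH 0x4880u

/* The single form the printk tested for: movem.l <list>,-(sp) */
#define MOVEML_PUSH 0x48E7u

bool is_movem(uint16_t op)
{
    unsigned mode = (op >> 3) & 7u;

    if ((op & MOVEM_MASK) != MOVEM_MATCH)
        return false;
    /* Modes 0 (Dn) and 1 (An) are not valid for MOVEM; mode 0 with these
     * fixed bits is EXT.W/EXT.L instead. */
    return mode != 0 && mode != 1;
}

/* True for the register-to-memory direction (stores, including pushes). */
bool is_movem_to_memory(uint16_t op)
{
    return is_movem(op) && (op & 0x0400u) == 0;
}
```

For instance, `is_movem(0x48e7)` and `is_movem(0x4cdf)` (`movem.l (sp)+,<list>`) are true, while `is_movem(0x4880)` (`ext.w d0`) and `is_movem(0x4e75)` (`rts`) are false.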

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Michael Schmitz
Hi Finn, Am 21.04.2023 um 09:58 schrieb Michael Schmitz: Hi Finn, On 20/04/23 20:55, Finn Thain wrote: But in any case, it looks like we can eliminate the bus error code. Same fault on both 030 and 040 with very different bus error handlers is highly unlikely. There's no failure on '040. Q

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Michael Schmitz
Hi Finn, On 21/04/23 13:15, Finn Thain wrote: On Thu, 20 Apr 2023, Michael Schmitz wrote: In my tests, increasing the depth does not cause a monotonic increase in fault probability. 16k depth only has four crashes, 8k had nine. I'll stick with 20 for n

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Finn Thain
On Thu, 20 Apr 2023, Finn Thain wrote: > > I modified the test program to execute rec() to full depth with no > forking, then do it again with forking. > > root@(none):/root# while ./stack-test 5000 ; do : ; done > starting recursion > done. > starting recursion with fork > done. > starting rec

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Michael Schmitz
Hi Finn, one more datapoint: using sigaltstack() to set up a separate signal stack in main() (code lifted straight from the sigaltstack man page) does avoid the stack corruption. That might be useful as a workaround for dash, while we're analyzing this bug. I wonder whether recursively gro

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Finn Thain
On Thu, 20 Apr 2023, Michael Schmitz wrote: > > In my tests, increasing the depth does not cause a monotonic increase > in fault probability. 16k depth only has four crashes, 8k had nine. I'll > stick with 20 for now. > My tests used 'norandmaps' in t
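`norandmaps` disables address-space randomization for the whole system from the kernel command line. For a single test program, `personality(2)` with `ADDR_NO_RANDOMIZE` (the flag `setarch -R` sets) gives a similarly fixed layout without a reboot. Sketch only; the helper names are hypothetical, and the flag only affects images loaded by a subsequent `execve()`.

```c
#include <assert.h>
#include <sys/personality.h>

/* Query the current execution domain without changing it:
 * 0xffffffff is the "query only" argument to personality(2). */
int current_persona(void)
{
    return personality(0xffffffff);
}

/* Nonzero if address-space randomization is disabled for this process. */
int aslr_disabled(void)
{
    int p = current_persona();

    return p != -1 && (p & ADDR_NO_RANDOMIZE) != 0;
}

/* Request no randomization; takes effect at the next execve(), so a
 * wrapper would call this and then exec the actual test program. */
int disable_aslr_for_exec(void)
{
    int p = current_persona();

    if (p == -1)
        return -1;
    return personality((unsigned long)p | ADDR_NO_RANDOMIZE) == -1 ? -1 : 0;
}
```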

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Michael Schmitz
Hi Finn, On 20/04/23 20:55, Finn Thain wrote: But in any case, it looks like we can eliminate the bus error code. Same fault on both 030 and 040 with very different bus error handlers is highly unlikely. There's no failure on '040. QEMU and Motorola '040 gave the same result. Sorry, my fau

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Finn Thain
On Thu, 20 Apr 2023, Michael Schmitz wrote: > Am 20.04.2023 um 19:47 schrieb Finn Thain: > >>> So all the stack pages would have been faulted in well before the > >>> failure shows up. It appears to be the signal that's the problem and > >>> not the page fault. That's not surprising considering

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Michael Schmitz
Hi Finn, Am 20.04.2023 um 19:47 schrieb Finn Thain: So all the stack pages would have been faulted in well before the failure shows up. It appears to be the signal that's the problem and not the page fault. That's not surprising considering the PC in the signal frame in the dash crash was a MOVE

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Finn Thain
On Thu, 20 Apr 2023, Michael Schmitz wrote: > Am 20.04.2023 um 18:04 schrieb Finn Thain: > > On Wed, 19 Apr 2023, I wrote: > > > >> Oddly, the program never detects any stack corruption when run on the > >> QEMU '040. > >> > > > > I tested a Motorola '040 and got the same result. > > OK, that wou

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Finn Thain
On Thu, 20 Apr 2023, Michael Schmitz wrote: > Am 20.04.2023 um 17:17 schrieb Finn Thain: > > On Thu, 20 Apr 2023, Michael Schmitz wrote: > > > >>> > >>> As with dash, the corruption lies at the page boundary. > >> > >> Hence implies a page fault handled at the page boundary. > >> > >> Can you try and

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Michael Schmitz
Hi Finn, Am 20.04.2023 um 18:04 schrieb Finn Thain: On Wed, 19 Apr 2023, I wrote: Oddly, the program never detects any stack corruption when run on the QEMU '040. I tested a Motorola '040 and got the same result. OK, that would mean the bus error was just the most reliable way to get do_

Re: reliable reproducer, was Re: core dump analysis

2023-04-20 Thread Michael Schmitz
Hi Finn, Am 20.04.2023 um 17:17 schrieb Finn Thain: On Thu, 20 Apr 2023, Michael Schmitz wrote: As with dash, the corruption lies at the page boundary. Hence implies a page fault handled at the page boundary. Can you try and fault in as many of these stack pages as possible, ahead of filling
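Faulting the stack in ahead of time could look like this: touch one byte per page of a large local array so the stack mapping grows to that depth before the recursion starts, then optionally `mlockall()` to keep the pages resident. Minimal sketch; `prefault_stack()` and `lock_all_pages()` are hypothetical helpers, the amount to prefault must be chosen to cover the recursion depth, and `mlockall()` typically fails without CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Touch `bytes` (> 0) of stack below the caller with one store per page, so
 * those pages are faulted in now rather than during a later deep recursion.
 * Returns the number of pages touched. */
size_t prefault_stack(size_t bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char buf[bytes];      /* VLA: allocated on the current stack */
    size_t pages = 0;

    for (size_t off = 0; off < bytes; off += page, pages++)
        buf[off] = 0;
    return pages;
}

/* Optionally pin current and future pages so they cannot be paged out. */
int lock_all_pages(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

Calling `prefault_stack()` from `main()` before the recursion leaves those pages mapped after it returns, since Linux does not shrink the stack mapping; the recursion should then reach them without page faults while signals still arrive as before.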

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Finn Thain
On Wed, 19 Apr 2023, I wrote: > Oddly, the program never detects any stack corruption when run on the > QEMU '040. > I tested a Motorola '040 and got the same result.

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Finn Thain
On Thu, 20 Apr 2023, I wrote: > > So it must be that a MOVEM went awry when a signal got delivered. Or signal delivery went awry after a MOVEM got resumed?

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Finn Thain
On Thu, 20 Apr 2023, Michael Schmitz wrote: > > > > As with dash, the corruption lies at the page boundary. > > Hence implies a page fault handled at the page boundary. > > Can you try and fault in as many of these stack pages as possible, ahead > of filling the stack? (Depending on how much RAM y

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Michael Schmitz
Hi Finn, reproduced on my Falcon (with minor mods to the C source - my version of gcc didn't like asm with no clobbers, so I added "memory" as clobber in the second asm block). In this case it's a4 that is corrupted, but that varies. depth of 4096 gets me two core dumps on 20 attempts so thi
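The asm blocks themselves are not quoted here. As a generic GNU C illustration (not the test program's m68k code), a `"memory"` clobber tells GCC that the asm may read or write arbitrary memory. GCC therefore must not keep memory values cached in registers across the statement or move loads and stores past it. The helper names are made up for the example.

```c
#include <assert.h>

/* Empty asm with a "memory" clobber: a pure compiler barrier. It emits no
 * instructions but forces GCC to complete earlier stores and redo later
 * loads around it. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* Store, barrier, then reload *p from memory rather than reusing `v`. */
int store_then_reload(int *p, int v)
{
    *p = v;
    compiler_barrier();
    return *p;
}
```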

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Finn Thain
On Thu, 20 Apr 2023, Michael Schmitz wrote: > Can you try and fault in as many of these stack pages as possible, ahead > of filling the stack? (Depending on how much RAM you have ...). Maybe we > would need to lock those pages into memory? Just to show that with no > page faults (but still sign

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Finn Thain
On Wed, 19 Apr 2023, Geert Uytterhoeven wrote: > Does it also fail on a very old kernel image you still have lying > around? Just to rule out a recent kernel bug. > These are two mainline builds I've tried (among others): [0.00] Linux version 4.14.0-mac (fthain@nippy) (gcc version 6.

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Michael Schmitz
Hi Finn, On 19/04/23 22:50, Finn Thain wrote: On Tue, 18 Apr 2023, Michael Schmitz wrote: ... I think what's stored there is the extra frame content for a format b bus error frame. But that extra frame is incomplete at best (should be 22 longwords, only a4 are seen). Probably overwritten by th

Re: reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Geert Uytterhoeven
Hi Finn, On Wed, Apr 19, 2023 at 12:53 PM Finn Thain wrote: > Inspired by your observation about the page fault and stack growth, I > wrote a small test program (given below) that just pushes registers onto > the stack recursively while forking processes and collecting the SIGCHLD > signals. > >

reliable reproducer, was Re: core dump analysis

2023-04-19 Thread Finn Thain
On Tue, 18 Apr 2023, Michael Schmitz wrote: > > ... I think what's stored there is the extra frame content for a format > b bus error frame. But that extra frame is incomplete at best (should be > 22 longwords, only a4 are seen). Probably overwritten by the stack frame > from __GI___wait4_time