On 1/30/23 8:05 PM, Michael Schmitz wrote:
> ...
> On 30.01.2023 at 17:00, Stan Johnson wrote:
>> Hello,
>>
>> I am seeing anywhere from zero to four of the following errors while
>> booting Linux on 68030 systems and using sysvinit startup scripts:
>>
>> *** stack smashing detected ***: terminated
>> Aborted
>>
>> I usually (but not always) see three of the errors while init is running
>> the rcS.d scripts, and one while running the rc2.d scripts. The stack
>> smashing messages appear only on the system console (nothing is logged
>> in an error log or dmesg). Despite the errors, the system continues
>> booting to multiuser mode without any obvious additional problems. I
>> haven't tested systemd, which is too slow to be useful on my m68k
>> systems (though I have a Debian SID with systemd that I can restore for
>> testing if necessary).
>>
>> ...
>
> Another way may be logging the start of each of the rcS.d or rc2.d
> scripts until you know what scripts to look at in more detail, then
> adding 'set -v' at the start of those to log every command in the
> offending script.

Hi Michael,

Thanks for your reply.

After logging the start and end of each script, I see that the "stack
smashing detected" error often happens while running
"/etc/rcS.d/S01mountkernfs.sh" (/etc/init.d/mountkernfs.sh). I'll try to
isolate it to a particular command.

This may be a coincidence, but the error seems to happen more often (up
to about 4 times) after a warm boot into Mac OS 7.5.5 and a subsequent
boot into Linux than when starting with a cold boot into Mac OS 7.5.5,
but it doesn't seem that that should make any difference for Linux.
This morning, after a cold boot, I saw two of the errors, while after a
warm boot, I saw four.

>
> Once the offending binary is known (and the crash can be reproduced
> after system boot), gdb can be used to find the function that overwrote
> its local stack guard.

Is there a way to configure the kernel to use the stack guard for every
function, and then log every resulting abort? I realize that that would
be very slow, but it might also be useful for debugging.

>
> That's a lot of work on a 030 Mac - have you tried to reproduce this on
> any kind of emulator?

I haven't seen the error in QEMU.

>
> I suppose one difference between your 030 and 040 Macs might be the
> amount of RAM available. I wonder if this bug results from a combination
> of 030 MMU and memory pressure, or 030 MMU only.

For some reason, the error seems to happen only with 68030 systems,
regardless of processor speed or memory:

PB 170      : 68030, 25 MHz,   8 MiB, external SCSI2SD
Mac IIci    : 68030, 25 MHz,  80 MiB, internal SCSI2SD
SE/30       : 68030, 16 MHz, 128 MiB, external SCSI2SD
PB 550c     : 68040, 33 MHz,  36 MiB, external SCSI2SD
Centris 650 : 68040, 25 MHz, 136 MiB, internal SCSI2SD

-Stan
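
For reference, the start/end logging of the rcS.d and rc2.d scripts discussed in this thread could look roughly like the sketch below. It is only an illustration: the function name run_logged and the "START"/"END" message format are made up here and are not part of sysvinit, and the real /etc/init.d/rc does more than this (ordering, runlevel handling, stop scripts). Early in boot the root filesystem may still be read-only, so the log destination may need to be /dev/console rather than a file.

```shell
#!/bin/sh
# Hypothetical helper (not part of sysvinit): run every executable S* script
# in a directory with the "start" argument, writing one line before and one
# line after each script so a console error can be matched to a script.
run_logged() {
    dir=$1      # directory holding the S* scripts, e.g. /etc/rcS.d
    log=$2      # where to append log lines, e.g. /dev/console

    for script in "$dir"/S*; do
        # Skip non-executable entries (and the literal pattern if no match).
        [ -x "$script" ] || continue

        name=${script##*/}              # strip the directory part
        echo "START $name" >> "$log"

        "$script" start
        status=$?                       # exit status of the script itself

        echo "END $name status=$status" >> "$log"
    done
}
```

A call such as `run_logged /etc/rcS.d /dev/console` would bracket each script on the console. Once a suspect script is identified, adding 'set -v' near its top, as suggested earlier in the thread, makes the shell echo each command as it reads it, which narrows the error down to a single command.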