Re: Debian initramfs/initrd, was Re: stack smashing detected

2023-02-08 Thread Stan Johnson
On 2/7/23 3:14 PM, Brad Boyer wrote:
> On Tue, Feb 07, 2023 at 12:41:52PM -0700, Stan Johnson wrote:
>> Yes, I do have the L2 cache card installed in the IIci.
>>
>> If you think it would be useful, I don't mind letting the SE/30 run
>> overnight to see if it eventually boots.
> 
> I don't know if it would be useful in any practical sense, but I do
> admit to being curious how many hours it will take. Overnight should
> be enough. That would confirm that it's just extremely slow.
> 

I let the SE/30 run for 15 hours, and it appears to have hung.



Re: Debian initramfs/initrd, was Re: stack smashing detected

2023-02-08 Thread Stan Johnson
On 2/7/23 4:20 PM, Finn Thain wrote:
> 
> On Tue, 7 Feb 2023, Stan Johnson wrote:
> ...
> Preventing pointless key generation would be beneficial for all Macs, 
> Amigas, Ataris, emulators etc. and measuring the performance of one model 
> of Mac versus that of another model seems a bit irrelevant to me.
> 

Sure, but unless Debian unsupported is willing to manage config files
for the various systems, then it's not likely to happen. I currently use
separate config files for the following Macs, to build kernels with no
initrd, no modules, and only minimal network and video support:

1) 68030 8 MiB, no network (PB-170)
2) 68030 >8 MiB (SE/30, IIci, IIfx, Centris LC III, etc.)
3) 68040 (Centris 650, PB 550c, etc.)

It doesn't seem unusual that users should maintain their own config
files, since there's no way Debian unsupported could be expected to know
which options may apply to individual users' systems.

> Moreover, you've shown that your kernel builds produce stack smashing 
> errors whereas Debian's build does not. To resolve the problem with your 
> builds, why not begin by eliminating some of the differences between your 
> build and Debian's?

The stack smashing appears to be intermittent. And it doesn't show up
while booting the kernel; it only shows up while sysvinit scripts are
running (I haven't tested using systemd, since that would be too painful
on any 68030 slower than about 40 MHz). It takes too long to boot slow
systems using Debian's kernel to run repeated tests, and QEMU only
emulates 68040, so it appears to be necessary to test on real hardware.
It's not my goal to get my config files closer to Debian's, anyway. My
goal is to have the fewest number of config files for different groups
of systems.

If anyone knows of a 68030 emulator (maybe Basilisk?) that can boot
Linux, then I might be able to use that for faster testing.

> I suggest you adopt the current Debian SID build environment and 
> toolchain, and use it to build mainline Linux (stock v6.1) using QEMU.

I use QEMU to install Debian SID. The latest versions of Debian SID,
Debian kernel, initrd and modules all work great in QEMU. I then copy
the Debian SID rootfs to various 680x0 systems, but I don't use the
Debian kernel, initrd or modules.

> If you use your .config and if you still get stack smashing errors then 
> you can use the script I wrote to bisect the differences between your 
> .config and Debian's .config.
> 

If the stack smashing is caused by a kernel bug that is hidden by
Debian's choice of config options, then it would still be useful to
identify the bug. If there is something missing from my config files
that is causing the problem, then that would still be a kernel bug in
its sanity checking of options. Your script will be helpful if it
becomes necessary to identify specific offending options.
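(Finn's actual script isn't reproduced in this thread, but the first step of any config bisection is listing the options that differ between the two files. A rough sketch with made-up option values; the kernel tree also ships scripts/diffconfig for exactly this:

```shell
# Hypothetical config fragments standing in for a real .config pair.
cat > config.mine <<'EOF'
CONFIG_BLK_DEV_INITRD=n
CONFIG_MODULES=n
CONFIG_STACKPROTECTOR=y
EOF
cat > config.debian <<'EOF'
CONFIG_BLK_DEV_INITRD=y
CONFIG_MODULES=y
CONFIG_STACKPROTECTOR=y
EOF

sort config.mine > mine.sorted
sort config.debian > debian.sorted

# Lines present only in Debian's config -- the candidate set to bisect
# by applying half of them at a time, rebuilding, and retesting.
comm -13 mine.sorted debian.sorted
```

Here the common CONFIG_STACKPROTECTOR=y line is suppressed and only the two differing options are printed.)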

-Stan



Re: Debian initramfs/initrd, was Re: stack smashing detected

2023-02-08 Thread John Klos

If anyone knows of a 68030 emulator (maybe Basilisk?) that can boot
Linux, then I might be able to use that for faster testing.


I've played around with NetBSD on FS-UAE. I'd use it more, except that the 
emulation of the Commodore A2065 Ethernet card gives very flaky networking.


The emulation was configured with an m68030 & m68882, and I've heard it 
can run Linux, too.



This difference alone could probably double or triple the performance
even without the clock speed change.


Even though the external cache can do burst transfers to the CPU's cache, 
this definitely wouldn't double or triple performance - most of the time 
the CPU isn't waiting for data from memory. It'd be faster, but not 
significantly.
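(John's point can be made concrete with Amdahl's law, using made-up figures: even if memory stalls were, say, 25% of execution time and the external cache halved them, the overall speedup would be far below 2x:

```shell
# Amdahl's law with assumed figures: fraction f of time spent waiting on
# memory, stalls sped up by factor s by the external cache.
awk 'BEGIN { f = 0.25; s = 2; printf "overall speedup = %.2fx\n", 1 / ((1 - f) + f / s) }'
# prints: overall speedup = 1.14x
```

Even generous assumptions about the stall fraction leave the gain well short of the 2-3x figure, matching John's estimate.)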


John



Re: Debian initramfs/initrd, was Re: stack smashing detected

2023-02-08 Thread Eero Tamminen

Hi,

On 8.2.2023 19.39, Stan Johnson wrote:

The stack smashing appears to be intermittent. And it doesn't show up
while booting the kernel; it only shows up while sysvinit scripts are
running (I haven't tested using systemd, since that would be too painful
on any 68030 slower than about 40 MHz). It takes too long to boot slow
systems using Debian's kernel to run repeated tests, and QEMU only
emulates 68040, so it appears to be necessary to test on real hardware.
It's not my goal to get my config files closer to Debian's, anyway. My
goal is to have the fewest number of config files for different groups
of systems.

If anyone knows of a 68030 emulator (maybe Basilisk?) that can boot
Linux, then I might be able to use that for faster testing.


A quick search did not turn up whether Basilisk II (or Mini vMac) emulates 
both the 030 MMU and the CPU's i/d-caches.  If they don't, they may not be 
accurate enough to reproduce issues like this.



In case the issue is 030-specific rather than Mac-specific, both the WinUAE 
(Amiga) and Hatari (Atari) emulators support 68030 + MMU + cache emulation, 
and can boot Linux.


(Hatari's CPU core is based on the one from WinUAE.)


Apparently WinUAE has not had problems running Linux, but in Hatari, 
enabling 030 _cache_ emulation breaks Linux boot when it reaches user 
space (kernel boot works fine).


I'm not sure whether the latter is related to the issue you are seeing on the Mac.

(There are some differences e.g. in how Amiga and Atari handle CPU 
exceptions and how MMU is used, which may explain differences in 
behavior.  I have no idea whether Mac is closer to Amiga or Atari in 
this respect.)



For details on using m68k Linux with Hatari, see:
https://hatari.tuxfamily.org/doc/m68k-linux.txt


- Eero

PS. Debugger in Hatari emulator can provide you with backtraces for 
Linux kernel side, and profile where kernel time goes.


For a full boot, callgraphs will be so large that you'll probably need 
to throw away 99% of the data to make them readable.  For details, see:
details, see:

https://hatari.tuxfamily.org/doc/debugger.html#Profiling



Re: Debian initramfs/initrd, was Re: stack smashing detected

2023-02-08 Thread Finn Thain
On Wed, 8 Feb 2023, Stan Johnson wrote:

> On 2/7/23 4:20 PM, Finn Thain wrote:
> > 
> > On Tue, 7 Feb 2023, Stan Johnson wrote:
> > ...
> > Preventing pointless key generation would be beneficial for all Macs, 
> > Amigas, Ataris, emulators etc. and measuring the performance of one model 
> > of Mac versus that of another model seems a bit irrelevant to me.
> > 
> 
> Sure, but unless Debian unsupported is willing to manage config files
> for the various systems, then it's not likely to happen. 

It's easy to refute that. Just read my message from 2 days ago in this 
very thread where I pointed to a different Debian kernel key generation 
issue that got fixed.

> 
> > Moreover, you've shown that your kernel builds produce stack smashing 
> > errors whereas Debian's build does not. To resolve the problem with your 
> > builds, why not begin by eliminating some of the differences between your 
> > build and Debian's?
> 
> The stack smashing appears to be intermittent. And it doesn't show up
> while booting the kernel; it only shows up while sysvinit scripts are
> running (I haven't tested using systemd, since that would be too painful
> on any 68030 slower than about 40 MHz). 

No-one is asking for systemd tests.

> It takes too long to boot slow systems using Debian's kernel to run 
> repeated tests ...

If your m68k machines are too slow, why do you care about stack smashing 
errors at all?

> 
> If the stack smashing is caused by a kernel bug that is hidden by 
> Debian's choice of config options, then it would still be useful to 
> identify the bug. If there is something missing from my config files 
> that is causing the problem, then that would still be a kernel bug in 
> its sanity checking of options.

This is not about sanity checking.

Anyway, if you follow the steps I gave, we all get to learn something 
about the cause of the stack smashing error -- if that's what you want.



Re: stack smashing detected

2023-02-08 Thread Michael Schmitz

Hi Stan,

Am 08.02.2023 um 11:58 schrieb Michael Schmitz:

Thanks Stan,

On 8/02/23 08:37, Stan Johnson wrote:

Hi Michael,

On 2/5/23 3:19 PM, Michael Schmitz wrote:

...

Seeing Finn's report that Al Viro's VM_FAULT_RETRY fix may have solved
his task corruption troubles on 040, I just noticed that I probably
misunderstood how Al's patch works.

Botching up a fault retry and carrying on may well leave the page tables
in a state where some later access could go to the wrong page and
manifest as user space corruption. Could you try Al's patch 4 (m68k: fix
livelock in uaccess) to see if this helps?
...

ok, this appears to be the patch:

Signed-off-by: Al Viro 
---
  arch/m68k/mm/fault.c | 5 ++++-
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/mm/fault.c b/arch/m68k/mm/fault.c
index 4d2837eb3e2a..228128e45c67 100644
--- a/arch/m68k/mm/fault.c
+++ b/arch/m68k/mm/fault.c
@@ -138,8 +138,11 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
 	fault = handle_mm_fault(vma, address, flags, regs);
 	pr_debug("handle_mm_fault returns %x\n", fault);
 
-	if (fault_signal_pending(fault, regs))
+	if (fault_signal_pending(fault, regs)) {
+		if (!user_mode(regs))
+			goto no_context;
 		return 0;
+	}
 
 	/* The fault is fully completed (including releasing mmap lock) */
 	if (fault & VM_FAULT_COMPLETED)


That's correct.

Your results show improvement but the problem does not entirely go away.

Looking at differences between 030 and 040/060 fault handling, it
appears that only the 030 handles faults corrected by exception tables
(such as those used in the uaccess macros) specially, i.e. it aborts bus
error processing, while the 040 and 060 carry on in the fault handler.

I wonder if that's the main difference between 030 and 040 behaviour?


Following the 040 code a bit further, I suspect that happens in the 040 
writeback handler, so this may be a red herring.



I'll try and log such accesses caught by exception tables on 030 to see
if they are rare enough to allow adding a kernel log message...


Looks like this kind of event is rare enough to not trigger in a normal 
boot on my 030. Please give the attached patch a try so we can confirm 
(or rule out) that user space access faults from kernel mode are to 
blame for your stack smashes.


Cheers,

Michael





>From a55467a02b66addca6f74fc32b473bc077cb34b2 Mon Sep 17 00:00:00 2001
From: Michael Schmitz 
Date: Thu, 9 Feb 2023 14:39:35 +1300
Subject: [PATCH] m68k: debug exception handling data faults on 030

030 faults handled by exception tables are just silently ignored - see how
many of these do happen in practice, and if they are related to 'stack
smashing' faults.

Signed-off-by: Michael Schmitz 
---
 arch/m68k/kernel/traps.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
index 5c8cba0efc63..b3cef760f7e8 100644
--- a/arch/m68k/kernel/traps.c
+++ b/arch/m68k/kernel/traps.c
@@ -554,8 +554,13 @@ static inline void bus_error030 (struct frame *fp)
 			}
 			/* Don't try to do anything further if an exception was
 			   handled. */
-			if (do_page_fault (&fp->ptregs, addr, errorcode) < 0)
+			if (do_page_fault (&fp->ptregs, addr, errorcode) < 0) {
+				pr_err("Exception handled for data %s fault at %#010lx in %s (pc=%#lx)\n",
+				       ssw & RW ? "read" : "write",
+				       fp->un.fmtb.daddr,
+				       space_names[ssw & DFC], fp->ptregs.pc);
 				return;
+			}
 		} else if (!(mmusr & MMU_I)) {
 			/* probably a 020 cas fault */
 			if (!(ssw & RM) && send_fault_sig(&fp->ptregs) > 0)
-- 
2.17.1