On Thu, Feb 15, 2024 at 04:38:07PM +0100, Claudio Jeker wrote: > On Thu, Feb 15, 2024 at 03:30:39PM +0000, Miod Vallat wrote: > > > > A lot of this points towards a register window error in__tfork() which > > > > affects only the parent process. > > > > > > I think the issue is that cpu_fork() copies the u_pcb from parent to child > > > and with that the special user register windows that got spilled because > > > of some TL>0 fault. Since __tfork() uses a new stack it makes no sense > > > to keep those windows around and possibly fault them back from the new > > > thread (onto the old stack). > > > > That makes sense. > > > > > The following diff seems to fix the above repoducer for me. Please test. > > > I will need to add some comments etc there (also not sure if the bzero() > > > calls are needed or if clearing pcb_nsaved is enough. > > > > Actually, rather than bzero'ing pcb_rwsp and pcb_rw, it would be better > > to only bcopy offsetof(struct pcb, pcb_rw) rather than sizeof(struct > > pcb) on line 139, i.e. something like this: > > > > Index: vm_machdep.c > > =================================================================== > > RCS file: /OpenBSD/src/sys/arch/sparc64/sparc64/vm_machdep.c,v > > retrieving revision 1.42 > > diff -u -p -r1.42 vm_machdep.c > > --- vm_machdep.c 25 Oct 2022 06:05:57 -0000 1.42 > > +++ vm_machdep.c 15 Feb 2024 15:29:37 -0000 > > @@ -100,6 +100,7 @@ cpu_fork(struct proc *p1, struct proc *p > > struct pcb *npcb = &p2->p_addr->u_pcb; > > struct trapframe *tf2; > > struct rwindow *rp; > > + size_t pcbsz; > > extern struct proc proc0; > > > > /* > > @@ -136,7 +137,15 @@ cpu_fork(struct proc *p1, struct proc *p > > #else > > opcb->lastcall = NULL; > > #endif > > - bcopy((caddr_t)opcb, (caddr_t)npcb, sizeof(struct pcb)); > > + /* > > + * If a new stack is provided, do not bother copying saved windows > > + * in the new pcb. Also, we'll reset pcb_nsaved accordingly below. > > + */ > > + if (stack != NULL) > > + pcbsz = offsetof(struct pcb, pcb_rw); > > + else > > + pcbsz = sizeof(struct pcb); > > + bcopy((caddr_t)opcb, (caddr_t)npcb, pcbsz); > > if (p1->p_md.md_fpstate) { > > fpusave_proc(p1, 1); > > p2->p_md.md_fpstate = malloc(sizeof(struct fpstate), > > @@ -162,6 +171,7 @@ cpu_fork(struct proc *p1, struct proc *p > > * with space reserved for the frame, and zero the frame pointer. > > */ > > if (stack != NULL) { > > + npcb->pcb_nsaved = 0; > > tf2->tf_out[6] = (u_int64_t)(u_long)stack - (BIAS + CC64FSZ); > > tf2->tf_in[6] = 0; > > } > > Looks good to me. Will throw that onto my testbox and see.
Has been running for the last few hours without any issue. OK claudio@ on that diff. -- :wq Claudio