On Sun, Nov 17, 2024 at 04:16:17PM +1000, David Gwynne wrote:
> On Sat, Nov 16, 2024 at 07:36:37PM -0800, Andrew Hewus Fresh wrote:
> > I finally got around to fixing my alpha, which involved replacing the
> > disk. That meant I had to scp some stuff over to it and after a bit
> > of time it panics:
> >
> > panic: mtx 0xfffffe000002a628: locking against myself
> > Stopped at db_enter+0x8: lda sp,10(sp)
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > *204699 79774 0 0x14000 0x200 0 softnet0
> > db_enter(0, 7ffffe00e0003f8, 1, 8, 3, 8) at db_enter+0x8
> > panic(?, fffffe000002a628, 1b0, 10, a, 1) at panic+0xe8
> > mtx_enter(?, ?, 1b0, 10, a, 1) at mtx_enter+0xb4
> > ifq_set_oactive(?, ?, 1b0, 10, a, 1) at ifq_set_oactive+0x50
>
> this is from src/sys/net/ifq.c r1.50 where i added a counter for the
> number of times oactive gets set. because there's checks and multiple
> things being tweaked i used the ifq mutex to serialise the updates.
>
> de(4) uses ifq_deq_begin to try and shove an mbuf onto the hardware,
> which takes but doesnt release the ifq mutex until ifq_deq_commit or
> ifq_deq_rollback is called. so while it's holding the mutex it calls
> ifq_set_oactive, which also tries to take the mutex.
>
> i honestly don't understand what de(4) is doing with the hardware and
> packet setup, so i dont feel confident changing the driver to avoid
> this. the least worst alternative i could think of is to provide an
> alternative set_oactive it can call.
>
> the diff below should fix this.

Unfortunately, I'm not sure how to test it :-)

Trying to check out a src tree via cvs over ssh trips the panic, and the
CD drive in the machine has been giving me trouble. It's old enough that
it doesn't have USB, and I don't have a PCI USB card to add to it
(although I'm asking around).

Hmm . . . I was able to do an install over https. I'll see if I can use
bsd.rd to get bits onto the machine.

> > I reinstalled back to 7.6 from the November 13 snapshot I tried first
> > with no change. I can apparently reproduce at will, but the machine is
> > pretty slow so diagnostics will be slow. Both dmesg are below.
> >
> > Is this something known or should I try to gather more details?
> > (if so, anything in particular?)
> >
> >
> > [ using 1157232 bytes of bsd ELF symbol table ]
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> > The Regents of the University of California. All rights reserved.
> > Copyright (c) 1995-2024 OpenBSD. All rights reserved.
> > https://www.OpenBSD.org
> >
> > OpenBSD 7.6-current (GENERIC) #460: Wed Nov 13 17:53:07 MST 2024
> > dera...@alpha.openbsd.org:/usr/src/sys/arch/alpha/compile/GENERIC
> > AlphaStation 200 4/166, 166MHz
> > 8192 byte page size, 1 processor.
> > real mem = 167772160 (160MB)
> > rsvd mem = 2048000 (1MB)
> > avail mem = 152584192 (145MB)
> > random: good seed from bootblocks
> > mainbus0 at root
> > cpu0 at mainbus0: ID 0 (primary), 21064-0 (pass 2 or 2.1)
> > apecs0 at mainbus0: DECchip 21071 Core Logic chipset
> > apecs0: DC21071-CA pass 2, 64-bit memory bus
> > apecs0: DC21071-DA pass 2
> > pci0 at apecs0 bus 0
> > siop0 at pci0 dev 6 function 0 "Symbios Logic 53c810" rev 0x02: isa irq 11
> > scsibus0 at siop0: 8 targets, initiator 7
> > sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST336753LC, 0006> serial.SEAGATE_ST336753LC_3HX1QFE000007422AWD1
> > sd0: 35003MB, 512 bytes/sector, 71687372 sectors
> > sio0 at pci0 dev 7 function 0 "Intel 82378IB ISA" rev 0x03
> > de0 at pci0 dev 11 function 0 "DEC 21040" rev 0x23, DEC 21040 pass 2.3: isa irq 5, address 08:00:2b:e4:f4:33
> > tga0 at pci0 dev 13 function 0 "DEC 21030" rev 0x02: DC21030 step B, board type T8-02
> > tga0: 1024 x 768, 8bpp, Bt485 RAMDAC
> > tga0: interrupting at isa irq 10
> > wsdisplay0 at tga0 mux 1
> > wsdisplay0: screen 0 added (std, vt100 emulation)
> > isa0 at sio0
> > isadma0 at isa0
> > fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> > com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> > com0: console
> > com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> > pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> > pcppi0 at isa0 port 0x61
> > spkr0 at pcppi0
> > lpt0 at isa0 port 0x3bc/4 irq 7
> > mcclock0 at isa0 port 0x70/2: mc146818 or compatible
> > stray isa irq 3
> > vscsi0 at root
> > scsibus1 at vscsi0: 256 targets
> > softraid0 at root
> > scsibus2 at softraid0: 256 targets
> > siop0: target 0 now using tagged 8 bit 10.0 MHz 8 REQ/ACK offset xfers
> > root on sd0a (0c5ed4a2cd41ff27.a) swap on sd0b dump on sd0b
> > fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> > stray isa irq 3
> > panic: mtx 0xfffffe000002a628: locking against myself
> > Stopped at db_enter+0x8: lda sp,10(sp)
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > *434339 44111 0 0x14000 0x200 0 softnet0
> > db_enter(0, 7ffffe00e0003f8, 1, 8, 3, 8) at db_enter+0x8
> > panic(?, fffffe000002a628, 1c0, 10, a, 1) at panic+0xe8
> > mtx_enter(?, ?, 1c0, 10, a, 1) at mtx_enter+0xb4
> > ifq_set_oactive(?, ?, 1c0, 10, a, 1) at ifq_set_oactive+0x50
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
> > ddb> ddb> ddb> ddb> *cpu0: mtx 0xfffffe000002a628: locking against myself
> > ddb> syncing disks...3 2 done
> > WARNING: not updating battery clock
> > rebooting...
>
> Index: net/ifq.c
> ===================================================================
> RCS file: /cvs/src/sys/net/ifq.c,v
> diff -u -p -r1.54 ifq.c
> --- net/ifq.c	9 Nov 2024 04:09:56 -0000	1.54
> +++ net/ifq.c	17 Nov 2024 06:04:59 -0000
> @@ -156,6 +156,17 @@ ifq_set_oactive(struct ifqueue *ifq)
>  }
>  
>  void
> +ifq_deq_set_oactive(struct ifqueue *ifq)
> +{
> +	MUTEX_ASSERT_LOCKED(&ifq->ifq_mtx);
> +
> +	if (!ifq->ifq_oactive) {
> +		ifq->ifq_oactive = 1;
> +		ifq->ifq_oactives++;
> +	}
> +}
> +
> +void
>  ifq_restart_task(void *p)
>  {
>  	struct ifqueue *ifq = p;
> Index: net/ifq.h
> ===================================================================
> RCS file: /cvs/src/sys/net/ifq.h,v
> diff -u -p -r1.41 ifq.h
> --- net/ifq.h	10 Nov 2023 15:51:24 -0000	1.41
> +++ net/ifq.h	17 Nov 2024 06:04:59 -0000
> @@ -444,6 +444,7 @@ void ifq_q_leave(struct ifqueue *, voi
>  void ifq_serialize(struct ifqueue *, struct task *);
>  void ifq_barrier(struct ifqueue *);
>  void ifq_set_oactive(struct ifqueue *);
> +void ifq_deq_set_oactive(struct ifqueue *);
>  
>  int ifq_deq_sleep(struct ifqueue *, struct mbuf **, int, int,
>  	    const char *, volatile unsigned int *,
> Index: dev/pci/if_de.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_de.c,v
> diff -u -p -r1.143 if_de.c
> --- dev/pci/if_de.c	24 May 2024 06:02:53 -0000	1.143
> +++ dev/pci/if_de.c	17 Nov 2024 06:04:59 -0000
> @@ -3897,7 +3897,7 @@ tulip_txput(tulip_softc_t * const sc, st
>  
>      if (sc->tulip_flags & TULIP_TXPROBE_ACTIVE) {
>  	TULIP_CSR_WRITE(sc, csr_txpoll, 1);
> -	ifq_set_oactive(&sc->tulip_if.if_snd);
> +	ifq_deq_set_oactive(&sc->tulip_if.if_snd);
>  	TULIP_PERFEND(txput);
>  	return (NULL);
>      }
> @@ -3926,7 +3926,7 @@ tulip_txput(tulip_softc_t * const sc, st
>  	sc->tulip_dbg.dbg_txput_finishes[6]++;
>  #endif
>      if (sc->tulip_flags & (TULIP_WANTTXSTART|TULIP_DOINGSETUP)) {
> -	ifq_set_oactive(&sc->tulip_if.if_snd);
> +	ifq_deq_set_oactive(&sc->tulip_if.if_snd);
>  	if ((sc->tulip_intrmask & TULIP_STS_TXINTR) == 0) {
>  	    sc->tulip_intrmask |= TULIP_STS_TXINTR;
>  	    TULIP_CSR_WRITE(sc, csr_intr, sc->tulip_intrmask);

-- 
andrew

Full-time system administration is a delicate balance between
proactiveness and laziness.
    -- jhorwitz from use.perl.org
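The lock-ordering problem David describes, and why the ifq_deq_set_oactive() variant avoids it, can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: toy_mtx, toy_ifq, toy_deq_begin(), toy_txput() and the rest are hypothetical stand-ins, not the kernel API. The toy mutex just records whether it is held and returns -1 where the kernel's mtx_enter() panics with "locking against myself", and the de(4) transmit path is reduced to "begin a dequeue, mark oactive, roll back".

```c
#include <assert.h>

/* Hypothetical single-threaded stand-in for a kernel mutex: it only
 * records whether it is held, which is enough to detect re-entry. */
struct toy_mtx {
	int held;
};

/* Returns -1 if the lock is already held; the kernel panics instead. */
static int
toy_mtx_enter(struct toy_mtx *m)
{
	if (m->held)
		return (-1);
	m->held = 1;
	return (0);
}

static void
toy_mtx_leave(struct toy_mtx *m)
{
	assert(m->held);	/* releasing an unheld lock is a bug */
	m->held = 0;
}

/* Hypothetical model of the bits of struct ifqueue involved here. */
struct toy_ifq {
	struct toy_mtx mtx;
	unsigned int oactive;
	unsigned long oactives;	/* the counter added in ifq.c r1.50 */
};

/* Models ifq_set_oactive(): takes the queue mutex itself. */
static int
toy_set_oactive(struct toy_ifq *q)
{
	if (toy_mtx_enter(&q->mtx) != 0)
		return (-1);
	if (!q->oactive) {
		q->oactive = 1;
		q->oactives++;
	}
	toy_mtx_leave(&q->mtx);
	return (0);
}

/* Models ifq_deq_set_oactive() from the diff: the caller must already
 * hold the mutex, i.e. be between deq_begin and commit/rollback. */
static int
toy_deq_set_oactive(struct toy_ifq *q)
{
	if (!q->mtx.held)
		return (-1);	/* stands in for MUTEX_ASSERT_LOCKED */
	if (!q->oactive) {
		q->oactive = 1;
		q->oactives++;
	}
	return (0);
}

/* Models ifq_deq_begin(): returns with the mutex still held. */
static int
toy_deq_begin(struct toy_ifq *q)
{
	return (toy_mtx_enter(&q->mtx));
}

/* Models ifq_deq_rollback(): drops the mutex without dequeuing. */
static void
toy_deq_rollback(struct toy_ifq *q)
{
	toy_mtx_leave(&q->mtx);
}

/* Simplified de(4)-style transmit attempt: the hardware is busy, so
 * mark the queue oactive while the dequeue still holds the lock. */
static int
toy_txput(struct toy_ifq *q, int use_deq_variant)
{
	int error;

	if (toy_deq_begin(q) != 0)
		return (-1);
	error = use_deq_variant ? toy_deq_set_oactive(q) : toy_set_oactive(q);
	toy_deq_rollback(q);
	return (error);
}
```

Calling toy_txput(&q, 0) follows the old driver path and reports the re-entry instead of setting oactive; toy_txput(&q, 1) follows the patched path and updates the flag and counter under the lock the dequeue already holds. Outside a dequeue, the plain toy_set_oactive() remains the right call, which is why the diff adds a second function rather than changing the existing one.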