On Sun, Nov 17, 2024 at 04:16:17PM +1000, David Gwynne wrote:
> On Sat, Nov 16, 2024 at 07:36:37PM -0800, Andrew Hewus Fresh wrote:
> > I finally got around to fixing my alpha, which involved replacing the
> > disk.  That meant I have to scp some stuff over to it and after a bit
> > of time it panics:
> > 
> > panic: mtx 0xfffffe000002a628: locking against myself
> > Stopped at      db_enter+0x8:   lda     sp,10(sp)
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *204699  79774      0     0x14000      0x200    0  softnet0
> > db_enter(0, 7ffffe00e0003f8, 1, 8, 3, 8) at db_enter+0x8
> > panic(?, fffffe000002a628, 1b0, 10, a, 1) at panic+0xe8
> > mtx_enter(?, ?, 1b0, 10, a, 1) at mtx_enter+0xb4
> > ifq_set_oactive(?, ?, 1b0, 10, a, 1) at ifq_set_oactive+0x50
> 
> this is from src/sys/net/ifq.c r1.50 where i added a counter for the
> number of times oactive gets set. because there's checks and multiple
> things being tweaked i used the ifq mutex to serialise the updates.
> 
> de(4) uses ifq_deq_begin to try and shove an mbuf onto the hardware,
> which takes but doesnt release the ifq mutex until ifq_deq_commit or
> ifq_deq_rollback is called. so while it's holding the mutex it calls
> ifq_set_oactive, which also tries to take the mutex.
> 
> i honestly don't understand what de(4) is doing with the hardware and
> packet setup, so i dont feel confident changing the driver to avoid
> this. the least worst alternative i could think of is to provide an
> alternative set_oactive it can call.
> 
> the diff below should fix this.


Unfortunately, I'm not sure how to test it :-)  Trying to check out a
src tree via cvs over ssh trips the panic and the CD drive in the
machine has been giving me trouble.  It's old enough that it doesn't
have USB and I don't have a PCI USB card to add (although I'm asking
around).

Hmm . . . I was able to do an install over https.  I'll see if I can use
bsd.rd to get bits onto the machine.


> > I reinstalled back to 7.6 from the November 13 snapshot I tried first
> > with no change.  I can apparently reproduce at will, but the machine is
> > pretty slow so diagnostics will be slow.  Both dmesg are below.
> > 
> > Is this something known or should I try gather more details?
> > (if so, anything in particular?)
> > 
> > 
> > [ using 1157232 bytes of bsd ELF symbol table ]
> > Copyright (c) 1982, 1986, 1989, 1991, 1993
> >     The Regents of the University of California.  All rights reserved.
> > Copyright (c) 1995-2024 OpenBSD. All rights reserved.  https://www.OpenBSD.org
> > 
> > OpenBSD 7.6-current (GENERIC) #460: Wed Nov 13 17:53:07 MST 2024
> >     dera...@alpha.openbsd.org:/usr/src/sys/arch/alpha/compile/GENERIC
> > AlphaStation 200 4/166, 166MHz
> > 8192 byte page size, 1 processor.
> > real mem = 167772160 (160MB)
> > rsvd mem = 2048000 (1MB)
> > avail mem = 152584192 (145MB)
> > random: good seed from bootblocks
> > mainbus0 at root
> > cpu0 at mainbus0: ID 0 (primary), 21064-0 (pass 2 or 2.1)
> > apecs0 at mainbus0: DECchip 21071 Core Logic chipset
> > apecs0: DC21071-CA pass 2, 64-bit memory bus
> > apecs0: DC21071-DA pass 2
> > pci0 at apecs0 bus 0
> > siop0 at pci0 dev 6 function 0 "Symbios Logic 53c810" rev 0x02: isa irq 11
> > scsibus0 at siop0: 8 targets, initiator 7
> > sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST336753LC, 0006> serial.SEAGATE_ST336753LC_3HX1QFE000007422AWD1
> > sd0: 35003MB, 512 bytes/sector, 71687372 sectors
> > sio0 at pci0 dev 7 function 0 "Intel 82378IB ISA" rev 0x03
> > de0 at pci0 dev 11 function 0 "DEC 21040" rev 0x23, DEC 21040 pass 2.3: isa irq 5, address 08:00:2b:e4:f4:33
> > tga0 at pci0 dev 13 function 0 "DEC 21030" rev 0x02: DC21030 step B, board type T8-02
> > tga0: 1024 x 768, 8bpp, Bt485 RAMDAC
> > tga0: interrupting at isa irq 10
> > wsdisplay0 at tga0 mux 1
> > wsdisplay0: screen 0 added (std, vt100 emulation)
> > isa0 at sio0
> > isadma0 at isa0
> > fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> > com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> > com0: console
> > com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> > pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> > pcppi0 at isa0 port 0x61
> > spkr0 at pcppi0
> > lpt0 at isa0 port 0x3bc/4 irq 7
> > mcclock0 at isa0 port 0x70/2: mc146818 or compatible
> > stray isa irq 3
> > vscsi0 at root
> > scsibus1 at vscsi0: 256 targets
> > softraid0 at root
> > scsibus2 at softraid0: 256 targets
> > siop0: target 0 now using tagged 8 bit 10.0 MHz 8 REQ/ACK offset xfers
> > root on sd0a (0c5ed4a2cd41ff27.a) swap on sd0b dump on sd0b
> > fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> > stray isa irq 3
> > panic: mtx 0xfffffe000002a628: locking against myself
> > Stopped at  db_enter+0x8:   lda     sp,10(sp)
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *434339  44111      0     0x14000      0x200    0  softnet0
> > db_enter(0, 7ffffe00e0003f8, 1, 8, 3, 8) at db_enter+0x8
> > panic(?, fffffe000002a628, 1c0, 10, a, 1) at panic+0xe8
> > mtx_enter(?, ?, 1c0, 10, a, 1) at mtx_enter+0xb4
> > ifq_set_oactive(?, ?, 1c0, 10, a, 1) at ifq_set_oactive+0x50
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb> ddb> ddb> ddb> *cpu0: mtx 0xfffffe000002a628: locking against myself
> > ddb> syncing disks...3 2  done
> > WARNING: not updating battery clock
> > rebooting...
> 
> Index: net/ifq.c
> ===================================================================
> RCS file: /cvs/src/sys/net/ifq.c,v
> diff -u -p -r1.54 ifq.c
> --- net/ifq.c 9 Nov 2024 04:09:56 -0000       1.54
> +++ net/ifq.c 17 Nov 2024 06:04:59 -0000
> @@ -156,6 +156,17 @@ ifq_set_oactive(struct ifqueue *ifq)
>  }
>  
>  void
> +ifq_deq_set_oactive(struct ifqueue *ifq)
> +{
> +     MUTEX_ASSERT_LOCKED(&ifq->ifq_mtx);
> +
> +     if (!ifq->ifq_oactive) {
> +             ifq->ifq_oactive = 1;
> +             ifq->ifq_oactives++;
> +     }
> +}
> +
> +void
>  ifq_restart_task(void *p)
>  {
>       struct ifqueue *ifq = p;
> Index: net/ifq.h
> ===================================================================
> RCS file: /cvs/src/sys/net/ifq.h,v
> diff -u -p -r1.41 ifq.h
> --- net/ifq.h 10 Nov 2023 15:51:24 -0000      1.41
> +++ net/ifq.h 17 Nov 2024 06:04:59 -0000
> @@ -444,6 +444,7 @@ void               ifq_q_leave(struct ifqueue *, voi
>  void          ifq_serialize(struct ifqueue *, struct task *);
>  void          ifq_barrier(struct ifqueue *);
>  void          ifq_set_oactive(struct ifqueue *);
> +void          ifq_deq_set_oactive(struct ifqueue *);
>  
>  int           ifq_deq_sleep(struct ifqueue *, struct mbuf **, int, int,
>                    const char *, volatile unsigned int *,
> Index: dev/pci/if_de.c
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/if_de.c,v
> diff -u -p -r1.143 if_de.c
> --- dev/pci/if_de.c   24 May 2024 06:02:53 -0000      1.143
> +++ dev/pci/if_de.c   17 Nov 2024 06:04:59 -0000
> @@ -3897,7 +3897,7 @@ tulip_txput(tulip_softc_t * const sc, st
>  
>      if (sc->tulip_flags & TULIP_TXPROBE_ACTIVE) {
>       TULIP_CSR_WRITE(sc, csr_txpoll, 1);
> -     ifq_set_oactive(&sc->tulip_if.if_snd);
> +     ifq_deq_set_oactive(&sc->tulip_if.if_snd);
>       TULIP_PERFEND(txput);
>       return (NULL);
>      }
> @@ -3926,7 +3926,7 @@ tulip_txput(tulip_softc_t * const sc, st
>      sc->tulip_dbg.dbg_txput_finishes[6]++;
>  #endif
>      if (sc->tulip_flags & (TULIP_WANTTXSTART|TULIP_DOINGSETUP)) {
> -     ifq_set_oactive(&sc->tulip_if.if_snd);
> +     ifq_deq_set_oactive(&sc->tulip_if.if_snd);
>       if ((sc->tulip_intrmask & TULIP_STS_TXINTR) == 0) {
>           sc->tulip_intrmask |= TULIP_STS_TXINTR;
>           TULIP_CSR_WRITE(sc, csr_intr, sc->tulip_intrmask);
> 
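
For anyone following along, here is a minimal user-space sketch of the
pattern the quoted explanation and diff describe: a non-recursive mutex,
a setter that takes the mutex itself, and a second setter for callers
that already hold it.  Everything named toy_* is hypothetical and made up
for illustration; it is not the kernel's ifq or mutex code, and the
"would panic" case is modelled as a -1 return rather than a real panic.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive mutex: it only records its owner. */
struct toy_mtx {
	const void *owner;		/* NULL when unlocked */
};

/*
 * Returns 0 on success, or -1 where a real non-recursive mutex would
 * complain about "locking against myself".
 */
static int
toy_mtx_enter(struct toy_mtx *m, const void *self)
{
	if (m->owner == self)
		return (-1);
	m->owner = self;
	return (0);
}

static void
toy_mtx_leave(struct toy_mtx *m)
{
	m->owner = NULL;
}

/* Toy queue with the two fields the diff touches. */
struct toy_ifq {
	struct toy_mtx	mtx;
	unsigned int	oactive;	/* 1 while the queue is marked busy */
	unsigned int	oactives;	/* times oactive went from 0 to 1 */
};

/* Locking variant: takes the mutex itself, so the caller must not hold it. */
static int
toy_set_oactive(struct toy_ifq *q, const void *self)
{
	if (toy_mtx_enter(&q->mtx, self) != 0)
		return (-1);
	if (!q->oactive) {
		q->oactive = 1;
		q->oactives++;
	}
	toy_mtx_leave(&q->mtx);
	return (0);
}

/* Already-locked variant: the caller must hold the mutex; we only assert it. */
static void
toy_deq_set_oactive(struct toy_ifq *q, const void *self)
{
	assert(q->mtx.owner == self);
	if (!q->oactive) {
		q->oactive = 1;
		q->oactives++;
	}
}
```

Calling toy_set_oactive() while the same owner already holds the mutex
fails, which mirrors the panic; calling toy_deq_set_oactive() in the same
situation updates the flag and counter once.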

-- 
andrew

Full-time system administration is a delicate balance 
    between proactiveness and laziness.
                      --  jhorwitz from use.perl.org
