> Date: Sun, 1 May 2022 20:13:57 +0200
> From: Anton Lindqvist <[email protected]>
>
> On Sat, Apr 30, 2022 at 04:07:51PM +0200, Mark Kettenis wrote:
> > > Date: Tue, 19 Apr 2022 07:32:36 +0200
> > > From: Anton Lindqvist <[email protected]>
> > >
> > > On Thu, Mar 24, 2022 at 07:41:44AM +0100, Anton Lindqvist wrote:
> > > > >Synopsis: bse: null dereference in genet_rxintr()
> > > > >Category: arm64
> > > > >Environment:
> > > > System : OpenBSD 7.1
> > > > Details : OpenBSD 7.1-beta (GENERIC.MP) #1594: Mon Mar 21
> > > > 06:55:12 MDT 2022
> > > >
> > > > [email protected]:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > >
> > > > Architecture: OpenBSD.arm64
> > > > Machine : arm64
> > > > >Description:
> > > >
> > > > Booting my rpi4 often but not always causes a panic while rc(8) tries
> > > > to start
> > > > the bse network interface:
> > > >
> > > > panic: attempt to access user address 0x38 from EL1
> > > > Stopped at panic+0x160: cmp w21, #0x0
> > > > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > > > * 0 0 0 0x10000 0x200 0K swapper
> > > > db_enter() at panic+0x15c
> > > > panic() at do_el1h_sync+0x1f8
> > > > do_el1h_sync() at handle_el1h_sync+0x6c
> > > > handle_el1h_sync() at genet_rxintr+0x120
> > > > genet_rxintr() at genet_intr+0x74
> > > > genet_intr() at ampintc_irq_handler+0x14c
> > > > ampintc_irq_handler() at arm_cpu_irq+0x30
> > > > arm_cpu_irq() at handle_el1h_irq+0x6c
> > > > handle_el1h_irq() at ampintc_splx+0x80
> > > > ampintc_splx() at genet_ioctl+0x158
> > > > genet_ioctl() at ifioctl+0x308
> > > > ifioctl() at nfs_boot_init+0xc0
> > > > nfs_boot_init() at nfs_mountroot+0x3c
> > > > nfs_mountroot() at main+0x464
> > > > main() at virtdone+0x70
> > > >
> > > > >Fix:
> > > >
> > > > The mbuf associated with the current index is NULL. I noticed that the
> > > > NetBSD
> > > > driver allocates mbufs for each ring entry in genet_setup_dma(). But
> > > > even with
> > > > that in place the same panic still occurs. Enabling GENET_DEBUG shows
> > > > that the
> > > > total is quite high:
> > > >
> > > > RX pidx=0000ca07 total=51463
> > > >
> > > >
> > > > Since it's greater than GENET_DMA_DESC_COUNT (=256) the null
> > > > dereference will
> > > > still happen after doing more than 256 iterations in genet_rxintr()
> > > > since we
> > > > will start accessing mbufs cleared by the previous iteration.
> > > >
> > > > Here's a diff with what I've tried so far. The KASSERT() is just
> > > > capturing the
> > > > problem at an earlier stage. Any pointers would be much appreciated.
> > >
> > > Further digging reveals that writes to GENET_RX_DMA_PROD_INDEX are
> > > ignored by the hardware. That's why I ended up with a large amount of
> > > mbufs available in genet_rxintr() since the software and hardware state
> > > was out of sync. Honoring any existing value makes the problem go away
> > > and matches what u-boot[1] does as well.
> >
> > Writing to GENET_RX_DMA_PROD_INDEX works for me. The U-Boot code says
> > that writing 0 doesn't work. But even that works for me. So I'm
> > puzzled.
> >
> > > The current RX cidx/pidx defaults in genet_fill_rx_ring() where probably
> > > carefully selected as they ensure that the rx ring is filled with at
> > > least the configured low watermark number of mbufs. However, instead of
> > > being forced to ensure a pidx - cidx delta above 0 on the first
> > > invocations of genet_fill_rx_ring(), RX_DESC_COUNT could simply be
> > > passed as the max argument to if_rxr_get() which will clamp the value
> > > anyway.
> >
> > Well, what the code does is setting the "prod" index ahead of the
> > "cons" index to simulate a full ring. And then when we (partially)
> > fill the ring we increase "cons" to make descriptors available to the
> > hardware. This seems to work on my hardware and I've never seen the
> > crash you're seeing.
>
> After further meditation I realized this only happens when netbooting
> through u-boot. Disabling DMA as part of the hardware reset solves the
> issue, at last.
So reseting the MAC doesn't actually reset the DMA engine...
Doing this should be fine. Although the firmware really should
disable the DMA before handing control to the OS. Can you file a bug
in the appropriate channel?
> The diff includes one unrelated queue id correction. It's harmless in
> practice as qid is already equal to GENET_DMA_DEFAULT_QUEUE but makes
> the code a bit less surprising IMO.
Yeah, that makes sense. Wonder why NetBSD did it this way.
ok kettenis@
> diff --git sys/dev/ic/bcmgenet.c sys/dev/ic/bcmgenet.c
> index 923df40bdfc..c8a2fbfa2dc 100644
> --- sys/dev/ic/bcmgenet.c
> +++ sys/dev/ic/bcmgenet.c
> @@ -439,11 +439,46 @@ genet_setup_rxfilter(struct genet_softc *sc)
> WR4(sc, GENET_UMAC_MDF_CTRL, mdf_ctrl);
> }
>
> +void
> +genet_disable_dma(struct genet_softc *sc)
> +{
> + uint32_t val;
> +
> + /* Disable receiver */
> + val = RD4(sc, GENET_UMAC_CMD);
> + val &= ~GENET_UMAC_CMD_RXEN;
> + WR4(sc, GENET_UMAC_CMD, val);
> +
> + /* Stop receive DMA */
> + val = RD4(sc, GENET_RX_DMA_CTRL);
> + val &= ~GENET_RX_DMA_CTRL_EN;
> + val &= ~GENET_RX_DMA_CTRL_RBUF_EN(GENET_DMA_DEFAULT_QUEUE);
> + WR4(sc, GENET_RX_DMA_CTRL, val);
> +
> + /* Stop transmit DMA */
> + val = RD4(sc, GENET_TX_DMA_CTRL);
> + val &= ~GENET_TX_DMA_CTRL_EN;
> + val &= ~GENET_TX_DMA_CTRL_RBUF_EN(GENET_DMA_DEFAULT_QUEUE);
> + WR4(sc, GENET_TX_DMA_CTRL, val);
> +
> + /* Flush data in the TX FIFO */
> + WR4(sc, GENET_UMAC_TX_FLUSH, 1);
> + delay(10);
> + WR4(sc, GENET_UMAC_TX_FLUSH, 0);
> +
> + /* Disable transmitter */
> + val = RD4(sc, GENET_UMAC_CMD);
> + val &= ~GENET_UMAC_CMD_TXEN;
> + WR4(sc, GENET_UMAC_CMD, val);
> +}
> +
> int
> genet_reset(struct genet_softc *sc)
> {
> uint32_t val;
>
> + genet_disable_dma(sc);
> +
> val = RD4(sc, GENET_SYS_RBUF_FLUSH_CTRL);
> val |= GENET_SYS_RBUF_FLUSH_RESET;
> WR4(sc, GENET_SYS_RBUF_FLUSH_CTRL, val);
> @@ -512,7 +547,7 @@ genet_init_rings(struct genet_softc *sc, int qid)
> /* Enable transmit DMA */
> val = RD4(sc, GENET_TX_DMA_CTRL);
> val |= GENET_TX_DMA_CTRL_EN;
> - val |= GENET_TX_DMA_CTRL_RBUF_EN(GENET_DMA_DEFAULT_QUEUE);
> + val |= GENET_TX_DMA_CTRL_RBUF_EN(qid);
> WR4(sc, GENET_TX_DMA_CTRL, val);
>
> /* RX ring */
> @@ -549,7 +584,7 @@ genet_init_rings(struct genet_softc *sc, int qid)
> /* Enable receive DMA */
> val = RD4(sc, GENET_RX_DMA_CTRL);
> val |= GENET_RX_DMA_CTRL_EN;
> - val |= GENET_RX_DMA_CTRL_RBUF_EN(GENET_DMA_DEFAULT_QUEUE);
> + val |= GENET_RX_DMA_CTRL_RBUF_EN(qid);
> WR4(sc, GENET_RX_DMA_CTRL, val);
> }
>
> @@ -607,7 +642,6 @@ genet_stop(struct genet_softc *sc)
> {
> struct ifnet *ifp = &sc->sc_ac.ac_if;
> struct genet_bufmap *bmap;
> - uint32_t val;
> int i;
>
> timeout_del(&sc->sc_rxto);
> @@ -615,32 +649,7 @@ genet_stop(struct genet_softc *sc)
>
> mii_down(&sc->sc_mii);
>
> - /* Disable receiver */
> - val = RD4(sc, GENET_UMAC_CMD);
> - val &= ~GENET_UMAC_CMD_RXEN;
> - WR4(sc, GENET_UMAC_CMD, val);
> -
> - /* Stop receive DMA */
> - val = RD4(sc, GENET_RX_DMA_CTRL);
> - val &= ~GENET_RX_DMA_CTRL_EN;
> - val &= ~GENET_RX_DMA_CTRL_RBUF_EN(GENET_DMA_DEFAULT_QUEUE);
> - WR4(sc, GENET_RX_DMA_CTRL, val);
> -
> - /* Stop transmit DMA */
> - val = RD4(sc, GENET_TX_DMA_CTRL);
> - val &= ~GENET_TX_DMA_CTRL_EN;
> - val &= ~GENET_TX_DMA_CTRL_RBUF_EN(GENET_DMA_DEFAULT_QUEUE);
> - WR4(sc, GENET_TX_DMA_CTRL, val);
> -
> - /* Flush data in the TX FIFO */
> - WR4(sc, GENET_UMAC_TX_FLUSH, 1);
> - delay(10);
> - WR4(sc, GENET_UMAC_TX_FLUSH, 0);
> -
> - /* Disable transmitter */
> - val = RD4(sc, GENET_UMAC_CMD);
> - val &= ~GENET_UMAC_CMD_TXEN;
> - WR4(sc, GENET_UMAC_CMD, val);
> + genet_disable_dma(sc);
>
> /* Disable interrupts */
> genet_disable_intr(sc);
>