On Wed, Apr 16, 2025 at 9:25 PM Taylor R Campbell <riastr...@netbsd.org> wrote: > > > Module Name: src > > Committed By: ozaki-r > > Date: Wed Apr 16 05:29:45 UTC 2025 > > > > Modified Files: > > src/sys/net: if_bridge.c > > > > Log Message: > > bridge: avoid a race condition on stopping callout > > > > Without BRIDGE_LOCK, the callout can be scheduled after callout_halt. > > Oof! Is there a PR? This probably warrants pullup to 9 and 10 (and a > PR is helpful for tracking that).
No PR but I'm going to pullup. > > But I think this is still racy, because bridge_init and bridge_stop, > which set and clear IFF_RUNNING in ifp->if_flags, do so without > BRIDGE_LOCK. So we could see this sequence: > > cpu0 (bridge_timer) cpu1 (bridge_stop) > ------------------- ------------------ > observe IFF_RUNNING is set > ifp->if_flags &= ~IFF_RUNNING > callout_halt(&sc->sc_brcallout, NULL) > callout_reset(&sc->sc_callout, ...) > > In general, access to ifp->if_flags is forbidden without holding > IFNET_LOCK (there is still a lot code that does it but it's buggy!). > And the workqueue can't take IFNET_LOCK because bridge_stop does > workqueue_wait, which could deadlock. I think bridge_init and bridge_stop can't be called simultaneously and the above situation doesn't occur. bridge_stop is called from ioctl(SIOCSIFFLAGS) or interface destruction. ioctl calls are serialized each other and the interface destruction is guaranteed to be started after all ioctls finish. > > So I think we need a separate variable in struct bridge_softc to cache > the value of ifp->if_flags & IFF_RUNNING for use under BRIDGE_LOCK > instead of IFNET_LOCK -- or its complement, like we do in usbnet.c > (unp_txstopped/rxstopped, or unp_mcastactive) and other drivers like > if_vioif.c (netq_stopping). Nonetheless, I agree that using a separate variable like sc_stopping is more robust than depending on the assumption I explained above. So I'll change my fix like this. ozaki-r