On 18 Mar 2022, at 19:02, Mike Karels wrote:
> It looks like the IPv4 multicast code has not been fully converted to
> use epochs. I installed this week's snapshot of -current, configured
> and started mrouted, and started rwhod -m. The system crashed shortly
> thereafter with this:
>
> panic: Assertion in_epoch(net_epoch_preempt) failed at
> /usr/src/sys/netinet/ip_output.c:343
> cpuid = 15
> time = 1647609865
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01b51a39d0
> vpanic() at vpanic+0x17f/frame 0xfffffe01b51a3a20
> panic() at panic+0x43/frame 0xfffffe01b51a3a80
> ip_output() at ip_output+0x15f9/frame 0xfffffe01b51a3b80
> phyint_send() at phyint_send+0x107/frame 0xfffffe01b51a3be0
> ip_mdq() at ip_mdq+0x259/frame 0xfffffe01b51a3c60
> X_ip_mrouter_set() at X_ip_mrouter_set+0x9e4/frame 0xfffffe01b51a3d30
> sosetopt() at sosetopt+0xee/frame 0xfffffe01b51a3d80
> kern_setsockopt() at kern_setsockopt+0xad/frame 0xfffffe01b51a3de0
> sys_setsockopt() at sys_setsockopt+0x24/frame 0xfffffe01b51a3e00
> amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe01b51a3f30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01b51a3f30
> --- syscall (105, FreeBSD ELF64, sys_setsockopt), rip = 0x821b72dda, rsp =
> 0x8204c06f8, rbp = 0x8204c0750 ---
> KDB: enter: panic
>
> The kgdb backtrace is appended.
>
> It looks like ip_mroute is protected in the forwarding path (it's called
> from ip_input) and the output path, but not in the setup path from
> setsockopt(). At least the MRT_ADD_MFC call needs to enter an epoch.
> I tried adding epoch handling in add_mfc(), and that seems to work.
> The alternative would be to do it in Xip_mrouter_set() so it would cover
> all the calls. Any opinions?
>

Your analysis looks reasonable. I think I’d suggest adding the NET_EPOCH_ENTER() calls in add_mfc(). We already do that in add_vif(), so we’d be following existing choices.
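
To illustrate the pattern, here is a rough userland sketch, not kernel code. The macros below are mock stand-ins that only borrow the names of the kernel's NET_EPOCH_*() macros, and mock_add_mfc(), mock_ip_mdq() and mock_ip_output() are hypothetical placeholders for the call chain seen in the backtrace. It assumes the setup path enters the epoch once and the output path asserts it, which may not match how the final patch is structured.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userland mock of the epoch machinery (assumption: the real kernel
 * macros do far more than track a counter).
 */
struct epoch_tracker { int depth_on_entry; };

static int mock_epoch_depth;   /* 0 means "not inside the epoch" */
static int mock_packets_sent;  /* counts successful mock sends */

#define NET_EPOCH_ENTER(et) do { (et).depth_on_entry = mock_epoch_depth++; } while (0)
#define NET_EPOCH_EXIT(et)  do { mock_epoch_depth = (et).depth_on_entry; } while (0)
#define NET_EPOCH_ASSERT()  assert(mock_epoch_depth > 0)

/* Helper for inspecting the mock state from outside. */
static bool
mock_in_epoch(void)
{
	return (mock_epoch_depth > 0);
}

/* Stand-in for ip_output(): aborts if the caller is not in the epoch. */
static int
mock_ip_output(void)
{
	NET_EPOCH_ASSERT();
	mock_packets_sent++;
	return (0);
}

/* Stand-in for ip_mdq()/phyint_send(): asserts early, then sends. */
static int
mock_ip_mdq(void)
{
	NET_EPOCH_ASSERT();
	return (mock_ip_output());
}

/* Stand-in for add_mfc(): the setsockopt() path enters the epoch itself. */
static int
mock_add_mfc(void)
{
	struct epoch_tracker et;
	int error;

	NET_EPOCH_ENTER(et);
	error = mock_ip_mdq();
	NET_EPOCH_EXIT(et);
	return (error);
}
```

Calling mock_add_mfc() from a context that is not in the epoch completes and leaves the depth at zero, whereas calling mock_ip_mdq() directly from such a context would trip the assertion, much like the panic above.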

I’d also suggest adding NET_EPOCH_ASSERT() to everything which directly or indirectly calls ip_output(). That should help us catch other potential issues like this one.

Br,
Kristof