> On Jun 27, 2025, at 11:02 PM, Bjoern A. Zeeb <bzeeb-li...@lists.zabbadoz.net>
> wrote:
>
> On Wed, 25 Jun 2025, Zhenlei Huang wrote:
>
> Hi,
>
> I applied olce's change from the review but it didn't make a difference
> on my arm64 and now on a tree with local changes (wifi bits, user space
> bits, etc).
>
> Now I netbooted that tree on X86 hardware (an old Lenovo Laptop) and ran
> into something else (the same tree boots in a bhyve instance on a
> different machine from a local disk image).
>
> At the end of if_addgroup() I had added the following for local
> debugging (really crude, sorry):
>
> ...
>
> + atomic_thread_fence_seq_cst();
> IF_ADDR_WLOCK(ifp);
> CK_STAILQ_INSERT_TAIL(&ifg->ifg_members, ifgm, ifgm_next);
> CK_STAILQ_INSERT_TAIL(&ifp->if_groups, ifgl, ifgl_next);
> IF_ADDR_WUNLOCK(ifp);
>
> IFNET_WUNLOCK(); // excl unlock
>
> if (new)
> EVENTHANDLER_INVOKE(group_attach_event, ifg);
> EVENTHANDLER_INVOKE(group_change_event, groupname);
>
> + IFNET_RLOCK(); // shared, panic
> + CK_STAILQ_FOREACH(ifgl, &ifp->if_groups, ifgl_next) {
> + if (bz_debug_groups) if_printf(ifp,
> +     "XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ %s:%d: ifgl %p, ifgl_group %p, ifg_group %p\n",
> +     __func__, __LINE__, ifgl, (ifgl != NULL) ? ifgl->ifgl_group : NULL,
> +     (ifgl != NULL && ifgl->ifgl_group != NULL) ? ifgl->ifgl_group->ifg_group : NULL);
> + }
> + IFNET_RUNLOCK();
> +
> return (0);
> }
>
>
>
> You see the annotation // shared?
>
> I got a panic: excl->share with that.
Well, I applied the same patch as you and I can reproduce that panic, but my
screen freezes and the topmost part of the stack is
```
_sx_slock_int() at _sx_slock_int+0x64/frame 0xff....
if_addgroup() at .....
....
device_attach() at ...
...
root_bus_configure() at ...
configure() at ...
mi_startup() at ..
```
I've no idea what's wrong. From the disassembly it appears the panic happens
just after witness_checkorder().
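
To illustrate the kind of check I suspect is firing, here is a toy user-space
model. It is only a sketch under my assumption that WITNESS tracks the locks
held by each thread and complains when a lock already held exclusively is
requested shared by the same thread. It is not the real subr_witness.c logic,
and all names (toy_checkorder, toy_acquire, toy_release, struct thread_locks)
are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

enum toy_result {
	TOY_OK,			/* lock not held by this thread */
	TOY_RECURSE,		/* held in the same mode already */
	TOY_EXCL_TO_SHARE,	/* held exclusively, shared requested */
	TOY_SHARE_TO_EXCL	/* held shared, exclusive requested */
};

/* One entry per lock the (hypothetical) thread currently holds. */
struct held_lock {
	const void *lock;
	bool exclusive;
};

struct thread_locks {
	struct held_lock held[TOY_MAX_HELD];
	size_t count;
};

/* Check a new request against the locks this thread already holds. */
static enum toy_result
toy_checkorder(const struct thread_locks *td, const void *lock, bool want_excl)
{
	for (size_t i = 0; i < td->count; i++) {
		if (td->held[i].lock != lock)
			continue;
		if (td->held[i].exclusive && !want_excl)
			return (TOY_EXCL_TO_SHARE);
		if (!td->held[i].exclusive && want_excl)
			return (TOY_SHARE_TO_EXCL);
		return (TOY_RECURSE);
	}
	return (TOY_OK);
}

/* Record that the thread now holds the lock. */
static void
toy_acquire(struct thread_locks *td, const void *lock, bool exclusive)
{
	assert(td->count < TOY_MAX_HELD);
	td->held[td->count].lock = lock;
	td->held[td->count].exclusive = exclusive;
	td->count++;
}

/* Drop the most recent record of the lock, if there is one. */
static void
toy_release(struct thread_locks *td, const void *lock)
{
	for (size_t i = td->count; i > 0; i--) {
		if (td->held[i - 1].lock == lock) {
			/* Shift the later entries down by one slot. */
			for (size_t j = i; j < td->count; j++)
				td->held[j - 1] = td->held[j];
			td->count--;
			return;
		}
	}
}
```

In this model, toy_acquire(&td, &lock, true) followed by
toy_checkorder(&td, &lock, false) returns TOY_EXCL_TO_SHARE, while the same
check after toy_release(&td, &lock) returns TOY_OK. If the real check behaves
anything like this, the thread would still have to be recorded as holding the
lock exclusively at the point of the shared acquisition.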
>
> The excl. is the
> IFNET_WLOCK(); // excl
> at the top of the function after the groupname check.
> But that gets unlocked before the event handler above
> so how can this happen?
I checked the event handlers and I think they are not relevant.
>
> Sadly I cannot even dump or anything as the keyboard is as dead
> as the rest of the laptop. Have to power cycle it hard.
>
> Apart from the debugging I added I have no local changes in sys/net
> in that tree. sys/kern seems to have no relevant changes either
> (added a bus func, toggle link_elf_leak_locals default, and a printf
> got an extra argument to print %d error when modules fail to load).
>
>
> I'll try a plain main (hopefully tonight) on that machine too but I am
> really at a loss here now that it's also happening on X86 and only for me
> and always around the same code there...
>
> I'll also try to boot this tree from a USB pen drive or something; not
> that my problem comes in from netbooting...
>
For the purpose of debugging ifgroup, I think you can omit the IFNET_RLOCK:
at the moment a group is being added to the interface, no other thread has
the opportunity to write to the interface concurrently.
> I'll keep you posted...
> /bz
>
> --
> Bjoern A. Zeeb r15:7
Best regards,
Zhenlei