> On Jun 21, 2025, at 11:49 PM, Bjoern A. Zeeb <bzeeb-li...@lists.zabbadoz.net> wrote:
> 
> Hi,
> 
> it's too early for stab-week but ...
> 
> I had interface groups ("all") disappear from the interface between
> interface creation and the point where ifconfig prints them during the rc stage:
> 
> if7: XXXXXXXXXXXXXXXXXXXXXXXXXXX-BZ if_getgroup:1647: ifgl 0xffffa080011aec90, ifgl_group 0, ifg_group 0
> 
> panic: vm_fault failed: 0xffff0000005e19c8 error 1
> cpuid = 0
> time = 8
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1a0
> panic() at panic+0x48
> data_abort() at data_abort+0x28c
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x96000004
> strlcpy() at strlcpy+0x20
> ifhwioctl() at ifhwioctl+0x998
> ifioctl() at ifioctl+0x8bc
> kern_ioctl() at kern_ioctl+0x2e4
> sys_ioctl() at sys_ioctl+0x140
> do_el0_sync() at do_el0_sync+0x618
> handle_el0_sync() at handle_el0_sync+0x4c
> --- exception, esr 0x56000000
> KDB: enter: panic
> [ thread pid 635 tid 100249 ]
> Stopped at      kdb_enter+0x48: str     xzr, [x19, #2432]
> 
> 
> I instrumented the kernel and could not find any deletions.  It was all the
> stranger given that the machine has 10 physical interfaces + lo, and it only
> happened for #7 and #8.

Does that happen every time, or only sometimes?

Which driver do interfaces #7 and #8 use?

> 
> I added guards to the struct and that did not reveal any memory
> corruption.
> 
> I added a loop right at the end of if_addgroup() to make sure the list was
> coherent, and it was (incl. lo, which has two groups).
> 
> Then I started over-allocating the structs (size * 3) for ifgl and ifg
> and put the actual value in the middle.  That worked and the two guard
> structs showed no sign of memory corruption.  So the larger allocation
> apparently helped or changed timing (which the printfs had not).

So the arch is aarch64, which has a much weaker memory model. I recently
overhauled the attaching / detaching process of interfaces, and the new code
relies heavily on correct synchronization. More precisely, I'd expect this
order:

All writes to softc / ifnet (including if_addgroup()) > if_link_ifnet() > ifunit().

You can read each '>' as 'happens before'.

Best regards,
Zhenlei

> 
> 
> Then I undid the changes and backed out to b93161a7e38d and that works
> just fine.
> 
> Went to c29459f901dc which shows the problem and panics again.
> Reduced it to eebc148f25c3.
> 
> So it's in the range of:
> 
> % git log --oneline b93161a7e38d..eebc148f25c3
> eebc148f25c3 sched_4bsd: ESTCPULIM(): Allow any value in the timeshare range
> 51a4ae05abe6 sched_4bsd: Remove RQ_PPQ from ESTCPULIM()'s formula
> a454ff6b0440 sched_4bsd: Move ESTCPULIM() after its macro dependencies
> a33225efb4bc sched_ule: Sanitize CPU's use and priority computations, and ticks storage
> 6792f3411f6d sched_ule: Recover previous nice and anti-starvation behaviors
> dee257c28d93 sched: Internal priority ranges: Reduce kernel, increase timeshare
> d710acecc00f runq: Add copyright
> 055b5b5f850d runq: Restrict <sys/runq.h> to kernel only
> a2d1c3bc2bb4 epoch_test: Assign different priorities using offset 1
> b2a9ee2a72ea runq: Remove userland references to RQ_PPQ in rtprio contexts
> e3a4b989d7f7 runq: Bump __FreeBSD_version after switching to 256 levels
> af8de65ef23e runq: Switch to 256 levels
> fd141584cf89 zfs: spa: ZIO_TASKQ_ISSUE: Use symbolic priority
> 8ecc41918066 Internal scheduling priorities: Always use symbolic ones
> baecdea10eb5 sched_ule: Use a single runqueue per CPU
> fdf31d274769 sched_ule: runq_steal_from(): Suppress first thread special case
> f4be333bc567 sched_ule: Re-implement stealing on top of runq common-code
> 9c3f4682bb90 runq: New runq_findq(), common low-level search implementation
> a31193172cb9 runq: New function runq_is_queue_empty(); Use it in ULE
> 757bab06fb59 runq: Tidy up and rename runq_setbit() and runq_clrbit()
> de78657a3aef runq: runq_check(): Re-implement on top of runq_findq()
> 439dc920f2d8 runq: Revamp runq_find*(), new runq_find_range()
> 200fc93dace7 runq: Re-order functions more logically
> 7e2502e3dec9 runq: More macros; Better and more consistent naming
> 57540a0666f6 runq: Clarity and style pass
> a11926f2a5f0 runq: API tidy up: 'pri' => 'idx', 'idx' as int, remove runq_remove_idx()
> 28b54827f5c1 runq: Hide function prototypes under _KERNEL
> c21c24adde98 runq: More selective includes of <sys/runq.h> to reduce pollution
> 2fefe2c88b31 runq: Deduce most parameters, remove machine headers
> 
> 
> I do not know whether it's feasible to bisect those changes further.
> 
> /bz
> 
> 
> -- 
> Bjoern A. Zeeb                                                     r15:7
> 



