On Tue, Jan 02, 2024 at 08:29:50PM +0300, Vitaliy Makkoveev wrote: > > On 2 Jan 2024, at 20:16, Alexander Bluhm <alexander.bl...@gmx.net> wrote: > > > > On Tue, Jan 02, 2024 at 03:45:10PM +0000, Stuart Henderson wrote: > >> panic: rw_enter: netlock locking against myself > >> Stopped at db_enter+0x14: popq %rbp > >> TID PID UID PRFLAGS PFLAGS CPU COMMAND > >> *388963 11506 755 0x2 0 0 snmpbulkwalk > >> 320140 61046 0 0x14000 0x200 1 reaper > >> db_enter() at db_enter+0x14 > >> panic(ffffffff820ebaaa) at panic+0xc3 > >> rw_enter(ffffffff824bd2d0,2) at rw_enter+0x252 > >> hv_kvp(ffffffff8248d738) at hv_kvp+0x705 > >> hv_event_intr(ffff80000008d000) at hv_event_intr+0x1ab > >> hv_intr() at hv_intr+0x1a > >> Xresume_hyperv_upcall() at Xresume_hyperv_upcall+0x27 > >> Xspllower() at Xspllower+0x1d > >> task_add(ffff800000036200,ffff8000001533c0) at task_add+0x83 > >> if_enqueue_ifq(ffff800000153060,fffffd80f6642900) at if_enqueue_ifq+0x6f > >> ether_output(ffff800000153060,fffffd80f6642900,fffffd80c8ef5c78,fffffd811e6414d8) > >> at ether_output+0x82 > >> if_output_tso(ffff800000153060,ffff800025395d08,fffffd80c8ef5c78,fffffd811e6414d8,5dc) > >> at if_output_tso+0xe1 > >> ip_output(fffffd80f6642900,0,fffffd80c8ef5c68,0,0,fffffd80c8ef5bf0,1da98542b9d8f733) > >> at ip_output+0x817 > >> udp_output(fffffd80c8ef5bf0,fffffd80b468b400,fffffd80ac452c00,0) at > >> udp_output+0x3be > >> end trace frame: 0xffff800025395ee0, count: 0 > >> https://www.openbsd.org/ddb.html describes the minimum info required in bug > >> reports. Insufficient info makes it difficult to find and fix bugs. > >> ddb{0}> Stopped at x86_ipi_db+0x16: leave > >> x86_ipi_db(ffff80002250dff0) at x86_ipi_db+0x16 > >> x86_ipi_handler() at x86_ipi_handler+0x80 > >> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27 > >> _kernel_lock() at _kernel_lock+0xc2 > >> reaper(ffff80002473c008) at reaper+0x133 > >> end trace frame: 0x0, count: 10 > > > > It is caused by this commit. > > > > ---------------------------- > > revision 1.20 > > date: 2023/09/23 13:01:12; author: mvs; state: Exp; lines: +5 -9; > > commitid: > > Gj5WdkySyube1RkO; > > Use shared netlock to protect if_list and ifa_list walkthrough and ifnet > > data access within kvp_get_ip_info(). > > > > ok bluhm > > ---------------------------- > > > > Net lock must not be acquired in interrupt context. It is wrong > > that hv tries to perform network operations from interrupt context. > > Actually it does not do much networking, it is just reads some > > interface addresses. The kernel lock that was used before is also > > questionable. Maybe some writers do not hold it anymore. There > > was much conversion from kernel to net lock. > > > > bluhm > > This is not interrupt context. tsleep_nsec() points that. The > problem lays in Xresume_hyperv_upcall() which can run this > from interrupt context.
In this case it is a soft interrupt started by spllower(). I think hv_wait() is not involved. Xspllower() -> Xresume_hyperv_upcall -> hv_intr() -> hv_event_intr() -> hv_channel_schedule() -> ch->ch_handler() -> hv_kvp() -> hv_kvp_process() -> kvp_get_ip_info() -> NET_LOCK_SHARED() Anyway, converting this to a task makes sense to me. Let's see if your CHF_BATCHED diff works as expected. bluhm > hv_wait(struct hv_softc *sc, int (*cond)(struct hv_softc *, struct hv_msg *), > struct hv_msg *msg, void *wchan, const char *wmsg) > { > int s; > > KASSERT(cold ? msg->msg_flags & MSGF_NOSLEEP : 1); > > while (!cond(sc, msg)) { > if (msg->msg_flags & MSGF_NOSLEEP) { > delay(1000); > s = splnet(); > hv_intr(); > splx(s); > } else { > tsleep_nsec(wchan, PRIBIO, wmsg ? wmsg : "hvwait", > USEC_TO_NSEC(1000)); > } > } > } >