> On 2 Jan 2024, at 20:52, Alexander Bluhm <alexander.bl...@gmx.net> wrote:
>
> On Tue, Jan 02, 2024 at 08:29:50PM +0300, Vitaliy Makkoveev wrote:
>>> On 2 Jan 2024, at 20:16, Alexander Bluhm <alexander.bl...@gmx.net> wrote:
>>>
>>> On Tue, Jan 02, 2024 at 03:45:10PM +0000, Stuart Henderson wrote:
>>>> panic: rw_enter: netlock locking against myself
>>>> Stopped at db_enter+0x14: popq %rbp
>>>> TID PID UID PRFLAGS PFLAGS CPU COMMAND
>>>> *388963 11506 755 0x2 0 0 snmpbulkwalk
>>>> 320140 61046 0 0x14000 0x200 1 reaper
>>>> db_enter() at db_enter+0x14
>>>> panic(ffffffff820ebaaa) at panic+0xc3
>>>> rw_enter(ffffffff824bd2d0,2) at rw_enter+0x252
>>>> hv_kvp(ffffffff8248d738) at hv_kvp+0x705
>>>> hv_event_intr(ffff80000008d000) at hv_event_intr+0x1ab
>>>> hv_intr() at hv_intr+0x1a
>>>> Xresume_hyperv_upcall() at Xresume_hyperv_upcall+0x27
>>>> Xspllower() at Xspllower+0x1d
>>>> task_add(ffff800000036200,ffff8000001533c0) at task_add+0x83
>>>> if_enqueue_ifq(ffff800000153060,fffffd80f6642900) at if_enqueue_ifq+0x6f
>>>> ether_output(ffff800000153060,fffffd80f6642900,fffffd80c8ef5c78,fffffd811e6414d8)
>>>> at ether_output+0x82
>>>> if_output_tso(ffff800000153060,ffff800025395d08,fffffd80c8ef5c78,fffffd811e6414d8,5dc)
>>>> at if_output_tso+0xe1
>>>> ip_output(fffffd80f6642900,0,fffffd80c8ef5c68,0,0,fffffd80c8ef5bf0,1da98542b9d8f733)
>>>> at ip_output+0x817
>>>> udp_output(fffffd80c8ef5bf0,fffffd80b468b400,fffffd80ac452c00,0) at
>>>> udp_output+0x3be
>>>> end trace frame: 0xffff800025395ee0, count: 0
>>>> https://www.openbsd.org/ddb.html describes the minimum info required in bug
>>>> reports. Insufficient info makes it difficult to find and fix bugs.
>>>> ddb{0}> Stopped at x86_ipi_db+0x16: leave
>>>> x86_ipi_db(ffff80002250dff0) at x86_ipi_db+0x16
>>>> x86_ipi_handler() at x86_ipi_handler+0x80
>>>> Xresume_lapic_ipi() at Xresume_lapic_ipi+0x27
>>>> _kernel_lock() at _kernel_lock+0xc2
>>>> reaper(ffff80002473c008) at reaper+0x133
>>>> end trace frame: 0x0, count: 10
>>>
>>> It is caused by this commit.
>>>
>>> ----------------------------
>>> revision 1.20
>>> date: 2023/09/23 13:01:12; author: mvs; state: Exp; lines: +5 -9;
>>> commitid: Gj5WdkySyube1RkO;
>>> Use shared netlock to protect if_list and ifa_list walkthrough and ifnet
>>> data access within kvp_get_ip_info().
>>>
>>> ok bluhm
>>> ----------------------------
>>>
>>> The net lock must not be acquired in interrupt context. It is wrong
>>> that hv tries to perform network operations from interrupt context.
>>> Actually it does not do much networking; it just reads some
>>> interface addresses. The kernel lock that was used before is also
>>> questionable. Maybe some writers do not hold it anymore; there
>>> has been a lot of conversion from kernel lock to net lock.
>>>
>>> bluhm
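
For reference, the pattern that commit introduced is roughly the following
(a simplified sketch, not the actual kvp_get_ip_info() body; the list head
name ifnetlist is from memory):

struct ifnet *ifp;
struct ifaddr *ifa;

NET_LOCK_SHARED();
TAILQ_FOREACH(ifp, &ifnetlist, if_list) {
	TAILQ_FOREACH(ifa, &ifp->if_addrlist, ifa_list) {
		/* read the interface addresses kvp reports */
	}
}
NET_UNLOCK_SHARED();
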
>>
>> This is not interrupt context; tsleep_nsec() indicates that. The
>> problem lies in Xresume_hyperv_upcall(), which can run this code
>> from interrupt context.
>
> In this case it is a soft interrupt started by spllower(). I think
> hv_wait() is not involved.
>
> Xspllower() -> Xresume_hyperv_upcall -> hv_intr() -> hv_event_intr()
> -> hv_channel_schedule() -> ch->ch_handler() -> hv_kvp() ->
> hv_kvp_process() -> kvp_get_ip_info() -> NET_LOCK_SHARED()
>
> Anyway, converting this to a task makes sense to me. Let's see if
> your CHF_BATCHED diff works as expected.
>
> bluhm
>
>> hv_wait(struct hv_softc *sc, int (*cond)(struct hv_softc *, struct hv_msg *),
>>     struct hv_msg *msg, void *wchan, const char *wmsg)
>> {
>> 	int s;
>>
>> 	KASSERT(cold ? msg->msg_flags & MSGF_NOSLEEP : 1);
>>
>> 	while (!cond(sc, msg)) {
>> 		if (msg->msg_flags & MSGF_NOSLEEP) {
>> 			delay(1000);
>> 			s = splnet();
>> 			hv_intr();
>> 			splx(s);
>> 		} else {
>> 			tsleep_nsec(wchan, PRIBIO, wmsg ? wmsg : "hvwait",
>> 			    USEC_TO_NSEC(1000));
>> 		}
>> 	}
>> }
ifq_serialize(struct ifqueue *ifq, struct task *t)
{
	struct task work;

	if (ISSET(t->t_flags, TASK_ONQUEUE))
		return;

	mtx_enter(&ifq->ifq_task_mtx);
	if (!ISSET(t->t_flags, TASK_ONQUEUE)) {
		SET(t->t_flags, TASK_ONQUEUE);
		TAILQ_INSERT_TAIL(&ifq->ifq_task_list, t, t_entry);
	}

	if (ifq->ifq_serializer == NULL) {
		ifq->ifq_serializer = curcpu();

		while ((t = TAILQ_FIRST(&ifq->ifq_task_list)) != NULL) {
			TAILQ_REMOVE(&ifq->ifq_task_list, t, t_entry);
			CLR(t->t_flags, TASK_ONQUEUE);

			work = *t; /* copy to caller to avoid races */

			mtx_leave(&ifq->ifq_task_mtx);	/* <-- IPL drops here */
ifq_task_mtx is initialized with IPL_NET priority, so the following
sequence triggered this:

	THR1	mtx_enter(&ifq->ifq_task_mtx)
	THR2	splnet()	/* hv_wait(), just before hv_intr() */
	THR1	mtx_leave(&ifq->ifq_task_mtx)
	THR1	`-> Xspllower()
	THR1	    skip
	THR1	    `-> hv_intr()
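
Put differently, a single-thread reading of the trace above (my
annotation, not real code) is:

/*
 * The netlock is already held further up the stack
 * (udp_output() -> ip_output() -> ... -> if_enqueue_ifq()).
 */
mtx_enter(&ifq->ifq_task_mtx);	/* IPL raised to IPL_NET; a hyperv
				 * upcall arriving now stays pending */
/* ... */
mtx_leave(&ifq->ifq_task_mtx);	/* splx() -> Xspllower() ->
				 * Xresume_hyperv_upcall() -> hv_intr()
				 * -> ... -> hv_kvp() ->
				 * NET_LOCK_SHARED() ->
				 * "netlock locking against myself" */
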
IMHO, the spl*() protection in the network stack doesn't work as
expected. It should be replaced by mutexes and the netlock,
respectively. Also, these gems, where you can't predict the lock
state within the handler, should be removed or reworked:
hv_channel_schedule(struct hv_channel *ch)
{
	if (ch->ch_handler) {
		if (!cold && (ch->ch_flags & CHF_BATCHED)) {
			hv_channel_pause(ch);
			task_add(ch->ch_taskq, &ch->ch_task);
		} else
			ch->ch_handler(ch->ch_ctx);
	}
}
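
Converting the kvp work to a task, as discussed above, essentially means
always taking the task_add() branch for that channel, so the netlock is
only grabbed from process context. A minimal sketch of that shape, with
invented names (sc_kvp_tq, sc_kvp_task, hv_kvp_intr() and hv_kvp_task()
are placeholders, not the actual CHF_BATCHED diff):

void	hv_kvp_task(void *);

/* at attach time: create a taskq whose thread runs in process context */
int
hv_kvp_attach_task(struct hv_softc *sc)
{
	sc->sc_kvp_tq = taskq_create("hvkvp", 1, IPL_NET, 0);
	if (sc->sc_kvp_tq == NULL)
		return (-1);
	task_set(&sc->sc_kvp_task, hv_kvp_task, sc);
	return (0);
}

/* channel callback: may run from hv_event_intr()/Xspllower(), only queue */
void
hv_kvp_intr(void *arg)
{
	struct hv_softc *sc = arg;

	task_add(sc->sc_kvp_tq, &sc->sc_kvp_task);
}

/* task: runs from the taskq thread, so NET_LOCK_SHARED() is fine here */
void
hv_kvp_task(void *arg)
{
	struct hv_softc *sc = arg;

	/* parse messages, kvp_get_ip_info(), etc. */
}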