Hello,

On Fri, May 27, 2022 at 10:33:06AM +0200, Hrvoje Popovski wrote:
> Hi all,
> 
> I'm running firewall in production with NET_TASKQ=6 with claudio@ "use
> timeout for rttimer" and bluhm@ "kernel lock in arp" diffs.
> After week or so of running smoothly I've got panic.

    thank you for being brave enough to run those bits in production.

</snip>

> bcbnfw1# uvm_fault(0xffffffff823c6ac0, 0x10, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at      pf_state_export+0x4e:   movq    0x10(%rax),%rcx

    according to the registers in your report rax is 0, and the
    faulting address 0x10 passed to uvm_fault() matches the
    0x10(%rax) load, so we die because of a NULL pointer dereference.

>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *414231  37466      0     0x14000      0x200    3  softnet
>  180795  96693      0     0x14000      0x200    2  softnet
>   39487  54182      0     0x14000      0x200    0  softnet
>  221352  95757      0     0x14000      0x200    4  softnet
>  252845  32137      0     0x14000      0x200    1  softnet
>  294301  63695      0     0x14000      0x200    5  softnet
> pf_state_export(fffffd80611313c8,fffffd8877492ac0) at pf_state_export+0x4e
> pfsync_sendout() at pfsync_sendout+0x5e4
> pfsync_update_state(fffffd887df852b8) at pfsync_update_state+0x15b
> pf_test(2,1,ffff800000d48000,ffff800020b23a08) at pf_test+0xd53
> ip_input_if(ffff800020b23a08,ffff800020b23a14,4,0,ffff800000d48000) at ip_input_if+0xcd
> ipv4_input(ffff800000d48000,fffffd80774a4000) at ipv4_input+0x39
> ether_input(ffff800000d48000,fffffd80774a4000) at ether_input+0x3ad
> carp_input(ffff800000d64000,fffffd80774a4000,5e000115) at carp_input+0x196
> ether_input(ffff800000d64000,fffffd80774a4000) at ether_input+0x1d9
> vlan_input(ffff800000b9f000,fffffd80774a4000,ffff800020b23c3c) at vlan_input+0x23d
> ether_input(ffff800000b9f000,fffffd80774a4000) at ether_input+0x85
> if_input_process(ffff800000493048,ffff800020b23cd8) at if_input_process+0x6f
> ifiq_process(ffff800000491b00) at ifiq_process+0x69
> taskq_thread(ffff800000036500) at taskq_thread+0x11a
> end trace frame: 0x0, count: 1
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{3}>
> 

    according to the call stack we die somewhere here, in pf_state_export():

        memset(sp, 0, sizeof(struct pfsync_state));

        /* copy from state key */
        sp->key[PF_SK_WIRE].addr[0] = st->key[PF_SK_WIRE]->addr[0];
        sp->key[PF_SK_WIRE].addr[1] = st->key[PF_SK_WIRE]->addr[1];
        sp->key[PF_SK_WIRE].port[0] = st->key[PF_SK_WIRE]->port[0];
        sp->key[PF_SK_WIRE].port[1] = st->key[PF_SK_WIRE]->port[1];
        sp->key[PF_SK_WIRE].rdomain = htons(st->key[PF_SK_WIRE]->rdomain);
        sp->key[PF_SK_WIRE].af = st->key[PF_SK_WIRE]->af;

    it looks like the state key bound to st might already be gone
    (st->key[] == NULL) by the time pfsync exports the state.
    I'll take a closer look later today.
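
    to illustrate the failure mode only (this is not a proposed fix, and
    the real problem is more likely a matter of locking or state lifetime),
    here is a minimal userland sketch. All mock_* names and the struct
    layouts are made up for the example and are much simpler than the real
    pf structures; byte-order conversion is left out. It shows an export
    routine that refuses to copy when a key pointer is already NULL,
    instead of dereferencing it:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for the pf structures. */
enum { SK_WIRE = 0, SK_STACK = 1, SK_COUNT = 2 };

struct mock_state_key {
	uint32_t addr[2];
	uint16_t port[2];
	uint16_t rdomain;
	uint8_t  af;
};

struct mock_state {
	/* Either pointer may be NULL once the key has been detached. */
	struct mock_state_key *key[SK_COUNT];
};

struct mock_export {
	struct mock_state_key key[SK_COUNT];
};

/*
 * Copy both state keys into the export record.
 * Returns 0 on success, -1 if any key pointer is NULL; in that case
 * the export record is left zeroed.
 */
int
mock_state_export(struct mock_export *sp, const struct mock_state *st)
{
	int i;

	memset(sp, 0, sizeof(*sp));

	/* Check every key first so a failure never leaves a partial copy. */
	for (i = 0; i < SK_COUNT; i++) {
		if (st->key[i] == NULL)
			return -1;	/* the unchecked code would fault here */
	}

	for (i = 0; i < SK_COUNT; i++)
		sp->key[i] = *st->key[i];

	return 0;
}
```

    a check like this only hides the symptom; the interesting question is
    why the key is gone while the state is still queued for pfsync.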

thanks and
regards
sashan
