Hello,

On Tue, Feb 07, 2023 at 09:12:38PM +0100, Hrvoje Popovski wrote:
</snip>
> 
> 
> Hi,
> 
> this panic is with plain snapshot and I didn't do anything. I will leave
> box in ddb if something else is needed.
> 

    It does not look like there is more data to gather in ddb,
    though maybe I'm too quick in my judgment. This is the relevant
    part of the pfsync_bulk_update() function:
2456         int i = 0;
                /* `i` seems to be kept in %r12 */
2457
2458         NET_LOCK();
2459         sc = pfsyncif;
2460         if (sc == NULL)
2461                 goto out;
2462
2463         rw_enter_read(&pf_state_list.pfs_rwl);
2464         st = sc->sc_bulk_next;
                /* `st` is kept in %r15 */
2465         sc->sc_bulk_next = NULL;
2466
2467         for (;;) {
2468                 if (st->sync_state == PFSYNC_S_NONE &&
2469                     st->timeout < PFTM_MAX &&
2470                     st->pfsync_time <= sc->sc_ureq_received) {
2471                         pfsync_update_state_req(st);
2472                         i++;
2473                 }




> 
> ddb{0}> dmesg
> OpenBSD 7.2-current (GENERIC.MP) #1021: Sun Feb  5 09:52:50 MST 2023
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> 
> r620-2# uvm_fault(0xffffffff824fb2f8, 0x14e, 0, 1) -> e
> kernel: page fault trap, code=0
> Stopped at      pfsync_bulk_update+0x60:        cmpb    $0xff,0x14e(%r15)
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *109809  58944      0     0x14000 0x40000200    0K softclock
> pfsync_bulk_update(0) at pfsync_bulk_update+0x60

    we seem to be dying at line 2468 due to a NULL pointer dereference
    (the fault address 0x14e matches the 0x14e(%r15) operand with %r15 == 0)

> softclock_thread(ffff8000fffff050) at softclock_thread+0x13b
> end trace frame: 0x0, count: 13
> https://www.openbsd.org/ddb.html describes the minimum info required in
> bug reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}>
> 
</snip>

> r11               0xfbec2dfc846efdb5
> r12                                0
> r13               0xffffffff82503f80    timeout_proc
> r14               0xffff8000009d8000
> r15                                0
> rip               0xffffffff8101aea0    pfsync_bulk_update+0x60

    r12 (`i`) is 0, which suggests the loop is most likely in its first
    iteration. r15 (`st`) is 0, so this looks like a trivial bug: we try
    to send a bulk update but there is nothing to send. This makes me
    wonder whether the diff below makes your test box more stable.
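
    To illustrate the idea outside the kernel, here is a minimal
    user-space sketch of the same pattern: take the saved cursor, clear
    it, and bail out before the loop if there is nothing to walk. All
    names here (model_state, model_softc, model_bulk_update,
    MODEL_S_NONE) are hypothetical stand-ins, not the real pfsync
    structures, and the loop is simplified to a plain list walk.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pf state entry; only the shape matters. */
struct model_state {
	struct model_state *next;
	unsigned char sync_state;
};

/* Assumed marker value, chosen to echo the `cmpb $0xff` in the trace. */
#define MODEL_S_NONE 0xff

/* Hypothetical stand-in for the pfsync softc. */
struct model_softc {
	struct model_state *bulk_next;
};

/*
 * Take the saved bulk cursor and count the states that would be sent.
 * Returns -1 when there is no cursor, i.e. the case where the
 * unguarded loop would dereference NULL on its first iteration.
 */
int
model_bulk_update(struct model_softc *sc)
{
	struct model_state *st;
	int i = 0;

	st = sc->bulk_next;
	sc->bulk_next = NULL;

	/* The guard: nothing to send, so do not enter the loop at all. */
	if (st == NULL)
		return -1;

	for (; st != NULL; st = st->next) {
		if (st->sync_state == MODEL_S_NONE)
			i++;
	}

	return i;
}
```

    Without the NULL check, the first `st->sync_state` access reads
    from address 0 plus the member offset, which is the kind of small
    fault address seen in the uvm_fault line above.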


can you give the diff below a try?

thanks a lot for your help

regards
sashan


--------8<---------------8<---------------8<------------------8<--------
diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c
index e2c86971336..1fa58f6fab9 100644
--- a/sys/net/if_pfsync.c
+++ b/sys/net/if_pfsync.c
@@ -2464,6 +2464,11 @@ pfsync_bulk_update(void *arg)
        st = sc->sc_bulk_next;
        sc->sc_bulk_next = NULL;
 
+       if (st == NULL) {
+               rw_exit_read(&pf_state_list.pfs_rwl);
+               goto out;
+       }
+
        for (;;) {
                if (st->sync_state == PFSYNC_S_NONE &&
                    st->timeout < PFTM_MAX &&
