On Mon, Mar 04, 2024 at 10:00:01AM +0100, Alexander Haensch wrote:
> For us this crash was introduced in OpenBSD 7.4. That was the reason we
> reverted to 7.3.
> 
> In our case, the crash goes through the ixgbe driver and bpf_filter, but it
> always starts in wg_encap_worker.
> 
> Is this patch somehow related to the issue?
> https://github.com/openbsd/src/commit/dbebf518da97d8c0c7746cce71f5ea4ae909cb89
> We are using fq_codel in pf.
> 

There is a design error in wg(4) that causes crashes when used with
queueing / traffic shaping. The problem is that wg(4) uses a sleeping lock
in a place where the code is not allowed to sleep.
 
Not sure if this is the issue here. But there was a report not long ago on
bugs@ where the noise code was called from a timeout. This triggers an
assert in the sleep / scheduler code because the code tries to sleep in
interrupt context.

For now, do not use queueing with wg(4).
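
To make the "sleeping lock in a no-sleep context" rule concrete, here is a
rough user-space model in C. It is only a sketch: every name in it
(model_ctx, model_sleep_lock, model_lock_enter, model_demo) is made up for
illustration and none of it is wg(4) or kernel code. It assumes a lock that
puts the caller to sleep when it is already held, and a caller that is
tagged as either allowed or not allowed to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* All names are hypothetical: a user-space model, not OpenBSD kernel code. */

enum model_ctx {
	MODEL_CTX_PROCESS,	/* caller may sleep, e.g. an ordinary thread */
	MODEL_CTX_NOSLEEP	/* caller must not sleep, e.g. timeout/interrupt */
};

enum model_result {
	MODEL_LOCK_TAKEN,	/* lock was free and is now held */
	MODEL_LOCK_SLEEPS,	/* lock busy, caller is allowed to wait */
	MODEL_LOCK_PANIC	/* lock busy, caller must not sleep */
};

struct model_sleep_lock {
	bool held;
};

/*
 * Try to take the lock.  An uncontended lock is simply taken.  A contended
 * lock means the caller would have to sleep, which is only legal in
 * MODEL_CTX_PROCESS; in MODEL_CTX_NOSLEEP the model reports the point
 * where a kernel assertion would fire.
 */
static enum model_result
model_lock_enter(struct model_sleep_lock *l, enum model_ctx ctx)
{
	if (!l->held) {
		l->held = true;
		return MODEL_LOCK_TAKEN;
	}
	return ctx == MODEL_CTX_NOSLEEP ? MODEL_LOCK_PANIC : MODEL_LOCK_SLEEPS;
}

/* Walk through the three cases of the model. */
static void
model_demo(void)
{
	struct model_sleep_lock l = { .held = false };

	/* Free lock: taken without sleeping, even from a no-sleep context. */
	assert(model_lock_enter(&l, MODEL_CTX_NOSLEEP) == MODEL_LOCK_TAKEN);
	/* Held lock, process context: the caller would simply wait. */
	assert(model_lock_enter(&l, MODEL_CTX_PROCESS) == MODEL_LOCK_SLEEPS);
	/* Held lock, no-sleep context: this is the forbidden combination. */
	assert(model_lock_enter(&l, MODEL_CTX_NOSLEEP) == MODEL_LOCK_PANIC);
}
```

In this model the bad case only shows up when the lock happens to be held
by someone else at the moment the no-sleep caller arrives. The model says
nothing about where the real assertion sits or how often it is hit in
practice.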

> On 04.03.24 05:10, Alexandr Nedvedicky wrote:
> > Hello,
> > 
> > I don't know what to think of it. It does not make much sense
> > to me at the moment. If I'm not mistaken kernel crashes
> > here at line 156:
> > 
> >      119 chacha20poly1305_encrypt(
> >      120     uint8_t *dst,
> >      121     const uint8_t *src,
> >      122     const size_t src_len,
> >      123     const uint8_t *ad,
> >      124     const size_t ad_len,
> >      125     const uint64_t nonce,
> >      126     const uint8_t key[CHACHA20POLY1305_KEY_SIZE]
> >      127 ) {
> >      128         poly1305_state poly1305_ctx;
> >      129         chacha_ctx chacha_ctx;
> >      130         union {
> >      131                 uint8_t b0[CHACHA20POLY1305_KEY_SIZE];
> >      132                 uint64_t lens[2];
> >      133         } b = { { 0 } };
> >      ...
> >      152
> >      153         poly1305_finish(&poly1305_ctx, dst + src_len);
> >      154
> >      155         explicit_bzero(&chacha_ctx, sizeof(chacha_ctx));
> >      156         explicit_bzero(&b, sizeof(b));
> >      157 }
> > 
> > explicit_bzero() is a kind of memset() alias. Would you be able to
> > grab the same information, plus the output of the 'show registers'
> > command in ddb, the next time the APU box crashes?
> > 
> > I wonder what makes those two boxes so special that wg makes them
> > crash. Can you think of something? Every detail counts and might
> > help.
> > 
> > thanks and
> > regards
> > sashan
> > 
> > 
> > On Sat, Mar 02, 2024 at 06:26:09PM +0000, Nemanja Domazetović wrote:
> > > Hi all,
> > > 
> > > This is the first time I'm reporting a problem.
> > > 
> > > We have over 15 spokes (PC Engines APU4) on OpenBSD 7.4 (syspatched up to
> > > 013_unbound) running WireGuard to our central location (also OpenBSD 7.4,
> > > syspatched to 011_ssh). On 2 of those spokes OpenBSD is crashing once per
> > > day. The others are still working fine. Below is the error I receive, and
> > > I also added the output of some ddb commands (show uvm, show bcstats, show
> > > panic). Before we switched to WireGuard, they ran IPsec and we didn't have
> > > those problems.
> > > 
> > > This is what I got from the serial console once users reported the problem:
> > > 
> > > 
> > > 
> > > uvm_fault(0xfffff825891a0, 0x8, 0, 2) -> e
> > > kernel: page fault trap, code=2
> > > Stopped at      memset+0x52:    repe stosq      %es:(%rdi)
> > >      TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >   364339  89739      0    0x100032          0    0  login_passwd
> > >   226712  33190      0     0x14000      0x200    2  wg_crypt
> > > *225815  71495      0     0x14000      0x200    1  wg_crypt
> > >   282190  76283      0     0x14000      0x200    3  softnet3
> > > memset() at memset+0x52
> > > chacha20poly1305_encrypt(fffffd80bcb20010,fffffd80bcb20010,200,0,0,11a73,df3c6405eb66b84a) at chacha20poly1305_encrypt+0x162
> > > noise_remote_encrypt(ffff80000801c740,fffffd80bcb20004,ffff80002279f4f0,fffffd80bcb20010,200) at noise_remote_encrypt+0x113
> > > wg_encap(ffff800000791000,fffffd80bcb1ad00) at wg_encap+0x176
> > > wg_encap_worker(ffff800000791000) at wg_encap_worker+0x7a
> > > taskq_thread(ffff800000766a00) at taskq_thread+0x100
> > > end trace frame: 0x0, count: 9
> > > 
> > > 
> > >    *   show uvm
> > > Current UVM status:
> > >    pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> > >    1005750 VM pages: 22187 active, 154041 inactive, 1 wired, 648481 free (81919 zero)
> > >    min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
> > >    freemin=33525, free-target=44700, inactive-target=0, wired-max=335250
> > >    faults=5744799, traps=5747928, intrs=64193578, ctxswitch=186137457 fpuswitch=0
> > >    softint=3299560, syscalls=6630594, kmapent=13
> > >    fault counts:
> > >      noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> > >      ok relocks(total)=207395(210035), anget(retries)=2231572(0), amapcopy=2237418
> > >      neighbor anon/obj pg=461876/3422212, gets(lock/unlock)=1305770/210057
> > >      cases: anon=1858336, anoncow=373236, obj=1098448, prcopy=204660, przero=2210104
> > >    daemon and swap counts:
> > >      woke=0, revs=0, scans=0, obscans=0, anscans=0
> > >      busy=0, freed=0, reactivate=0, deactivate=0
> > >      pageouts=0, pending=0, nswget=0
> > >      nswapdev=1
> > >      swpages=263063, swpginuse=0, swpgonly=0 paging=0
> > >    kernel pointers:
> > >      objs(kern)=0xffffffff8252e560
> > > 
> > > 
> > >    *   show bcstats
> > > Current BufferCache status:
> > > numbufs 41004 busymapped 0, delwri 5
> > > kvaslots 6553 avail kva slots 6553
> > > bufpages 162582, dmapages 162582, dirtypages 10
> > > pendingreads 0, pendingwrites 0
> > > highflips 0, highflops 0, dmaflips 0
> > > 
> > > 
> > >    *   show panic
> > > *cpu1: uvm_fault(0xffffffff825891a0, 0x8, 0, 2) -> e
> > > 
> > > Srdačan pozdrav / Best regards
> > > --
> > > Nemanja Domazetović
> > > Senior IT Network inženjer
> > > Kappa Star Group,
> > > Bulevar kneza Aleksandra Karađorđevića 36,
> > > 11000 Beograd,
> > > Srbija
> > > e-mail: nemanja.domazeto...@kappastar.com
> > > web: https://www.kappastar.com
> > > Čuvajte drveće. Nemojte štampati ovu poruku ako to nije neophodno. / 
> > > Please consider the environment before printing this email.
> > > 
> 
> -- 
> Dr. rer. nat. Alexander Haensch
> 
> 
> AG Weimar
> Institute of Theoretical and Physical Chemistry
> Eberhard Karls University Tübingen
> Auf der Morgenstelle 15
> 72076 Tuebingen
> Germany
> 
> Tel1: +49(0) 7071 1389483
> Tel2: +49(0) 7071 2977633
> Fax : +49(0) 7071 295960
> 

-- 
:wq Claudio
