Re: wireguard problem?

Nemanja Domazetović Mon, 04 Mar 2024 03:40:37 -0800

Hi all

Indeed, my colleague recently reported a problem with a server that has a queueing
mechanism, but in our case this is not the issue. I also checked our other
locations, which are still working properly so far. The locations that are not
patched up to syspatch 013 have not had any kernel panic, and two locations that
do have 013_unbound have not had a kernel panic either.


For now, I will wait for the next kernel panic on my device and capture the
output of 'show registers'. Hopefully it will give more insight into this problem.

Srdačan pozdrav / Best regards
-- 
Nemanja Domazetović
Senior IT Network Engineer
Kappa Star Group,
Bulevar kneza Aleksandra Karađorđevića 36,
11000 Beograd,
Srbija
e-mail: nemanja.domazeto...@kappastar.com
web: https://www.kappastar.com
Čuvajte drveće. Nemojte štampati ovu poruku ako to nije neophodno. / Please 
consider the environment before printing this email.

-----Original Message-----
From: owner-b...@openbsd.org <owner-b...@openbsd.org> On Behalf Of Claudio Jeker
Sent: Monday, March 4, 2024 11:39 AM
To: Alexander Haensch <alexander.haen...@ipc.uni-tuebingen.de>
Cc: bugs@openbsd.org
Subject: Re: wireguard problem?

On Mon, Mar 04, 2024 at 10:00:01AM +0100, Alexander Haensch wrote:
> For us, this crash was introduced in OpenBSD 7.4. That was the reason
> we reverted to 7.3.
> 
> In our case, the crash goes through the ixgbe driver and bpf_filter,
> but it always starts in wg_encap_worker.
> 
> Is this patch somehow related to the issue?
> https://github.com/openbsd/src/commit/dbebf518da97d8c0c7746cce71f5ea4ae909cb89
> We are using fq_codel in pf.
> 

There is a design error in wg(4) that causes crashes when used with queueing / 
traffic shaping. The problem is that wg(4) uses a sleeping lock in a place 
where the code is not allowed to sleep.
 
I'm not sure if this is the issue here, but there was a report not long ago on
bugs@ where the noise code was called from a timeout. This triggers an assert
in the sleep/scheduler code because the code tries to sleep in interrupt context.

For now, do not use queueing with wg(4).
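The sleeping-lock problem described above can be illustrated with a small userspace model. This is a hypothetical sketch, not wg(4) or OpenBSD kernel code: every name in it (model_lock_enter, CTX_TIMEOUT and so on) is invented for illustration. It assumes that taking an uncontended sleeping lock never sleeps, so the fault only surfaces when code running in a context that must not sleep (such as a timeout) finds the lock already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical execution contexts; not OpenBSD kernel types. */
enum model_ctx {
	CTX_PROCESS,	/* e.g. a task thread: sleeping is allowed */
	CTX_TIMEOUT	/* e.g. a timeout handler: sleeping is NOT allowed */
};

enum lock_result {
	LOCK_ACQUIRED,		/* lock was free and is now held */
	LOCK_WOULD_SLEEP,	/* contended: caller may sleep until release */
	LOCK_SLEEP_FORBIDDEN	/* contended in a context that must not sleep */
};

/* Hypothetical stand-in for a sleeping (rwlock-style) lock. */
struct model_lock {
	bool held;
	int forbidden_sleeps;	/* points where a real kernel would assert */
};

enum lock_result
model_lock_enter(struct model_lock *l, enum model_ctx ctx)
{
	if (!l->held) {
		/* Uncontended: no sleep is needed, so the bug stays hidden. */
		l->held = true;
		return LOCK_ACQUIRED;
	}
	if (ctx != CTX_PROCESS) {
		/* Contended while sleeping is forbidden: the real kernel
		 * would hit an assertion in the sleep/scheduler code here. */
		l->forbidden_sleeps++;
		return LOCK_SLEEP_FORBIDDEN;
	}
	/* Contended in process context: the caller may legally sleep. */
	return LOCK_WOULD_SLEEP;
}

void
model_lock_exit(struct model_lock *l)
{
	assert(l->held);	/* releasing a lock that is not held is a bug */
	l->held = false;
}
```

In this model the same call from a timeout succeeds whenever the lock happens to be free and only goes wrong under contention, which is one way a bug of this kind could stay hidden most of the time.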

> On 04.03.24 05:10, Alexandr Nedvedicky wrote:
> > Hello,
> > 
> > I don't know what to think of it. It does not make much sense to me
> > at the moment. If I'm not mistaken, the kernel crashes here at line 156:
> > 
> >      119 chacha20poly1305_encrypt(
> >      120     uint8_t *dst,
> >      121     const uint8_t *src,
> >      122     const size_t src_len,
> >      123     const uint8_t *ad,
> >      124     const size_t ad_len,
> >      125     const uint64_t nonce,
> >      126     const uint8_t key[CHACHA20POLY1305_KEY_SIZE]
> >      127 ) {
> >      128         poly1305_state poly1305_ctx;
> >      129         chacha_ctx chacha_ctx;
> >      130         union {
> >      131                 uint8_t b0[CHACHA20POLY1305_KEY_SIZE];
> >      132                 uint64_t lens[2];
> >      133         } b = { { 0 } };
> >      ...
> >      152
> >      153         poly1305_finish(&poly1305_ctx, dst + src_len);
> >      154
> >      155         explicit_bzero(&chacha_ctx, sizeof(chacha_ctx));
> >      156         explicit_bzero(&b, sizeof(b));
> >      157 }
> > 
> > explicit_bzero() is a kind of memset() alias. Would you be able to
> > grab the same information, plus the output of the 'show registers'
> > command in ddb, the next time the APU box crashes?
> > 
> > I wonder what makes those two boxes so special that wg makes them
> > crash. Can you think of anything? Every detail counts, so this might
> > help.
> > 
> > thanks and
> > regards
> > sashan
> > 
> > 
> > On Sat, Mar 02, 2024 at 06:26:09PM +0000, Nemanja Domazetović wrote:
> > > HI all
> > > 
> > > This is the first time I'm reporting a problem.
> > > 
> > > We have over 15 spokes (PC Engines APU4) on OpenBSD 7.4 (syspatched
> > > up to 013_unbound) running WireGuard to our central location (also
> > > OpenBSD 7.4, syspatched up to 011_ssh). On 2 of those spokes, OpenBSD
> > > crashes once per day; the others are still working fine. Below is the
> > > error I receive, and I also added the output of the commands show uvm,
> > > show bcstats and show panic. Before we switched to WireGuard, these
> > > spokes used IPsec and we didn't have these problems.
> > > 
> > > 
> > > 
> > > This is what I got from the serial console once users reported the
> > > problem:
> > > 
> > > 
> > > 
> > > uvm_fault(0xfffff825891a0, 0x8, 0, 2) -> e
> > > kernel: page fault trap, code=2
> > > Stopped at      memset+0x52:    repe stosq      %es:(%rdi)
> > > 
> > >      TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > >   364339  89739      0    0x100032          0    0  login_passwd
> > >   226712  33190      0     0x14000      0x200    2  wg_crypt
> > > *225815  71495      0     0x14000      0x200    1  wg_crypt
> > >   282190  76283      0     0x14000      0x200    3  softnet3
> > > 
> > > memset() at memset+0x52
> > > chacha20poly1305_encrypt(fffffd80bcb20010,fffffd80bcb20010,200,0,0,11a73,df3c6405eb66b84a) at chacha20poly1305_encrypt+0x162
> > > noise_remote_encrypt(ffff80000801c740,fffffd80bcb20004,ffff80002279f4f0,fffffd80bcb20010,200) at noise_remote_encrypt+0x113
> > > wg_encap(ffff800000791000,fffffd80bcb1ad00) at wg_encap+0x176
> > > wg_encap_worker(ffff800000791000) at wg_encap_worker+0x7a
> > > taskq_thread(ffff800000766a00) at taskq_thread+0x100
> > > end trace frame: 0x0, count: 9
> > > 
> > > 
> > >    *   show uvm
> > > Current UVM status:
> > >    pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> > >    1005750 VM pages: 22187 active, 154041 inactive, 1 wired, 648481 free (81919 zero)
> > >    min  10% (25) anon, 10% (25) vnode, 5% (12) vtext
> > >    freemin=33525, free-target=44700, inactive-target=0, wired-max=335250
> > >    faults=5744799, traps=5747928, intrs=64193578, ctxswitch=186137457 fpuswitch=0
> > >    softint=3299560, syscalls=6630594, kmapent=13
> > >    fault counts:
> > >      noram=0, noanon=0, noamap=0, pgwait=0, pgrele=0
> > >      ok relocks(total)=207395(210035), anget(retries)=2231572(0), amapcopy=2237418
> > >      neighbor anon/obj pg=461876/3422212, gets(lock/unlock)=1305770/210057
> > >      cases: anon=1858336, anoncow=373236, obj=1098448, prcopy=204660, przero=2210104
> > >    daemon and swap counts:
> > >      woke=0, revs=0, scans=0, obscans=0, anscans=0
> > >      busy=0, freed=0, reactivate=0, deactivate=0
> > >      pageouts=0, pending=0, nswget=0
> > >      nswapdev=1
> > >      swpages=263063, swpginuse=0, swpgonly=0 paging=0
> > >    kernel pointers:
> > >      objs(kern)=0xffffffff8252e560
> > > 
> > > 
> > >    *   show bcstats
> > > Current BufferCache status:
> > > numbufs 41004 busymapped 0, delwri 5
> > > kvaslots 6553 avail kva slots 6553
> > > bufpages 162582, dmapages 162582, dirtypages 10
> > > pendingreads 0, pendingwrites 0
> > > highflips 0, highflops 0, dmaflips 0
> > > 
> > > 
> > >    *   show panic
> > > *cpu1: uvm_fault(0xffffffff825891a0, 0x8, 0, 2) -> e
> > > 
> > > 
> 
> --
> Dr. rer. nat. Alexander Haensch
> 
> 
> AG Weimar
> Institute of Theoretical and Physical Chemistry Eberhard Karls 
> University Tübingen Auf der Morgenstelle 15
> 72076 Tuebingen
> Germany
> 
> Tel1: +49(0) 7071 1389483
> Tel2: +49(0) 7071 2977633
> Fax : +49(0) 7071 295960
> 

--
:wq Claudio
