On 2021/11/13 22:41, Stuart Henderson wrote:
> On 2021/11/13 21:49, Stuart Henderson wrote:
> > On 2021/11/13 18:04, Alexander Bluhm wrote:
> > > Hi,
> > > 
> > > To make IPsec MP safe we need refcounting for the tdb.  The diff
> > > below is part of something bigger we have at genua.  Although it
> > > does not cover timeouts and the tdb reaper yet, I want to get this
> > > in as a frist step.
> > > 
> > > It passes regress but there are setups that are not covered.  Bridge
> > > and pfsync with IPsec and TCP signature need special care.
> > > 
> > > When testing, please check for tdb leaks.  The vmstat -m tdb in use
> > > culumn must be 0.  It it is not, try ipsecctl -F.  If ipsecctl -sa
> > > shows nothing, but vmstat -m shows a used tdb, then it is a bug.
> > > 
> > > Name        Size Requests Fail    InUse Pgreq Pgrel Npage Hiwat Minpg 
> > > Maxpg Idle
> > > tdb          904       56    0        0     7     7     0     7     0     
> > > 8    0
> > 
> > Tested with a box running isakmpd at home. Config is:
> > 
> > isakmpd_flags=-Kv -D0=29 -D1=49 -D2=10 -D3=30 -D5=20 -D6=30 -D8=30 -D9=30 
> > -D10=20
> > 
> > ike passive esp \
> >         from xxx.xx.xxx.1 (xxx.xx.xxx.0/27) to 0.0.0.0/0 \
> >         local xxx.xx.xxx.1 \
> >     main auth hmac-sha2-256 enc aes group ecp256 \
> >     quick enc aes-128-gcm group ecp256 \
> >         srcid xxx
> > 
> > isakmpd starts at boot, 3 devices connect in and tunnels come up,
> > FLOWS and SAD show in ipsecctl -sa.
> > 
> > isakmpd crashes shortly after starting (as described in previous mails;
> > this is not new, it happens very often on the first start at boot),
> > FLOWS and SAD remain showing. At this point they still show InUse:
> > 
> > Name        Size Requests Fail    InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg 
> > Idle
> > tdb         1088        9    0        6     1     0     1     1     0     8 
> >    0
> > 
> > After ipsecctl -F they are freed:
> > 
> > tdb         1088        9    0        0     1     0     1     1     0     8 
> >    1
> > 
> > Restart isakmpd, ipsecctl -f /etc/ipsec.conf, devices connect back in:
> > 
> > tdb         1088       18    0        6     2     1     1     1     0     8 
> >    0
> > 
> > (And at this point isakmpd is usually stable).
> > 
> > I will leave a shell loop running vmstat -m every few minutes to check
> > if it grows.
> > 
> > (Bad timing for testing TCPMD5 because over the last couple of days I have
> > just removed bgpd+ospfd from the machines that would have been good
> > choices to run test diffs..I might be able to get some simple setup
> > back up and running though)
> 
> Ah, that was interesting:
> 
> Sat Nov 13 21:44:17 GMT 2021
> tdb         1088       18    0        6     2     1     1     1     0     8   
>  0
> 
> Sat Nov 13 21:54:44 GMT 2021
> tdb         1088       18    0        6     2     1     1     1     0     8   
>  0
> 
> Sat Nov 13 22:04:44 GMT 2021
> tdb         1088       27    0       12     2     1     1     1     0     8   
>  0
> 
> and then shortly after, boom. Unfortunately I didn't get
> much from ddb before it hanged after I tried changing cpu.
> 
> [playback]
> Xsyscall() at Xsyscall+0x128
> end of kernel               
> end trace frame: 0x7f7ffffb2d20, count: 248
> End of stack trace.                        
> splassert: tdb_free: want 2 have 0
> Starting stack trace...           
> tdb_free(ffff800000f40b88) at tdb_free+0x47
> pfkeyv2_send(fffffd80603603b8,ffff800000f44c00,170) at pfkeyv2_send+0x656
> pfkeyv2_output(fffffd8064cb5e00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a
> pfkeyv2_usrreq(fffffd80603603b8,9,fffffd8064cb5e00,0,0,ffff8000ffff9508) at 
> pfkeyv2_usrreq+0x1b0
> sosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9             
>                   
> dofilewritev(ffff8000ffff9508,7,ffff80001fd8f6f0,0,ffff80001fd8f7f0) at 
> dofilewritev+0x14d
> sys_writev(ffff8000ffff9508,ffff80001fd8f790,ffff80001fd8f7f0) at 
> sys_writev+0xd2         
> syscall(ffff80001fd8f860) at syscall+0x3a9                                    
>    
> Xsyscall() at Xsyscall+0x128              
> end of kernel               
> end trace frame: 0x7f7ffffb2d20, count: 248
> End of stack trace.                        
> splassert: tdb_free: want 2 have 0
> Starting stack trace...           
> tdb_free(ffff800000f40308) at tdb_free+0x47
> pfkeyv2_send(fffffd80603603b8,ffff800000eff300,50) at pfkeyv2_send+0x656
> pfkeyv2_output(fffffd8062506e00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a
> pfkeyv2_usrreq(fffffd80603603b8,9,fffffd8062506e00,0,0,ffff8000ffff9508) at 
> pfkeyv2_usrreq+0x1b0
> sosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9             
>                   
> dofilewritev(ffff8000ffff9508,7,ffff80001fd8f6f0,0,ffff80001fd8f7f0) at 
> dofilewritev+0x14d
> sys_writev(ffff8000ffff9508,ffff80001fd8f790,ffff80001fd8f7f0) at 
> sys_writev+0xd2         
> syscall(ffff80001fd8f860) at syscall+0x3a9                                    
>    
> Xsyscall() at Xsyscall+0x128              
> end of kernel               
> end trace frame: 0x7f7ffffc3170, count: 248
> End of stack trace.                        
> splassert: ipsp_ids_free: want 2 have 0
> Starting stack trace...                
> ipsp_ids_free(ffff800000f30880) at ipsp_ids_free+0x4e
> tdb_free(ffff800000f40308) at tdb_free+0x13c         
> pfkeyv2_send(fffffd80603603b8,ffff800000eff300,50) at pfkeyv2_send+0x656
> pfkeyv2_output(fffffd8062506e00,fffffd80603603b8,0,0) at pfkeyv2_output+0x8a
> pfkeyv2_usrreq(fffffd80603603b8,9,fffffd8062506e00,0,0,ffff8000ffff9508) at 
> pfkeyv2_usrreq+0x1b0
> sosend(fffffd80603603b8,0,ffff80001fd8f6f0,0,0,0) at sosend+0x3a9             
>                   
> dopfainlice:wr iketernv(eflf ffd8i0a0gn0offsftfic9 5a08ss,7er,ftfiofnf 
> 8"0r00ef1cfdnt8 f6!=f 0,~00",f ffaiflfe80d:0 01fiflde8 f7"f/s0ys)/ akter 
> n/kern_
> sdyofncilh.ecw"ri,t elivn+e0 x8124d4
>                                     
> Stopped at      db_enter+0x10:  popq    %rbp
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>  185954   5669     68        0x10          0    1  isakmpd
>   67015  91815      0     0x14000      0x200    3  softnet
> *318814   6181      0     0x14000 0x40000200    0  softclock
> db_enter() at db_enter+0x10                                 
> panic(ffffffff81e4a37b) at panic+0xbf
> __assert(ffffffff81eb3b74,ffffffff81e1e19f,338,ffffffff81e4f078) at 
> __assert+0x
> 25                                                                            
>  
> refcnt_rele(ffff800000f40330) at refcnt_rele+0x6f
> tdb_delete(ffff800000f40308) at tdb_delete+0x84  
> tdb_timeout(ffff800000f40308) at tdb_timeout+0x39
> softclock_thread(ffff8000fffff500) at softclock_thread+0x16e
> end trace frame: 0x0, count: 8                              
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.       
> ddb{0}> [-- sthen@localhost attached -- Sat Nov 13 22:23:17 2021]   
> tr
> db_enter() at db_enter+0x10
> panic(ffffffff81e4a37b) at panic+0xbf
> __assert(ffffffff81eb3b74,ffffffff81e1e19f,338,ffffffff81e4f078) at 
> __assert+0x
> 25                                                                            
>  
> refcnt_rele(ffff800000f40330) at refcnt_rele+0x6f
> tdb_delete(ffff800000f40308) at tdb_delete+0x84  
> tdb_timeout(ffff800000f40308) at tdb_timeout+0x39
> softclock_thread(ffff8000fffff500) at softclock_thread+0x16e
> end trace frame: 0x0, count: -7                             
> ddb{0}> sh reg                 
> rdi                                0
> rsi                             0x14
> rbp               0xffff80001fbdc6e0
> rbx                                0
> rdx               0xde00000003ff0239
> rcx                            0x202
> rax                             0x62
> r8                 0x101010101010101
> r9                                 0
> r10                                0
> r11               0x254de177953953f9
> r12               0xffffffff82196a00    cpu_info_full_primary+0x2a00
> r13               0xffffffff82275110    timeout_proc                
> r14                                0                
> r15               0xffffffff81e4a37b    cmd0646_9_tim_udma+0x2e00b
> rip               0xffffffff81dad9f0    db_enter+0x10             
> cs                               0x8                 
> rflags                         0x206
> rsp               0xffff80001fbdc6e0
> ss                              0x10
> db_enter+0x10:  popq    %rbp        
> ddb{0}> ps                  
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  20274   32066  93874   1000  3    0x100083  nanoslp       sleep  
>  93874  445500  10192   1000  3    0x10008b  sigsusp       ksh  
>  10192  370220  36870   1000  3        0x98  kqread        sshd
>  36870  222408  98260      0  3        0x92  kqread        sshd
>   5669  185954  75457     68  7        0x10                isakmpd
>  75457  276857      1      0  3        0x80  netio         isakmpd
>  21598  237571      1      0  3    0x100083  ttyin         getty  
>   4506  488964      1      0  3    0x100098  kqread        cron 
>  39797   52192  15626    846  3    0x100098  kqread        dhcpcd
>  59318  229873  15626    846  3    0x100098  kqread        dhcpcd
>   7830  517562  15626      0  3        0x88  kqread        dhcpcd
>  15626  323417      1    846  3    0x100098  kqread        dhcpcd
>  19237  354049  75741    720  3        0x90  kqread        lldpd 
>  75741   66119      1      0  3        0x80  netio         lldpd
>  61311  177515      1     99  3    0x100090  kqread        sndiod
>  11783  282500      1    110  3    0x100090  kqread        sndiod
>  82651  136045  37762      0  3    0x100090  kqread        tftp-proxy
>  37762  379684      1    108  3        0x90  kqread        tftp-proxy
>   1745  324688      1    109  3        0x90  kqread        ftp-proxy 
>  39681  116848  88999     95  3    0x100092  kqread        smtpd    
>  25384  256665  88999    103  3    0x100092  kqread        smtpd
>  92345  176419  88999     95  3    0x100092  kqread        smtpd
>  55442  232201  88999     95  3    0x100092  kqread        smtpd
>  77443  215878  88999     95  3    0x100092  kqread        smtpd
>  69162  261319  88999     95  3    0x100092  kqread        smtpd
>  88999  111572      1      0  3    0x100080  kqread        smtpd
>  58046  516005  58609     94  3    0x100092  kqread        rad  
>  90680  269647  58609     94  3    0x100092  kqread        rad
>  58609  188614      1      0  3    0x100080  kqread        rad
>   3419  227743      1     77  3    0x100090  kqread        dhcpd
>  61767  205183      1      0  3    0x100080  kqread        snmpd
>  51987  257332      1     91  3        0x92  kqread        snmpd
>  98260   56979      1      0  3        0x88  kqread        sshd 
>  13528   58314      1      0  3    0x100080  kqread        ntpd
>  18801  320473  51204     83  3    0x100092  kqread        ntpd
>  51204  405642      1     83  3    0x100092  kqread        ntpd
>  53483  209302  30989     53  3        0x90  kqread        unbound
>  30989  499872      1     53  3        0x90  kqread        unbound
>  16457  234403  32406     74  3    0x100092  bpf           pflogd 
>  32406   88142      1      0  3        0x80  netio         pflogd
>  80238  516045  79768     73  3    0x100090  kqread        syslogd
>  79768   67472      1      0  3    0x100082  netio         syslogd
>  43550  409362      1      0  3    0x100080  kqread        resolvd
>    442  379277  99464     77  3    0x100092  kqread        dhcpleased
>  47451  452762  99464     77  3    0x100092  kqread        dhcpleased
>  99464  433528      1      0  3        0x80  kqread        dhcpleased
>  21886  318205      0      0  3     0x14200  bored         wg_crypt  
>  97050  229388      0      0  3     0x14200  bored         wg_crypt
>   4440    5494      0      0  3     0x14200  bored         wg_crypt
>  72609  120590      0      0  3     0x14200  bored         wg_crypt
>  20025   29295      0      0  3     0x14200  bored         wg_handshake
>  43757  478463      0      0  3     0x14200  bored         wg_handshake
>  55892  346923  51675    115  3    0x100092  kqread        slaacd      
>  76843  163407  51675    115  3    0x100092  kqread        slaacd
>  51675  404409      1      0  3    0x100080  kqread        slaacd
>  28362  202426      0      0  3     0x14200  bored         smr   
>  82207  376534      0      0  3     0x14200  pgzero        zerothread
>   1524   51196      0      0  3     0x14200  aiodoned      aiodoned  
>  22770  341109      0      0  3     0x14200  syncer        update  
>  33241  337755      0      0  3     0x14200  cleaner       cleaner
>  27893  102676      0      0  3     0x14200  reaper        reaper 
>  79368  439400      0      0  3     0x14200  pgdaemon      pagedaemon
>  98877   12806      0      0  3     0x14200  bored         sensors   
>  71274  388091      0      0  3     0x14200  mmctsk        sdmmc0 
>  33311  377928      0      0  3     0x14200  usbtsk        usbtask
>  83240  116348      0      0  3     0x14200  usbatsk       usbatsk
>  45461  344150      0      0  3  0x40014200  acpi0         acpi0  
>  94708  135825      0      0  3  0x40014200                idle3
>  46611  496404      0      0  7  0x40014200                idle2
>  26280   26936      0      0  3  0x40014200                idle1
>  91815   67015      0      0  7     0x14200                softnet
>  90110  476121      0      0  3     0x14200  bored         systqmp
>  12475  134606      0      0  3     0x14200  bored         systq  
> * 6181  318814      0      0  7  0x40014200                softclock
>  38662    5703      0      0  3  0x40014200                idle0    
>      1  335266      0      0  3        0x82  wait          init 
>      0       0     -1      0  3     0x10200  scheduler     swapper
> ddb{0}>           ps /o                                           
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>  185954   5669     68        0x10          0    1  isakmpd
>   67015  91815      0     0x14000      0x200    3  softnet
> *318814   6181      0     0x14000 0x40000200    0  softclock
> ddb{0}> tr /t 0t185954                                      
> ffff800000f40308(ffffffffffffffff,ffffffffffffffff,ffff800000f402e8,0,0,0) at > 0
> xffff800000f40308                                                             
>  
> end trace frame: 0x0, count: -1
> ddb{0}> mach ddbcpu 1          
> 
> <ddb hanged here>
> 
> However, looking at the earlier console log, there were splasserts
> starting from before it went multiuser:

I am now running bsd.gdb with MP_LOCKDEBUG and WITNESS.
Here is an alternative version of the splassert traces from
during boot with line numbers.

starting early daemons: syslogd pflogd unbound ntpd isakmpd.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd snmpd dhcpd rad smtpd ftpproxy ftpproxy6(failed) 
tftpproxy sndiod.
starting package daemons: lldpd dhcpcdsplassert: tdb_free: want 2 have 0
Starting stack trace...
tdb_free(ffff800000f2db80) at tdb_free+0x47 [/sys/netinet/ip_ipsp.c:909]
pfkeyv2_send(fffffd805e21e3f8,ffff800000f27e00,170) at pfkeyv2_send+0x656 
[/sys/net/pfkeyv2.c:2141]
pfkeyv2_output(fffffd806731ea00,fffffd805e21e3f8,0,0) at pfkeyv2_output+0x8a 
[/sys/net/pfkeyv2.c:442]
pfkeyv2_usrreq(fffffd805e21e3f8,9,fffffd806731ea00,0,0,ffff80001f6aefc8) at 
pfkeyv2_usrreq+0x1b0 [/sys/net/pfkeyv2.c:394]
sosend(fffffd805e21e3f8,0,ffff80001f6fdd90,0,0,0) at sosend+0x3a9 
[/sys/kern/uipc_socket.c:597]
dofilewritev(ffff80001f6aefc8,7,ffff80001f6fdd90,0,ffff80001f6fde90) at 
dofilewritev+0x14d [/sys/kern/sys_generic.c:382]
sys_writev(ffff80001f6aefc8,ffff80001f6fde30,ffff80001f6fde90) at 
sys_writev+0xd2 [/sys/kern/sys_generic.c:328]
syscall(ffff80001f6fdf00) at syscall+0x3a9 [/sys/sys/syscall_mi.h:102]
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffb31b0, count: 248
End of stack trace.
splassert: tdb_free: want 2 have 0
Starting stack trace...
tdb_free(ffff800000f2db80) at tdb_free+0x47 [/sys/netinet/ip_ipsp.c:909]
pfkeyv2_send(fffffd805e21e3f8,ffff800000f27e00,190) at pfkeyv2_send+0x656 
[/sys/net/pfkeyv2.c:2141]
pfkeyv2_output(fffffd8061acef00,fffffd805e21e3f8,0,0) at pfkeyv2_output+0x8a 
[/sys/net/pfkeyv2.c:442]
pfkeyv2_usrreq(fffffd805e21e3f8,9,fffffd8061acef00,0,0,ffff80001f6aefc8) at 
pfkeyv2_usrreq+0x1b0 [/sys/net/pfkeyv2.c:394]
sosend(fffffd805e21e3f8,0,ffff80001f6fdd90,0,0,0) at sosend+0x3a9 
[/sys/kern/uipc_socket.c:597]
dofilewritev(ffff80001f6aefc8,7,ffff80001f6fdd90,0,ffff80001f6fde90) at 
dofilewritev+0x14d [/sys/kern/sys_generic.c:382]
sys_writev(ffff80001f6aefc8,ffff80001f6fde30,ffff80001f6fde90) at 
sys_writev+0xd2 [/sys/kern/sys_generic.c:328]
syscall(ffff80001f6fdf00) at syscall+0x3a9 [/sys/sys/syscall_mi.h:102]
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffb31b0, count: 248
End of stack trace.
splassert: tdb_free: want 2 have 0
Starting stack trace...
tdb_free(ffff800000f2db80) at tdb_free+0x47 [/sys/netinet/ip_ipsp.c:909]
pfkeyv2_send(fffffd805e21e3f8,ffff800000f27200,170) at pfkeyv2_send+0x656 
[/sys/net/pfkeyv2.c:2141]
pfkeyv2_output(fffffd8060d5f800,fffffd805e21e3f8,0,0) at pfkeyv2_output+0x8a 
[/sys/net/pfkeyv2.c:442]
pfkeyv2_usrreq(fffffd805e21e3f8,9,fffffd8060d5f800,0,0,ffff80001f6aefc8) at 
pfkeyv2_usrreq+0x1b0 [/sys/net/pfkeyv2.c:394]
sosend(fffffd805e21e3f8,0,ffff80001f6fdd90,0,0,0) at sosend+0x3a9 
[/sys/kern/uipc_socket.c:597]
dofilewritev(ffff80001f6aefc8,7,ffff80001f6fdd90,0,ffff80001f6fde90) at 
dofilewritev+0x14d [/sys/kern/sys_generic.c:382]
sys_writev(ffff80001f6aefc8,ffff80001f6fde30,ffff80001f6fde90) at 
sys_writev+0xd2 [/sys/kern/sys_generic.c:328]
syscall(ffff80001f6fdf00) at syscall+0x3a9 [/sys/sys/syscall_mi.h:102]
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffb31b0, count: 248
End of stack trace.
.
starting local daemons: cron.
Sat Nov 13 22:57:42 GMT 2021

OpenBSD/amd64 (jodrell.spacehopper.org) (tty00)

login: reorder_kernel: failed -- see 
/usr/share/relink/kernel/GENERIC.MP/relink.log
[-- sthen@localhost attached -- Sat Nov 13 22:55:14 2021]


isakmpd crashed; I flushed pfkey, restarted isakmpd and reloaded ipsec.conf,
then saw one more "splassert: tdb_free: want 2 have 0" after every new SA

Now waiting for another kernel crash.

Reply via email to