Hello tech@ I got a GPF on OpenBSD/amd64 4.8-stable GENERIC.MP from Mar 22 17:42:14. The only thing I did to it was bumping HFSC_MAX_CLASSES from 64 to 1024. It happened a few seconds after the ruleset reload (shell script running pfctl -f /etc/pf.conf at the end)
kernel: protection fault trap, code=0 Stopped at rn_match+0x1b: movl 0x18(%rsi),%ecx ddb{0}> trace rn_match() at rn_match+0x1b pfr_match_addr() at pfr_match_addr+0xcb pf_test_rule() at pf_test_rule+0x502 pf_test() at pf_test+0x10eb ip_output() at ip_output+0x416 ip_forward() at ip_forward+0x16f ipv4_input() at ipv4_input+0x66d ipintr() at ipintr+0x51 Xsoftnet() at Xsoftnet+0x4a --- interrupt --- end trace frame: 0x0, count: -9 0x8: onproc was idle0 ddb{0}> sh reg ds 0x30 es 0x101 fs 0x49b8 gs 0 rdi 0xffffffff80cf3700 pfr_sin rsi 0xdead0005deadbeef rbp 0xffff8000154b6800 rbx 0xfffffe807df1c040 rdx 0x2 rcx 0xffff8000154b6a90 rax 0x60330a0a r8 0xffff8000154b6968 r9 0xfffffe807d2dc648 r10 0 r11 0xffff8000154b6a70 r12 0xfffffe8071756548 r13 0x2 r14 0xffff800000193e00 r15 0xffffffff80cf3700 pfr_sin rip 0xffffffff803190bb rn_match+0x1b cs 0x8 rflags 0x10282 rsp 0xffff8000154b67c8 ss 0x10 The RSI quite reminds me the ifindex embedded in link-local IPv6 addr :-) pfr_match_addr() was so kind it saved me an IP address in RAX -> 10.10.51.96 is our well-known spammer. That pointed me to this thing we had in pf.conf: (I remember hitting pfctl grammar limits back then) anchor "spam" on $ext_if proto tcp from ! <smtp-white> \ to ! $our_smtp_server port 25 { block quick from <spammers> # something because of ABUSE my friend set up for him block log(all, to pflog1) quick to 62.67.240.16 block log(all, to pflog1) quick to 62.67.241.15 block log(all, to pflog1) quick to 62.67.194.35 pass out quick from ! $our_smtp_server keep state \ (max-src-conn-rate 5/60 overload <spammers>) \ queue (servers, lowdelay) nat-to $masq_addr } The table <spammers> is not defined anywhere, because pfctl grammar won't let me (or it complains about namespace collision, if defined 'global'). After a couple of days of poking at the code I went through all calls of pfr_destroy_ktable() and finally found the responsible one. The ktable reference counting is broken. The whole overload table vanishes without asking or looking whether it is used anywhere. The crash is actually quite easy to reproduce (won't take more than 5min): 0. Make sure no network connection interferes with your intended testing Watching ntpd silently increasing src_node life for 15 minutes is no fun. 1. Create some really simple ruleset: pass out quick keep state (max-src-conn-rate 1/60, overload <potazmo>) You can explicitly name the table before (but inside an anchor you can't). It just can't be persist, those tables obviously work fine. // overload table refcnt[RULE] will be 1 2. Create two TCP connections (=cause the overload). 3. Reload the firewall (pfctl -f /etc/pf.conf). // refcnt increases to 2 4. Overload again. 5. Wait for the first src_node to expire (watch pfctl -vvsS) // refcnt decreases to 1 6. Reload the firewall (you HAVE TO do it BEFORE second src_node expires) // refcnt increases to 2 and decreases back to 1 as the rules change 7. Wait for the second src_node to expire. // refcnt decreases to zero! table decides to vanish 8. Overload again. The first part of this diff doesn't fix this. The original code just didn't make any sense. The second part does. Both look like some pretty stupid typos... As this introduces a way to remotely crash a firewall from places protected by rate limit (if the attacker waits for us to reload ruleset and then just keeps trying opening conenections) and these parts of code are quite critical, this probably needs heavier testing than my laptop and Alix board. Any comments? Corrections? Input? Thanks in advance. -- Martin Pelikan Index: net/pf.c =================================================================== RCS file: /cvs/src/sys/net/pf.c,v retrieving revision 1.742 diff -u -p -r1.742 pf.c --- net/pf.c 24 Apr 2011 19:36:54 -0000 1.742 +++ net/pf.c 15 May 2011 23:03:42 -0000 @@ -593,7 +593,7 @@ pf_remove_src_node(struct pf_src_node *s if (sn->rule.ptr != NULL) { sn->rule.ptr->src_nodes--; if (sn->rule.ptr->states_cur <= 0 && - sn->rule.ptr->max_src_nodes <= 0) + sn->rule.ptr->src_nodes <= 0) pf_rm_rule(NULL, sn->rule.ptr); RB_REMOVE(pf_src_tree, &tree_src_tracking, sn); pf_status.scounters[SCNT_SRC_NODE_REMOVALS]++; Index: net/pf_ioctl.c =================================================================== RCS file: /cvs/src/sys/net/pf_ioctl.c,v retrieving revision 1.239 diff -u -p -r1.239 pf_ioctl.c --- net/pf_ioctl.c 19 Apr 2011 21:58:03 -0000 1.239 +++ net/pf_ioctl.c 15 May 2011 23:03:42 -0000 @@ -263,7 +263,7 @@ void pf_rm_rule(struct pf_rulequeue *rulequeue, struct pf_rule *rule) { if (rulequeue != NULL) { - if (rule->states_cur <= 0) { + if (rule->states_cur <= 0 && rule->src_nodes <= 0) { /* * XXX - we need to remove the table *before* detaching * the rule to make sure the table code does not delete