Hello tech@

I got a GPF on OpenBSD/amd64 4.8-stable GENERIC.MP from Mar 22 17:42:14.
The only thing I did to it was bumping HFSC_MAX_CLASSES from 64 to 1024.
It happened a few seconds after the ruleset reload (shell script running
pfctl -f /etc/pf.conf at the end)


kernel: protection fault trap, code=0
Stopped at      rn_match+0x1b: movl     0x18(%rsi),%ecx
ddb{0}> trace
rn_match() at rn_match+0x1b
pfr_match_addr() at pfr_match_addr+0xcb
pf_test_rule() at pf_test_rule+0x502
pf_test() at pf_test+0x10eb
ip_output() at ip_output+0x416
ip_forward() at ip_forward+0x16f
ipv4_input() at ipv4_input+0x66d
ipintr() at ipintr+0x51
Xsoftnet() at Xsoftnet+0x4a
--- interrupt ---
end trace frame: 0x0, count: -9
0x8:

onproc was idle0

ddb{0}> sh reg
ds      0x30
es      0x101
fs      0x49b8
gs      0
rdi     0xffffffff80cf3700      pfr_sin
rsi     0xdead0005deadbeef
rbp     0xffff8000154b6800
rbx     0xfffffe807df1c040
rdx     0x2
rcx     0xffff8000154b6a90
rax     0x60330a0a
r8      0xffff8000154b6968
r9      0xfffffe807d2dc648
r10     0
r11     0xffff8000154b6a70
r12     0xfffffe8071756548
r13     0x2
r14     0xffff800000193e00
r15     0xffffffff80cf3700      pfr_sin
rip     0xffffffff803190bb      rn_match+0x1b
cs      0x8
rflags  0x10282
rsp     0xffff8000154b67c8
ss      0x10

The RSI quite reminds me the ifindex embedded in link-local IPv6 addr :-)

pfr_match_addr() was so kind it saved me an IP address in RAX
-> 10.10.51.96 is our well-known spammer. That pointed me to this thing
we had in pf.conf: (I remember hitting pfctl grammar limits back then)

anchor "spam" on $ext_if proto tcp from ! <smtp-white> \
    to ! $our_smtp_server port 25 {
        block quick from <spammers>
        # something because of ABUSE my friend set up for him
        block log(all, to pflog1) quick to 62.67.240.16
        block log(all, to pflog1) quick to 62.67.241.15
        block log(all, to pflog1) quick to 62.67.194.35
        pass out quick from ! $our_smtp_server keep state       \
            (max-src-conn-rate 5/60 overload <spammers>)        \
            queue (servers, lowdelay) nat-to $masq_addr
}

The table <spammers> is not defined anywhere, because pfctl grammar won't
let me (or it complains about namespace collision, if defined 'global').

After a couple of days of poking at the code I went through all calls of
pfr_destroy_ktable() and finally found the responsible one.  The ktable
reference counting is broken.  The whole overload table vanishes without
asking or looking whether it is used anywhere.


The crash is actually quite easy to reproduce (won't take more than 5min):

0. Make sure no network connection interferes with your intended testing
Watching ntpd silently increasing src_node life for 15 minutes is no fun.

1. Create some really simple ruleset:

pass out quick keep state (max-src-conn-rate 1/60, overload <potazmo>)

You can explicitly name the table before (but inside an anchor you can't).
It just can't be persist, those tables obviously work fine.
// overload table refcnt[RULE] will be 1

2. Create two TCP connections (=cause the overload).

3. Reload the firewall (pfctl -f /etc/pf.conf). // refcnt increases to 2

4. Overload again.

5. Wait for the first src_node to expire (watch pfctl -vvsS)
// refcnt decreases to 1

6. Reload the firewall (you HAVE TO do it BEFORE second src_node expires)
// refcnt increases to 2 and decreases back to 1 as the rules change

7. Wait for the second src_node to expire.
// refcnt decreases to zero! table decides to vanish

8. Overload again.


The first part of this diff doesn't fix this.  The original code just
didn't make any sense.  The second part does. Both look like some pretty
stupid typos...

As this introduces a way to remotely crash a firewall from places
protected by rate limit (if the attacker waits for us to reload ruleset
and then just keeps trying opening conenections) and these parts of code
are quite critical, this probably needs heavier testing than my laptop
and Alix board.

Any comments? Corrections? Input?
Thanks in advance.

--
Martin Pelikan


Index: net/pf.c
===================================================================
RCS file: /cvs/src/sys/net/pf.c,v
retrieving revision 1.742
diff -u -p -r1.742 pf.c
--- net/pf.c    24 Apr 2011 19:36:54 -0000      1.742
+++ net/pf.c    15 May 2011 23:03:42 -0000
@@ -593,7 +593,7 @@ pf_remove_src_node(struct pf_src_node *s
        if (sn->rule.ptr != NULL) {
                sn->rule.ptr->src_nodes--;
                if (sn->rule.ptr->states_cur <= 0 &&
-                   sn->rule.ptr->max_src_nodes <= 0)
+                   sn->rule.ptr->src_nodes <= 0)
                        pf_rm_rule(NULL, sn->rule.ptr);
                RB_REMOVE(pf_src_tree, &tree_src_tracking, sn);
                pf_status.scounters[SCNT_SRC_NODE_REMOVALS]++;
Index: net/pf_ioctl.c
===================================================================
RCS file: /cvs/src/sys/net/pf_ioctl.c,v
retrieving revision 1.239
diff -u -p -r1.239 pf_ioctl.c
--- net/pf_ioctl.c      19 Apr 2011 21:58:03 -0000      1.239
+++ net/pf_ioctl.c      15 May 2011 23:03:42 -0000
@@ -263,7 +263,7 @@ void
 pf_rm_rule(struct pf_rulequeue *rulequeue, struct pf_rule *rule)
 {
        if (rulequeue != NULL) {
-               if (rule->states_cur <= 0) {
+               if (rule->states_cur <= 0 && rule->src_nodes <= 0) {
                        /*
                         * XXX - we need to remove the table *before* detaching
                         * the rule to make sure the table code does not delete

Reply via email to