On 25 August 2015 at 17:55, Kolontai Andrej < andrej.kolon...@verwaltung.uni-muenchen.de> wrote:
> Hello, > > I'm new to this list and I hope it's the right place to ask. > > We have highly utilized installation of two FreeBSD-machines running > 10.1-RELEASE, pf and carp. There are about 50 networks (some via vlan, some > ipsec) connected to them, usually about 50000 pf states, about 1500 rules > and traffic sometimes hitting 1 gbit/s or more. > > When we load the ruleset on the active machine it sometimes freezes. It > occurs more often as our installation grows. Sometimes it freezes > completely, which is actually good as the backup machine takes over. But at > other times it's still some kind of alive but unbearably slow (ping is > about 40 s instead of the usual 0.2 ms), the backup machine does not take > over and people start to get angry. > The crashed machine usually recovers after a couple of minutes. > > We've been trying to resolve this for some months now and are running out > of ideas. We do believe that something is wrong with or setup but have no > clue what it could be. > > When it happens, the syslogs show tons of messages like these: > > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.101.87 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.101.87 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.107.206 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.101.87 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.107.206 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.101.87 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.106.171 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.106.171 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.106.72 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.106.58 to 141.84.149.19 > Aug 24 09:08:39 applej last message repeated 2 times > Aug 24 09:08:39 applej kernel: pf: BAD state: TCP in wire: > 10.200.108.141:49202 172.23.2.92:8192 stack: - [lo=3482585550 > high=3482585552 win=8192 modulator=0pf_map_addr: src tracking maps > ]172.23.106.72 [lo=0 hi > gh=1 win=1 modulator=0 to ]141.84.149.18 2:0 > Aug 24 09:08:39 applej kernel: S seq=3503222902 (3503222902) ack=0 len=0 > ackskew=0 pkts=1:1 dir=in,fwd > Aug 24 09:08:39 applej kernel: pf: State failure on: 1 | 5 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.106.171 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.107.20 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps > 172.23.101.87 to 141.84.149.18 > Aug 24 09:08:39 applej kernel: pf_map_addr: src tracking maps pf_map_addr: > src tracking maps pf_map_addr: src tracking maps > 172.23.106.171172.23.106.171 to 172.23.106.171141.84.149.18 to to > Aug 24 09:08:39 applej kernel: 141.84.149.18141.84.149.18 > Aug 24 09:08:39 applej kernel: > Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps pf_map_addr: > src tracking maps 172.23.106.171172.23.101.87 to > Aug 24 09:08:40 applej kernel: to 141.84.149.18141.84.149.18 > Aug 24 09:08:40 applej kernel: > Aug 24 09:08:40 applej kernel: pf: OK ICMP 3:10 141.84.149.249 -> > 10.153.101.239 state: TCP in wire: 141.84.149.249:37677 10.153.101.239 > Aug 24 09:08:40 applej kernel: :80 stack: - [lo=4171141930 high=4171159337 > win=106 modulator=0 wscale=7] [lo=2653133771 high=2653147339 win=136 > modulator=0 wscale=7] 10:10 seq=2653133771 > Aug 24 09:08:40 applej kernel: pf: OK ICMP 3:10 141.84.149.249 -> > 10.153.101.239 state: TCP out wire: 10.153.101.239:80 141.84.149.249 > Aug 24 09:08:40 applej kernel: :37677 stack: - [lo=4171141930 > high=4171159337 win=106 modulator=0 wscale=7] [lo=2653133771 > high=2653147339 win=136 modulator=0 wscale=7] 10:10 seq=2653133771 > Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps pf_map_addr: > src tracking maps 172.23.106.184pf: wire key attach failed on vlan42: > 172.23.106.165pf_map_addr: src tracking maps > Aug 24 09:08:40 applej kernel: 112 to in141.84.149.18 wire: 172.23.106.58 > Aug 24 09:08:40 applej kernel: 141.84.149.27 > Aug 24 09:08:40 applej kernel: to 224.0.0.18 to 1:0141.84.149.18, > existing: > Aug 24 09:08:40 applej kernel: 112141.84.149.19 in > Aug 24 09:08:40 applej kernel: wire: > Aug 24 09:08:40 applej kernel: 141.84.149.27 > Aug 24 09:08:40 applej kernel: 224.0.0.18 stack: 141.84.149.27 224.0.0.18 > 1:0 > Aug 24 09:08:40 applej kernel: pf: wire key attach failed on vlan42: 112 > in wire: 141.84.149.1 224.0.0.18 1:0, existing: > Aug 24 09:08:40 applej kernel: 112 in wire: 141.84.149.1 224.0.0.18 stack: > 141.84.149.1 224.0.0.18 1:0 > Aug 24 09:08:40 applej kernel: > Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps pf_map_addr: > src tracking maps pf: wire key attach failed on vlan250: 172.23.100.108pf: > OK ICMP 3:10 112 > Aug 24 09:08:40 applej kernel: 172.23.106.184 in to wire: to > 10.153.101.73141.84.149.18 224.0.0.18141.84.149.18 1:0141.84.149.198, > existing: > Aug 24 09:08:40 applej kernel: 112 in > Aug 24 09:08:40 applej kernel: pf_map_addr: src tracking maps > Aug 24 09:08:40 applej kernel: > Aug 24 09:08:40 applej kernel: wire: -> 10.153.101.73172.23.106.58 to > 224.0.0.18141.84.149.19 stack: > Aug 24 09:08:40 applej kernel: 10.153.101.73 > Aug 24 09:08:40 applej kernel: 77.7.121.109 state: 224.0.0.18TCP 1:0 > Aug 24 09:08:40 applej kernel: out wire: 141.84.149.198:80 > 77.7.121.109:52521 stack: > Aug 24 09:08:40 applej kernel: - [lo=2750816253 high=2750829820 win=256 > modulator=0 wscale=8] [lo=2902946358 high=2903011894 win=106 modulator=0 > Aug 24 09:08:40 applej kernel: wscale=7] 7:9 seq=2750816252 > Aug 24 09:08:40 applej kernel: pf: OK ICMP 3:10 141.84.149.198 -> > 77.7.121.109 state: TCP in wire: 77.7.121.109:52521 > Aug 24 09:08:40 applej kernel: 141.84.149.198:80 stack: - [lo=2750816253 > high=2750829820 win=256 modulator=0 > Aug 24 09:08:40 applej kernel: wscale=8] [lo=2902946358 high=2903011894 > win=106 modulator=0 wscale=7] 7:9 seq=2750816252 > .... > Aug 24 09:09:11 applej kernel: 4:7:80:1433 seq=1025128833 > Aug 24 09:09:11 applej kernel: 141.84.149.19:47806 stack: > 10.110.32.129-:50143 [lo=3790813207 high=3790831127 win=14600 modulator=0 > stack: wscale=8-] [lo=3961092366 high=3962143087 win=4106 modulator=0 > [lo=155 > 1251521 high=1554989121 win=17898 modulator=0 wscale=8 wscale=8]] > [lo=1005669588 high=1006720647 win=4106 modulator=0 4:2 wscale=8 > seq=1551251520 > Aug 24 09:09:11 applej kernel: ] 4:4 seq=1005669587 > Aug 24 09:09:11 applej kernel: pf: OK ICMP 3:1 172.23.254.250pf: OK ICMP > 3:1 -> 10.110.32.220172.23.106.87 -> state: 10.110.32.193TCP state: > inTCP wire: out172.23.106.87 wire: :5017310.110.32.193 :1433141.84 > .149.21 :844310.110.32.130 stack: :54454172.23.106.87 stack: :50173- > [lo=2704530546 high=2705581267 win=4106 modulator=0141.84.149.18 > wscale=8:8443] [lo=521335749 high=521353924 win=256 modulator=0 [lo=19884644 > 01 high=1989515460 win=4106 modulator=0 wscale=8 wscale=8]] [lo=3203135938 > high=3203200962 win=71 modulator=0 4:4 wscale=8 seq=1988464400 > Aug 24 09:09:11 applej kernel: ] 7:4 seq=521335748 > Aug 24 09:09:11 applej kernel: pf: OK ICMP 3:1 pf: OK ICMP 3:1 pf: OK ICMP > 3:1 pf: OK ICMP 3:1 pf: OK ICMP 3:1 pf: OK ICMP 3:1 > 172.23.254.250172.23.254.25010.110.32.220 -> -> 172.23.106.8710.110.32.193 > state: > state: TCPTCP in out wire: wire: 172.23.106.8710.110.32.193:50168:1433 > 141.84.149.2110.110.32.129:8443:50175 stack: stack: 172.23.106.87-:50168 > [lo=1504649300 high=1505700021 win=4106 modulator=0 wscale=8141 > .84.149.18]:8443 [lo=723475428 high=724526487 win=4106 modulator=0 > [lo=975354116 high=975374851 win=256 modulator=0 wscale=8 wscale=8]] 4:4 > [lo=4251253613 high=4251317869 win=81 modulator=0 seq=723475427 > Aug 24 09:09:11 applej kernel: wscale=8] 7:4 seq=975354115 > Aug 24 09:09:11 applej kernel: 141.84.44.211 -> > 172.23.254.250217.86.168.236 -> state: 172.23.3.29TCP state: inTCP wire: > 141.84.44.211217.86.168.236 -> :5098310.180.10.209 state: > 141.84.149.198TCP:80 in stack > : wire: -10.180.10.209 [lo=3709359805 high=3709359806 win=8192 > modulator=0 in:49223] [lo=0 high=8192 win=1 modulator=0141.84.149.244]:443 > 2:0 stack: seq=3709359804 > Aug 24 09:09:11 applej kernel: - [lo=827930303 high=827944894 win=67 > modulator=0 wire: wscale=8172.23.3.29]:35196 [lo=1796383793 > high=1796400177 win=114 modulator=0 wscale=710.153.101.108]:6556 10:10 > stack: s > eq=827930303 > Aug 24 09:09:11 applej kernel: - [lo=1991477221 high=1991477222 win=14600 > modulator=0 -> ]172.23.101.172 [lo=0 high=14600 win=1 modulator=0 state: > ]TCP 2:0 in seq=1991477220 > Aug 24 09:09:11 applej kernel: wire: 172.23.101.172:49613 > 141.84.149.21:8080 stack: 172.23.101.172:49613 141.84.149.19:8080 > [lo=3559829126 high=3559829127 win=8192 modulator=0] [lo=0 high=8192 win=1 > modulator=0] > 2:0 seq=3559829125 > Aug 24 09:09:11 applej kernel: pf: OK ICMP 3:1 pf: OK ICMP 3:1 > 10.110.32.220141.84.44.211 -> pf_map_addr: src tracking maps -> > 172.23.101.8710.110.32.193 to 115.248.50.20141.84.149.19 state: > Aug 24 09:09:11 applej kernel: state: TCPTCP out in wire: wire: > 10.110.32.193115.248.50.20:1433:50426 10.110.32.130141.84.149.229:54455:80 > stack: stack: -- [lo=201188016 high=202238737 win=513 modulator=0 [lo > =2382456191 high=2382456193 win=2048 modulator=0 wscale=8]] [lo=0 high=1 > win=1 modulator=0 [lo=3935828538 high=3935959789 win=4106 modulator=0] > wscale=8 2:0] seq=2382456191 > Aug 24 09:09:11 applej kernel: 4:4 seq=3935828537 > > Sorry if this is too much but I really can' t tell which of these > messages could help. Perhaps there isn't any useful information at all. To > me it seems it's just a side effect of the actual problem which I am unable > to see. > > Maybe somebody has an idea about what's going wrong and what to > investigate. Or theories right now are: > * issues with pfsync (messages complaining about states, but unlikely, > it's normally working just fine) > * some lack of ressources (yet, which?) > * race condition inside the kernel (there are 24 CPU cores) > > We have not yet been able to reproduce it in a lab environment. > > Right now, we work around this by putting the master into the backup state > (using demotion) before loading the ruleset and changing back to master > state after the reload. This is working fine. But it kills, of course, the > IPSec-SAs. But that's something we can live with right now. > > > Viele Grüße > Andrej Kolontai > > Ludwig-Maximilians-Universitaet Muenchen > Ref. VI.4 (IT-Sicherheit & Verzeichnisdienste) > Martiusstrasse 4 / 207 > 80802 Muenchen > > phone +49 (0)89 2180-3815 > email mailto:andrej.kolon...@verwaltung.uni-muenchen.de > web http://www.uni-muenchen.de/zuv/it/ > > 1.5k rules seems like a lot for PF to handle. Is that 1.5k rules you've written in the conf, or 1.5k rules from `pfctl -sr | wc -l' ? If that's merely your number of configuration lines, that may very well expand to many more rules in reality. For example "pass in quick on { $vlan2 $vlan3 } inet proto { $tcp $udp } from <foo> to <pub_ns> port 53" will expand into 4 actual rules. I would suggest you find a way to drastically lower that. You may also wish to ensure : - you're using, if at all possible, a *dedicated* pfsync link (like a cable on a dedicated interface between the boxes) - you've got enough RAM to store your PF states As another, *unrelated* performance enhancement, make sure to use the "quick" keyword whenever applicable. _______________________________________________ freebsd-pf@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-pf To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"