On 11/12/2016 17:22, chris g wrote:
Hello,

I've decided to write here, as we had no luck troubleshooting PF's poor performance on a 1GE interface.

Network scheme, given as simply as possible:

ISP <-> BGP ROUTER <-> PF ROUTER with many rdr rules <-> LAN

The problem is reproducible on any of the PF ROUTER's connections - both towards the LAN and towards the BGP ROUTER.

Hardware, OS versions and tunables of both the BGP and PF routers:

Hardware: E3-1230 V2 with HT on, 8GB RAM, ASUS P8B-E, NICs: Intel I350 on PCIe

FreeBSD versions tested: 9.2-RELEASE amd64 with a custom kernel, 10.3-STABLE (compiled 4th Dec 2016) amd64 with the GENERIC kernel.

Basic tunables (for 9.2-RELEASE):

net.inet.ip.forwarding=1
net.inet.ip.fastforwarding=1
kern.ipc.somaxconn=65535
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
kern.polling.idle_poll=1

The BGP router doesn't have any firewall. The PF options of the PF router are:

set state-policy floating
set limit { states 2048000, frags 2000, src-nodes 384000 }
set optimization normal

Problem description:

We are experiencing low throughput when PF is enabled with all the rdr rules. If 'skip' is set on the benchmarked interface (see the one-line sketch after the benchmark results), or the rdr rules are commented out (not present), the bandwidth is flawless.

No scrubbing is done in PF. Most of the roughly 2500 rdr rules look like this - please note that no interface is specified, and that is intentional:

rdr pass inet proto tcp from any to 1.2.3.4 port 1235 -> 192.168.0.100 port 1235

All measurements were taken using iperf 2.0.5 with options "-c <IP>" or "-c <IP> -m -t 60 -P 8" on the client side and "-s" on the server side. We changed directions too. Please note that this is a production environment and there was some other traffic (say 20-100 Mbps) on the benchmarked interfaces during both tests, so iperf won't show the full Gigabit. There is no networking equipment between 'client' and 'server' - just 2 NICs connected directly with a Cat6 cable.

Without further ado, here are the benchmark results.

Server's PF enabled with fw rules but without rdr rules:

root@client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[  3] local clients_ip port 51361 connected with server port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec

Server's PF enabled with fw rules and around 2500 redirects present:

root@client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[  3] local clients_ip port 45671 connected with server port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   402 MBytes   337 Mbits/sec

That much of a difference is 100% reproducible in the production environment. Performance depends on the hour of day and night: the result is 160-400 Mbps with the rdr rules present, and always above 900 Mbps with the rdr rules disabled.
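For clarity, the 'skip' bypass mentioned above is a one-line pf.conf change; igb0 is just a stand-in name here (the I350 ports attach as igb(4)):

  set skip on igb0    # PF ignores all traffic on this interface entirely

With that line active, the benchmark on the skipped interface is flawless, as described above.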
Some additional information:

# pfctl -s info
Status: Enabled for 267 days 10:25:22           Debug: Urgent

State Table                          Total             Rate
  current entries                   132810
  searches                      5863318875          253.8/s
  inserts                        140051669            6.1/s
  removals                       139918859            6.1/s
Counters
  match                         1777051606           76.9/s
  bad-offset                             0            0.0/s
  fragment                             191            0.0/s
  short                                518            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                           4383            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                     52574            0.0/s
  state-insert                         172            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

# pfctl -s states | wc -l
  113705

# pfctl -s memory
states        hard limit  2048000
src-nodes     hard limit   384000
frags         hard limit     2000
tables        hard limit     1000
table-entries hard limit   200000

# pfctl -s Interfaces | wc -l
      75

# pfctl -s rules | wc -l
    1226

In our opinion the hardware is not too weak: CPU usage is only 10-30%, and during the benchmark it doesn't go to 100% - not even a single vcore does.

I would be really grateful if someone could point me in the right direction.
PF uses a linear search (with some optimizations to skip over rules which can't match) to establish new flows. If your PF config is really that simple, give IPFW a try. While PF has much nicer syntax, IPFW supports more powerful tables: IPFW tables are key-value maps, and the value can be used as an argument to most actions. That may reduce your 2500 rule lookups to a single table lookup. If you can afford to lose the source IP and port, you could also use a userspace TCP proxy.
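An untested sketch of the table approach, assuming FreeBSD 10.x ipfw(8); the table, rule and nat-instance numbers and the 1.2.3.4 address are placeholders, and the ~2500 entries would be generated from your existing rdr list:

  # one nat instance per public IP, holding all of that IP's port redirects
  ipfw nat 1000 config ip 1.2.3.4 redirect_port tcp 192.168.0.100:1235 1235
  # table value = nat instance number for that destination address
  ipfw table 1 add 1.2.3.4/32 1000
  # a single rule: one table lookup selects the matching nat instance
  ipfw add 100 nat tablearg ip from any to 'table(1)' in
  # return traffic must pass through the same instance on the way out,
  # e.g. via a second table keyed on the internal source address

Before migrating you can also confirm the linear scan is the problem on the PF side: pfctl -v -s nat prints per-rule Evaluations counters, and with ~2500 rdr rules every new inbound connection has to be checked against most of them.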