On 11/12/2016 17:22, chris g wrote:
Hello,

I've decided to write here, as we have had no luck troubleshooting PF's
poor performance on a 1GbE interface.

The network layout, simplified as much as possible, is:

ISP <-> BGP ROUTER <-> PF ROUTER with many rdr rules <-> LAN

The problem is reproducible on any of the PF ROUTER's links - both towards the LAN and towards the BGP ROUTER.


OS versions, tunables and hardware of both the BGP and PF routers:

Hardware: E3-1230 V2 with HT on, 8GB RAM, ASUS P8B-E, NICs: Intel I350 on PCIe

FreeBSD versions tested: 9.2-RELEASE amd64 with a custom kernel, and
10.3-STABLE (built 4 Dec 2016) amd64 with the GENERIC kernel.

Basic tunables (for 9.2-RELEASE):
net.inet.ip.forwarding=1
net.inet.ip.fastforwarding=1
kern.ipc.somaxconn=65535
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.udp.recvspace=65536
kern.random.sys.harvest.ethernet=0
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0
kern.polling.idle_poll=1

BGP router doesn't have any firewall.

PF options on the PF router are:
set state-policy floating
set limit { states 2048000, frags 2000, src-nodes 384000 }
set optimization normal


Problem description:
We are experiencing low throughput when PF is enabled with all the
rdr rules. If 'set skip' is applied to the benchmarked interface, or the
rdr rules are commented out (not present), the bandwidth is flawless.
No scrubbing is done in PF; most of the roughly 2500 rdr rules look
like the one below. Please note that no interface is specified - this
is intentional:

rdr pass inet proto tcp from any to 1.2.3.4 port 1235 -> 192.168.0.100 port 1235

All measurements were taken using iperf 2.0.5 with options "-c <IP>"
or "-c <IP> -m -t 60 -P 8" on the client side and "-s" on the server
side. We tested both directions.
Please note that this is a production environment and there was some
other traffic on the benchmarked interfaces (roughly 20-100 Mbps) during
both tests, so iperf will not show the full Gigabit. There is no networking
equipment between 'client' and 'server' - just two NICs directly
connected with a Cat6 cable.
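
For reference, the exact invocations were as follows (<server_IP> stands in for the address under test):

# on the server
iperf -s

# on the client - both variants were used, and roles were swapped to test the other direction
iperf -c <server_IP>
iperf -c <server_IP> -m -t 60 -P 8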

Without further ado, here are benchmark results:

server's PF enabled with fw rules but without rdr rules:
root@client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[  3] local clients_ip port 51361 connected with server port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   936 Mbits/sec



server's PF enabled with fw rules and around 2500 redirects present:
root@client:~ # iperf -c server
------------------------------------------------------------
Client connecting to server, TCP port 5001
TCP window size: 65.0 KByte (default)
------------------------------------------------------------
[  3] local clients_ip port 45671 connected with server port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   402 MBytes   337 Mbits/sec


That much of a difference is 100% reproducible in the production environment.

Performance depends on the time of day and night: the result is 160-400 Mbps
with the rdr rules present, and always above 900 Mbps with the rdr rules
disabled.


Some additional information:

# pfctl -s info
Status: Enabled for 267 days 10:25:22         Debug: Urgent

State Table                          Total             Rate
  current entries                   132810
  searches                      5863318875          253.8/s
  inserts                        140051669            6.1/s
  removals                       139918859            6.1/s
Counters
  match                         1777051606           76.9/s
  bad-offset                             0            0.0/s
  fragment                             191            0.0/s
  short                                518            0.0/s
  normalize                              0            0.0/s
  memory                                 0            0.0/s
  bad-timestamp                          0            0.0/s
  congestion                             0            0.0/s
  ip-option                           4383            0.0/s
  proto-cksum                            0            0.0/s
  state-mismatch                     52574            0.0/s
  state-insert                         172            0.0/s
  state-limit                            0            0.0/s
  src-limit                              0            0.0/s
  synproxy                               0            0.0/s

# pfctl -s states | wc -l
  113705

# pfctl -s memory
states        hard limit  2048000
src-nodes     hard limit   384000
frags         hard limit     2000
tables        hard limit     1000
table-entries hard limit   200000

# pfctl -s Interfaces|wc -l
      75

# pfctl -s rules | wc -l
    1226


In our opinion the hardware is not too weak, as CPU usage is only 10-30%
and it does not reach 100% during the benchmark. Not even a single vcore
is saturated.


I would be really grateful if someone could point me in the right direction.

PF uses a linear search (with some optimizations to skip over rules which can't match) to establish new flows. If your PF config really is that simple, give IPFW a try. While PF has much nicer syntax, IPFW supports more powerful tables: IPFW tables are key-value maps, and the value can be used as an argument to most actions. That may reduce your 2500 rule lookups to a single table lookup.

If you can afford to lose the source IP and port, you could also use a userspace TCP proxy.
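
As a rough, untested sketch of the tablearg idea (the addresses, ports, rule and nat instance numbers, and the interface name em0 are made up for illustration): keep one ipfw nat instance per published host and let a single rule pick the instance from a table keyed on the public address.

# one nat instance per published host, holding its redirect_port mappings
ipfw nat 100 config redirect_port tcp 192.168.0.100:1235 1.2.3.4:1235
ipfw nat 101 config redirect_port tcp 192.168.0.101:2222 1.2.3.5:2222

# table 1 maps public address -> nat instance number
ipfw table 1 add 1.2.3.4 100
ipfw table 1 add 1.2.3.5 101

# one rule for inbound traffic: the table lookup supplies the nat instance (tablearg)
ipfw add 1000 nat tablearg ip from any to 'table(1)' in recv em0
# return traffic from the internal hosts would need a matching rule/table as well

Whether this covers everything your rdr rules do would need testing; it is only meant to show how the per-rule scan collapses into one table lookup.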