Good day to everyone,

I'm a happy PF user, and have been for over a decade now. I'm writing to ask some questions about performance now that I've got a system that needs to handle some real traffic. I've been digging up various tweaks and settings from the archives (and elsewhere) over the years, and I'd like to know which of them are still useful and accurate, and which are "folklore". Sorry for the length of the post, but I hope that at the very least this thread will collect some information where the searchbots can find it...
I've got a pair of 3GHz Celeron machines in a failover config. Each machine has 1GB RAM and four gigabit Intel (em) interfaces: one LAN, one WAN, one pfsync, and one unused. They're running 4.3 GENERIC (uniprocessor). I intentionally went with a high-clock single-core box because PF isn't multi-core capable.

The systems work great, but are chewing up about 60% of their time on interrupts (~9000/sec according to vmstat, with ~7500 of those going to the LAN/WAN cards). This is fine; everything is working, and I knew at the time that a high interrupt load was inevitable. However, I need to ramp up the traffic on this system soon (we're at 30Mbps / 3.5kpps right now), so I want to make sure I can keep the load under control. I know that the first thing I should do is upgrade to 4.6, which I plan to do. Beyond that, I'm looking for other "best practices", which I've divided into two major sections below.

Interrupt Mitigation:
=====================

Since the system is under moderately heavy interrupt load, I'd like to try to improve that if possible, since it seems that's going to be the first limit I hit on this system.

In the "Tuning OpenBSD" paper (http://www.openbsd.org/papers/tuning-openbsd.ps), they mention "sharing interrupts" on a high-load system. If I understand correctly, the theory is that if all my NICs are on the same interrupt, the kernel can stay in the interrupt handler (no context switch) and service all the NICs at once, rather than handling each separately. Am I understanding this right? Should I try to lump all (or some) of my NICs onto the same IRQ? Or are there better approaches (see below)?

Several sources have suggested using the APIC, which should be available on any non-ancient hardware. I'm not sure whether the APIC replaces or complements the suggestion above about interrupt sharing. I checked my box, and my dmesg didn't mention APIC, so I don't think I'm taking advantage of it right now (the commands I've been using to check are below). The -misc archives have oblique references to the APIC only being enabled on multiprocessor (MP) kernels rather than uniprocessor (UP) ones. Is this still true? I also saw hints that 4.6 now has the APIC on in UP by default. Can anyone confirm or deny?

Since PF isn't multi-core capable, I believed that UP was the way to go for firewalls (and my machine isn't multi-core anyway). However, I'm happy to run MP if there are side benefits, like the APIC, that would increase performance.

Next up, FreeBSD has been touting support for message-signaled interrupts (MSI/MSI-X), claiming that they increase performance: http://onlamp.com/pub/a/bsd/2008/02/26/whats-new-in-freebsd-70.html?page=4 I'm not quite clear on whether this helps with a packet-forwarding workload or not. Is there support for this in OpenBSD, or would it not really help anyway?
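In case it helps anyone searching the archives later, here's how I've been checking the interrupt situation (this is on 4.3; I'm assuming the tools behave the same on 4.6):

    # interrupt counts per device since boot, plus the rate
    vmstat -i

    # live view of the same counters
    systat vmstat

    # which IRQ each em(4) interface attached to
    dmesg | grep '^em'

    # whether an I/O APIC was found; on my UP kernel this prints
    # nothing, which is why I assume I'm still on the legacy PIC
    dmesg | grep -i apic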
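Also, before I get to the sysctl questions below, here's the relevant chunk of my current /etc/sysctl.conf for concreteness. These values are just where my experimentation has landed so far, not recommendations; my reasoning (and doubts) about each knob follows:

    # mbuf clusters: doubled from the default after "mclpool limit
    # reached" warnings (still seeing them; see below)
    kern.maxclusters=12288

    # IP input queue: started at 256 * number of NICs (1024) per
    # recommendations on -misc, still saw drops, so doubled again
    net.inet.ip.ifq.maxlen=2048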
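And these are the counters I've been watching to judge whether the tweaks help:

    # mbuf and cluster usage (how close am I to kern.maxclusters?)
    netstat -m

    # drops on the IP input queue; nonzero means packets were thrown
    # away because the queue was full
    sysctl net.inet.ip.ifq.drops

    # per-interface error counters
    netstat -i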
Sysctl Tweaks:
==============

I've been getting errors like:

    WARNING: mclpool limit reached; increase kern.maxclusters

So I did what it asked (I doubled the value to 12288), but am still getting the error. I've heard of people increasing this much more (20x the default!), but also taunts of insanity for doing so:

http://monkey.org/openbsd/archive/misc/0407/msg01521.html

So, what is a sane value for this? Are there other causes to investigate when you get an "mclpool" warning, or should you just keep cranking up the value? Also, is there any harm in going too high (besides wasting memory)?

Next, I've seen interface drops (ifq.drops != 0), so I cranked ifq.maxlen up to 256 * #nics (1024) per recommendations on -misc. I was still getting occasional drops, so I doubled it to 2048, and am holding steady there. I've seen recommendations not to go beyond 2500; what should I be worried about in that case? High latency? Memory issues? Do I really need to worry about a few drops?

Finally, as was mentioned on the list a few days ago, increasing recvspace/sendspace doesn't help a firewall (except for locally-sourced connections) because it's just forwarding packets. Just so I'm totally clear: is this true even in the case of packet reassembly (scrub) and randomization, or do those features cause the firewall to terminate and re-initiate connections that would benefit from the buffers?

For that matter, are there any protocol options that help the performance of a packet-forwarding box (again, ignoring locally-sourced connections)? I'm thinking about buffers, default MSS, ECN, window scaling, SACK, etc. I know it doesn't hurt to turn them on, but am I doing any good for the connections I'm forwarding?

Thanks for any input and advice you can provide; I'm looking forward to using PF for another 10 years... =)

Jason

--
Jason Healy | jhe...@logn.net | http://www.logn.net/