On Tue, Dec 15, 2009 at 7:21 AM, Linda Messerschmidt < linda.messerschm...@gmail.com> wrote:
> Hi all,
>
> I have a PF machine that is giving me fits. I see a lot of weird behavior.
>
> 1) TCP connections (mainly port 80) sometimes take 3 seconds to get
>    started instead of being virtually instant.
> 2) Sometimes HTTP connections just stop responding. (Client program
>    times out waiting for response.)
> 3) Sometimes connections get weirdly dropped ("Connection reset by peer.")
> 4) Sometimes if I am ssh'd through the firewall, something will happen
>    and my inbound packets will start getting dropped, but outbound
>    packets still pass. For example, if I'm at the shell prompt, it is
>    non-responsive. But if I log in alongside a stuck connection and
>    "write" to that tty, I will see it no problem.
> 5) States that have no right to still be there continue to pile up
>    into the hundreds of thousands.
>
> I kind of get the feeling that all of these are related. In
> particular, I think 2, 3, and 4.
>
> Of all of these, the only one I can document at the moment is #3.
>
> Here is a packet capture from the public (web client) interface:
>
> 20:00:02.038067 IP 1.2.3.4.61645 > 5.6.7.8.80: S 620577087:620577087(0) win 65535 <mss 1460,nop,wscale 9,sackOK,timestamp 953726452 0>
> 20:00:02.038328 IP 5.6.7.8.80 > 1.2.3.4.61645: S 40565958:40565958(0) ack 620577088 win 0 <mss 1460>
> 20:00:02.065678 IP 1.2.3.4.61645 > 5.6.7.8.80: . ack 1 win 65535
> 20:00:02.095158 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:02.378248 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:02.746163 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:03.282122 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:04.154112 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:05.698002 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:07.913721 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:12.145438 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:12.287038 IP 5.6.7.8.80 > 1.2.3.4.61645: F 1:1(0) ack 1 win 65535
> 20:00:20.408734 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:20.409874 IP 5.6.7.8.80 > 1.2.3.4.61645: R 40565959:40565959(0) win 0
>
> Here is a packet capture of the same session from the private (web
> server) interface:
>
> 20:00:02.038089 IP 1.2.3.4.61645 > 5.6.7.8.80: S 620577087:620577087(0) win 65535 <mss 1460,nop,wscale 9,sackOK,timestamp 953726452 0>
> 20:00:02.038311 IP 5.6.7.8.80 > 1.2.3.4.61645: S 40565958:40565958(0) ack 620577088 win 0 <mss 1460>
> 20:00:02.065694 IP 1.2.3.4.61645 > 5.6.7.8.80: . ack 1 win 65535
> 20:00:12.287026 IP 5.6.7.8.80 > 1.2.3.4.61645: F 1:1(0) ack 1 win 65535
> 20:00:20.408747 IP 1.2.3.4.61645 > 5.6.7.8.80: P 1:80(79) ack 1 win 65535
> 20:00:20.409859 IP 5.6.7.8.80 > 1.2.3.4.61645: R 40565959:40565959(0) win 0
>
> So that client -> server push packet is not making it through the
> firewall despite numerous retransmits, until 18 seconds later when the
> server has already given up on it.
>
> That connection hangs around in the state table for a long time as:
>
> all tcp 5.6.7.8:80 <- 1.2.3.4:61645 CLOSED:CLOSING
>
> This despite:
>
> set timeout tcp.closed 5
> set timeout tcp.closing 30
>
> To test, I stopped connections from 1.2.3.4 to 5.6.7.8. At present,
> there are *zero* established connections between 1.2.3.4 and 5.6.7.8.
> None. But:
>
> $ sudo pfctl -s state | fgrep 1.2.3.4: | fgrep :80 | wc
> 2243 13458 160932
>
> A few minutes later I broke this down by connection status:
>
> 1222 CLOSED:CLOSING
>  556 ESTABLISHED:ESTABLISHED
>   15 FIN_WAIT_2:CLOSING
>   27 SYN_SENT:FIN_WAIT_2
>
> That doesn't add up to 2243, so they *are* slowly dying off. I did
> some poking around, and the CLOSED:CLOSING ones expire after fifteen
> minutes, which is the timeout for tcp.opening. Um, OK.
>
> The 556 ESTABLISHED:ESTABLISHED states appear content to persist until
> they age off too, even though those connections are long gone.
>
> As far as the "3 second" thing, I noticed somebody here recently had a
> similar problem and made it go away by upping their states and
> dropping their timeouts. Well, he dropped his timeouts to where ours
> are, and we're at:
>
> set limit states 2000000
>
> We are definitely not out of states; we're seeing these problems right
> now and due to my playing around with the tcp.established timeout,
> we're at 66412 states right now. (Ordinarily it hovers around
> 350,000.) The machine is a dual-core Core 2 6320 with 2GB of RAM and
> nothing to do but load balance this traffic. It shows as 95% idle all
> day.
>
> So sometimes pf loses packets related to connections that are still
> around, and sometimes it thinks connections are still around long
> after the packets are gone.
>
> I would be really, really grateful for any suggestions or help. I am
> completely lost here and at my wits' end!
>
> I've included my pf.conf below.
>
> --------------------------------------------------------------------------------------------
>
> set limit states 2000000
> set timeout tcp.established 86400
> set timeout tcp.closed 5
> set timeout tcp.closing 30
>
> ExtIf = "em0"
> IntIf = "em1"
>
> table <NoRouteIPs> { 127.0.0.0/8, 169.254.0.0/16, 192.0.2.0/24, 192.168.0.0/16, 172.16.0.0/12, 10.0.0.0/8 }
> table <OurIPs> { ... }
> table <DNSServers> { ... }
> table <BalanceBlocks> { ... }
>
> scrub
>
> ## Block Reserved Addresses
> block log quick on $ExtIf from <NoRouteIPs> to any
> block log quick on $ExtIf from any to <NoRouteIPs>
>
> ## Block our own Addresses
> block in log quick on $ExtIf inet from <OurIPs> to any
>
> ## Anti-DDOS
> table <AntiDDOS> persist
> block quick from <AntiDDOS> to any
> block quick from any to <AntiDDOS>
>
> ## Block HTTP traffic to DNS servers
> block quick inet proto tcp from any to <DNSServers> port 80
>
> ## Weird DNS people added 2009-06-18
> block drop log quick proto 255
> table <GTExperimentDNS> { 61.220.4.0/24 }
> block drop in quick proto { udp, tcp } from <GTExperimentDNS> to any port 53
>
> ## Load Balancing
> pass in on $ExtIf route-to { ($IntIf 3.4.5.6), ($IntIf 3.4.5.7), ($IntIf 3.4.5.8), ($IntIf 3.4.5.9) } round-robin proto tcp from any to <BalanceBlocks> port 80
>

Try enabling sticky connections here.

-- 
Ermal
_______________________________________________
freebsd-pf@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-pf
To unsubscribe, send any mail to "freebsd-pf-unsubscr...@freebsd.org"
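
For reference, the sticky-connections suggestion for the load-balancing rule might look roughly like the fragment below. This is only a sketch, untested against this ruleset: it assumes the pf version in use accepts the sticky-address pool option after round-robin, and the src.track line is an optional extra that is not part of the original pf.conf.

```
## Load Balancing with sticky connections (sketch, untested).
## sticky-address keeps every connection from a given source IP on the
## same pool member for as long as that source still has states.
pass in on $ExtIf route-to { ($IntIf 3.4.5.6), ($IntIf 3.4.5.7), \
    ($IntIf 3.4.5.8), ($IntIf 3.4.5.9) } round-robin sticky-address \
    proto tcp from any to <BalanceBlocks> port 80

## Optional, and an assumption rather than something from the original
## ruleset: keep the source-to-backend mapping for 60 seconds after the
## last state from that source expires. If used, this line belongs with
## the other "set" options at the top of pf.conf.
# set timeout src.track 60
```

With plain round-robin, the backend is chosen when each state is created, so separate connections from one client can land on different servers. sticky-address only changes which pool member a source is mapped to; whether that affects the dropped retransmits or the lingering states described above would still need to be checked with the same packet captures.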