Hi guys,

I hope I am posting on the right mailing list. I am writing because I have been seeing a lot of "BAD state" messages from pf recently.

I don't know if this has been discussed previously. More and more people are now using OSes that adapt the TCP window size on the fly. As far as I can see, pf checks the sequence number of each packet to make sure it is in the expected range. Roughly, the check is: sequence number + TCP window size = maximum expected sequence number.

This check was fine when there were no on-the-fly TCP window changes. Now, on very low-latency networks (a few ms), we may hit a race condition where pf does not see the packets in the right order: it sees packets coming in with a new TCP window size, but has not yet seen the first packet that carried the change. It then logs a "BAD state" message.

To work around this, I had to remove this check from pf. Since then, I have not had any more problems. I think we should work on a proper alternative solution. More and more OSes adapt their window size, starting with Windows Vista, Linux (from 2.6.18, I think) and Mac OS X Leopard.

I am also seeing strange behavior while running backups. The backup runs for about a gigabyte, then I get bad states and the following error:

Dec 5 08:34:24 pf01a-std /bsd: pf: BAD state: TCP 193.189.125.226:9103 193.189.125.226:9103 77.72.89.171:1900 [lo=1110166540 high=1110165037 win=65535 modulator=0] [lo=3660513330 high=3660578711 win=32767 modulator=0] 4:4 A seq=1110132270 (1110132270) ack=3660513330 len=1456 ackskew=0 pkts=127312:59301 dir=in,fwd
Dec 5 08:34:24 pf01a-std /bsd: pf: State failure on: 2 |

Note that lo=1110166540 is higher than high=1110165037, and of course the sequence number is out of bounds: seq=1110132270.

Any idea what could cause such a mess?

I am using OpenBSD 4.1 with a custom kernel, built only to comment out that check in pf.

Lio
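
P.S. To make the range check more concrete, here is a minimal Python sketch of how I understand it, applied to the numbers from the BAD state log entry above. It is not copied from pf.c. The function names (seq_geq, check_segment) are made up for illustration, the two conditions are my simplified reading of the check, and window scaling is assumed to be 0. Comparisons are done modulo 2^32 because TCP sequence numbers wrap around.

```python
MOD = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around
HALF = 2 ** 31


def seq_geq(a, b):
    """True if a >= b when compared as wrapping 32-bit sequence numbers."""
    return ((a - b) % MOD) < HALF


def check_segment(seq, length, src_lo, src_hi, dst_win):
    """Simplified, hypothetical version of the window check.

    upper_ok: the end of the segment does not pass the sender's high mark.
    lower_ok: the segment does not start more than one receiver window
              below the sender's low mark.
    Window scaling is assumed to be 0 (an assumption, not taken from the log).
    """
    end = (seq + length) % MOD
    upper_ok = seq_geq(src_hi, end)
    lower_ok = seq_geq(seq, (src_lo - dst_win) % MOD)
    return upper_ok, lower_ok


# Values taken from the log entry: first bracket is the sender state,
# win=32767 comes from the second bracket (the peer).
upper_ok, lower_ok = check_segment(
    seq=1110132270,
    length=1456,
    src_lo=1110166540,
    src_hi=1110165037,
    dst_win=32767,
)
print(upper_ok, lower_ok)  # prints: True False
```

With these numbers the upper bound holds, but the packet starts 1503 bytes below lo minus the peer window, so the lower bound fails. That may be what "State failure on: 2" refers to, although I have not confirmed how the failures are numbered.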