Hi Guys,

I hope I am posting on the right mailing list. I am sending you this email
because I have recently been seeing a lot of "BAD state" errors from pf.

I don't know if this has been discussed previously.

More and more people are now using OSes that adapt the TCP window size on
the fly. Looking at pf, I can see that it checks each packet's sequence
number to make sure it falls within the expected range, roughly as follows:

Sequence number + TCP window size = maximum expected sequence number.
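To make the check concrete, here is a minimal sketch of it. This is a simplified model written for this email, not pf's actual code. The names `lo`, `window`, `seq_diff` and `in_window` are made up for illustration, and retransmissions below `lo` are not modelled. Sequence numbers are compared modulo 2^32, in the spirit of the BSD `SEQ_LT`/`SEQ_GT` macros, so the check keeps working across wraparound.

```python
MOD = 1 << 32  # TCP sequence numbers live in a 32-bit space


def seq_diff(a: int, b: int) -> int:
    """Signed distance a - b modulo 2**32, like the BSD SEQ_LT/SEQ_GT macros."""
    d = (a - b) % MOD
    return d - MOD if d >= MOD // 2 else d


def in_window(lo: int, window: int, seq: int, length: int) -> bool:
    """Simplified model of the range check (hypothetical, NOT pf's code).

    lo:     lowest sequence number the tracker expects (assumed name)
    window: last window size the tracker saw advertised (assumed name)
    A segment passes only if it starts at or after lo and its end
    (seq + length) does not go past lo + window, the maximum expected
    sequence number.
    """
    max_seq = (lo + window) % MOD
    end = (seq + length) % MOD
    return seq_diff(seq, lo) >= 0 and seq_diff(end, max_seq) <= 0


if __name__ == "__main__":
    lo, window = 1000, 8192  # made-up tracked state
    print(in_window(lo, window, 1000, 1460))  # True: inside the recorded window
    # The receiver has already advertised a bigger window, but the tracker
    # has not seen that packet yet, so data past lo + 8192 looks out of range.
    print(in_window(lo, window, 9000, 1460))  # False
    # Wraparound across 2**32 is handled by seq_diff.
    print(in_window(4294967000, window, 100, 1460))  # True
```

The second call is the situation I am worried about: the segment is legitimate for the new, larger window, but fails against the old one.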

This check was fine when there were no "on the fly" TCP window changes. Now,
on very low-latency networks (a few ms), we might hit a race condition where
pf does not see packets in the right order: pf sees packets that already use
the new TCP window size, but has not yet seen the first packet announcing the
change. As a result, it logs a "BAD state".

To work around this, I removed this check from pf, and since then I have had
no more problems. I think we should find a proper alternative solution,
because more and more OSes adapt their window size, starting with Windows
Vista, Linux (from 2.6.18, I think) and Mac OS X Leopard.


I am also seeing strange behavior while running backups. The backup runs for
about a gigabyte, then I get bad states and the following error:

Dec  5 08:34:24 pf01a-std /bsd: pf: BAD state: TCP 193.189.125.226:9103
193.189.125.226:9103 77.72.89.171:1900 [lo=1110166540 high=1110165037
win=65535 modulator=0] [lo=3660513330 high=3660578711 win=32767 modulator=0]
4:4 A seq=1110132270 (1110132270) ack=3660513330 len=1456 ackskew=0
pkts=127312:59301 dir=in,fwd
Dec  5 08:34:24 pf01a-std /bsd: pf: State failure on:   2     |

Notice that lo=1110166540 is higher than high=1110165037, and of course the
sequence number is out of bounds: seq=1110132270.
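For anyone who wants to check the numbers, this small sketch does the 32-bit sequence arithmetic on the values from the first `[lo=... high=...]` block of the log line. It assumes `lo` and `high` can be read as a plain sequence range. `seq_diff` is a hypothetical helper, not a pf function.

```python
MOD = 1 << 32  # TCP sequence-number space


def seq_diff(a: int, b: int) -> int:
    """Signed distance a - b modulo 2**32 (TCP sequence-space arithmetic)."""
    d = (a - b) % MOD
    return d - MOD if d >= MOD // 2 else d


# Values copied from the "BAD state" log line (first peer's state).
lo, high = 1110166540, 1110165037
seq, length = 1110132270, 1456

print("high - lo        =", seq_diff(high, lo))            # -1503
print("seq - lo         =", seq_diff(seq, lo))             # -34270
print("seq + len - high =", seq_diff(seq + length, high))  # -31311
```

So high sits 1503 below lo, and the segment starts 34270 below lo while its end is still 31311 below high. In other words, the packet falls behind lo rather than beyond high.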

Any idea what could cause such a mess?

I am using OpenBSD 4.1 with a custom-built kernel, whose only change is
commenting out the check in pf.

Lio
