I should have mentioned that the situation described below occurred
with the 'defer' option of pfsync already enabled.
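
For reference, enabling it comes down to the command Stuart suggested
(quoted below). This is a minimal sketch to be run as root. The second
command is only there to inspect the interface afterwards, and how the
setting is made persistent on a given system is not shown here.

```sh
# Defer the initial packet of a connection until the pfsync state
# insert has been acknowledged or a timeout expires, per pfsync(4).
ifconfig pfsync0 defer
# Show the current pfsync0 settings to verify the change.
ifconfig pfsync0
```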

I think you are right about the problems being with TCP sequence number 
checks. I tried the PF rule with 'keep state (sloppy)' and that "fixes" 
the problem (or I guess it would be better to say: "Makes the symptoms 
disappear"). It seems like a highly discouraged option and I don't 
fully understand the security implications. Would appreciate any 
insights anyone could offer on that.
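
For reference, the variant I tested is essentially the SSH rule from the
ruleset quoted below with the state option appended. Treat it as a sketch
rather than a verbatim copy of the running config. The comment paraphrases
how pf.conf(5) describes the option, which is where my unease comes from.

```conf
# /etc/pf.conf (sketch): SSH rule with sloppy state tracking.
# pf.conf(5) describes "sloppy" as a TCP tracker that does not check
# sequence numbers at all, intended for cases where the firewall does
# not see all packets of a connection (e.g. asymmetric routing).
pass log quick proto tcp to port 22 keep state (sloppy)
```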

Could it be that, since my upstream has a strong preference
for FW1, everything goes fine for a while (about 30 seconds in most
tests) and then upstream sends a few packets (maybe even a single one)
directly to FW2, which botches the sequence number check on FW1?
Even with this theory I still don't understand why PF has no problem
accepting traffic on the outside interface (vlan1604) and only starts 
to have a problem when trying to send it out on the inside interface 
(vlan1003).
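
To put a number on the sequence-number theory: the pflog output quoted
below contains both the sequence number of the SYN and that of the first
blocked segment, so the amount of data the client had sent when FW1
started dropping can be computed. A minimal sketch in plain sh follows.
The function name seq_delta is my own, and the script only does the
arithmetic; it says nothing about how PF actually tracks windows.

```sh
#!/bin/sh
# seq_delta FROM TO: forward distance between two 32-bit TCP sequence
# numbers, allowing for wraparound at 2^32.
seq_delta() {
    echo $(( ($2 - $1) & 4294967295 ))
}

isn=4112726507        # sequence number of the SYN logged at 10:30:11
blocked=4112740387    # first segment logged as blocked at 10:30:44

# The SYN itself consumes one sequence number, so subtract 1 to get
# the number of payload bytes that precede the blocked segment.
payload=$(( $(seq_delta "$isn" "$blocked") - 1 ))
echo "client payload bytes before the first blocked segment: $payload"
```

That comes out at a little under 14 kB. Drops that begin after a certain
amount of data, rather than at an arbitrary moment, would at least be
consistent with a window-based check, but it is not proof.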

The cable is actually just a normal cable; old habits I guess... ;-)

> On 20 October 2016 at 20:21, Stuart Henderson <s...@spacehopper.org> wrote:
> 
> For this config where you can't predict which firewall receives the
> packet from upstream, and especially if you end up with packets from
> your "inside" machine going through a different firewall than the one
> receiving external packets, you can run into problems with the TCP
> sequence number checking that PF (and some other stateful firewalls)
> does on TCP packets.
> 
> Try "ifconfig pfsync0 defer" first - from pfsync(4):
> 
>  Where more than one firewall might actively handle packets, e.g. with
>  certain ospfd(8), bgpd(8) or carp(4) configurations, it is beneficial to
>  defer transmission of the initial packet of a connection. The pfsync state
>  insert message is sent immediately; the packet is queued until either this
>  message is acknowledged by another system, or a timeout has expired. This
>  behaviour is enabled with the defer parameter to ifconfig(8).
> 
> On 2016-10-20, Jasper Siepkes <siep...@serviceplanet.nl> wrote:
> > Hi list!
> >
> > I've run into a situation with PF which I don't quite understand.
> >
> > The situation is as follows: I have 2 OpenBSD firewalls connected to an
> > upstream provider which forwards traffic to us via equal cost multi
> > path routing (ECMP). The firewalls are connected via a crossover cable
> 
> Incidentally, there's no need for crossover cables with gigabit nics.
> 
> > over which pfsync is configured. On the inside the firewalls are each
> > connected with 2 cables (with LACP) to 2 different switches which 
> > are in an MLAG configuration (so these 2 switches function as 1 switch).
> > The OpenBSD firewalls are running OpenBSD 6.0 with all patches applied.
> >
> > It looks like this (public IPs changed):
> >
> >               OUTSIDE / UPSTREAM
> >
> >  GW: 192.168.116.21        GW: 192.168.216.21
> >                 +               ^
> >                 |               |
> >        vlan1604 |               | vlan2604
> >  192.168.116.22 |               | 192.168.216.22
> >                 |               |
> >             +---v---+      +----+--+
> >             | FW 1  +------+ FW 2  |
> >             +---+---+      +----+--+
> >        vlan1003 |               ^ vlan1003
> >    17.214.19.49 |               | 17.214.19.50
> >                 +---------------+
> >
> >                      INSIDE
> >
> > Now on both firewalls I have this really simple ruleset:
> >
> > -------------------------
> > # cat /etc/pf.conf
> > 
> > set skip on lo0
> > # Interface connected with crossover cable to other firewall for
> > # pfsync.
> > set skip on em1
> >
> > block log
> >
> > pass log quick proto tcp to port 22
> > -------------------------
> >
> > Which results in the following PF rules:
> > -------------------------
> > # pfctl -sr
> > 
> > block drop log all
> > pass log quick proto tcp from any to any port = 22 flags S/SA
> > -------------------------
> >
> > Now when I SSH from the outside world to 17.214.19.50 the traffic flows
> > as indicated in the diagram (although it's ECMP, upstream seems to prefer
> > FW 1, so traffic always ends up there):
> >
> > [Internet] Me (62.187.45.178)
> > |
> > V
> > [FW1]vlan1604 
> > |
> > V
> > [FW1]vlan1003
> > |
> > V
> > [FW2]vlan1003 
> > |
> > V
> > [FW2]vlan2604 
> > |
> > V
> > [Internet] Me 
> >
> > And this works. However after about 30 seconds I lose connection to the
> > 17.214.19.50 host because PF can't match the traffic on FW1 vlan1003 
> > to the established state. I'm typing random stuff into the SSH session
> > to keep it active and then it just hangs. It looks like this
> > (public IPs changed):
> >
> > -------------------------
> > # tcpdump -nettti pflog0 port 22 and host 17.214.19.50
> > tcpdump: WARNING: snaplen raised from 116 to 160
> > tcpdump: listening on pflog0, link-type PFLOG
> > Oct 20 10:30:11.299997 rule 1/(match) pass in on vlan1604:
> >   62.187.45.178.64072 > 17.214.19.50.22: S 4112726507:4112726507(0)
> >   win 29200 <mss 1460,sackOK,timestamp 6451222 0,nop,wscale 7> (DF)
> > Oct 20 10:30:11.300026 rule 1/(match) pass out on vlan1003:
> >   62.187.45.178.64072 > 17.214.19.50.22: S 4112726507:4112726507(0)
> >   win 29200 <mss 1460,sackOK,timestamp 6451222 0,nop,wscale 7> (DF)
> >
> > Oct 20 10:30:44.330002 rule 0/(match) block out on vlan1003:
> >   62.187.45.178.64072 > 17.214.19.50.22: P 4112740387:4112740427(40)
> >   ack 2507834833 win 594
> >   <nop,nop,timestamp 6484253 2782905123> (DF) [tos 0x10]
> > Oct 20 10:30:44.425886 rule 0/(match) block out on vlan1003:
> >   62.187.45.178.64072 > 17.214.19.50.22: P 40:80(40) ack 1 win 594
> >   <nop,nop,timestamp 6484349 2782905123> (DF) [tos 0x10]
> > Oct 20 10:30:44.436021 rule 0/(match) block out on vlan1003:
> >   62.187.45.178.64072 > 17.214.19.50.22: P 40:80(40) ack 1 win 594
> >   <nop,nop,timestamp 6484359 2782905123> (DF) [tos 0x10]
> > Oct 20 10:30:44.514107 rule 0/(match) block out on vlan1003:
> >   62.187.45.178.64072 > 17.214.19.50.22: P 80:120(40) ack 1 win 594
> >   <nop,nop,timestamp 6484437 2782905123> (DF) [tos 0x10]
> > Oct 20 10:30:44.618079 rule 0/(match) block out on vlan1003:
> >   62.187.45.178.64072 > 17.214.19.50.22: P 120:160(40) ack 1 win 594
> >   <nop,nop,timestamp 6484541 2782905123> (DF) [tos 0x10]
> > -------------------------
> >
> > It seems that PF all of a sudden doesn't see the SSH traffic as part
> > of the established connection anymore. The state table of PF shows that
> > the state was correctly added to the state table and synced between
> > the firewalls, and it is also still there:
> >
> > -----------------------------------
> > # pfctl -ss
> > 
> > all carp 17.214.19.49 -> 17.214.19.50 SINGLE:NO_TRAFFIC
> > all carp 10.100.0.2 -> 10.100.0.3 SINGLE:NO_TRAFFIC
> > all carp 10.100.2.2 -> 10.100.2.3 SINGLE:NO_TRAFFIC
> > all tcp 17.214.19.49:22 <- 62.187.45.178:65149 ESTABLISHED:CLOSING
> > all tcp 17.214.19.49:22 <- 62.187.45.178:58883 ESTABLISHED:CLOSING
> > all tcp 17.214.19.49:22 <- 62.187.45.178:59505 ESTABLISHED:ESTABLISHED
> > all tcp 17.214.19.49:22 <- 62.187.45.178:63889 ESTABLISHED:FIN_WAIT_2
> > all tcp 17.214.19.49:22 <- 62.187.45.178:63963 ESTABLISHED:ESTABLISHED
> > all tcp 17.214.19.49:22 <- 62.187.45.178:63235 ESTABLISHED:ESTABLISHED
> > all tcp 17.214.19.50:22 <- 62.187.45.178:54705 FIN_WAIT_2:FIN_WAIT_2
> > all tcp 17.214.19.50:22 <- 62.187.45.178:64072 ESTABLISHED:ESTABLISHED
> > all tcp 17.214.19.50:22 <- 119.249.54.68:38527 TIME_WAIT:TIME_WAIT
> > all tcp 17.214.19.49:22 <- 221.194.47.224:60327 TIME_WAIT:TIME_WAIT
> > all tcp 17.214.19.50:22 <- 221.194.47.224:53897 TIME_WAIT:TIME_WAIT
> > -----------------------------------
> >
> > The relevant PF state here is (as identified in the pflog tcpdump
> > as the SSH session that disconnected):
> >
> > all tcp 17.214.19.50:22 <- 62.187.45.178:64072 ESTABLISHED:ESTABLISHED
> >
> > which seems okay. 
> >
> > What I also find odd is that PF allows the packet to
> > traverse the vlan1604 (external) interface and then decides that it 
> > can't traverse the vlan1003 (internal) interface. Why isn't it a
> > problem for the vlan1604 interface? It should be noted that the
> > vlan1003 interface sits on a trunk interface (trunk0, configured as
> > LACP). I don't see how, but this might be related.
> >
> > I'm at a loss here as I really can't explain the behavior I'm seeing
> > from PF. Am I missing something? Could this be a bug?
> >
> > Regards,
> >
> > Jasper
