> I had to disable monitoring of the internal interfaces of both remote
> firewalls, as it killed the VPN when you pinged the backup firewall. The
> packets get there, but the reply is sent back directly from the backup and
> not via the master.
>
> To fix that I added a NAT rule, and could then monitor and connect to the
> internal interfaces of both remote firewalls again.
> (These pf.conf examples and files below are from our remote office
> firewalls. carp0 = external, carp1 = internal):
>
> match out on $if_lan from { $hq_lan } to ($if_lan:network) nat-to (carp1)
>
> pass in quick on enc0 proto ipencap from { $ext_ip_hqfw } to { (carp0) } keep state (if-bound)
> pass in quick on enc0 from { $hq_lan } to { $if_lan:network } keep state (if-bound)
>
> pass quick on $if_lan from { $hq_lan, (carp1) } to { $if_lan:network } queue (_wan_vpn,_wan_pri) set prio (2,5)
I currently don't have any queuing on this link, but my rules look pretty
close to the same here. tcpdump -nei pflog0 doesn't show any blocks for my
traffic with 'block in log' set, so I don't think PF is getting in the way.
Though, as you mention above, if the tunnel drops *because* I am hitting this
internal address, that would be a problem.

> PS: Also don't forget to restrict the MTU of VPN traffic so it doesn't
> fragment (needed on both sides, naturally):
>
> match in on $if_lan proto { tcp, udp, icmp } from { $if_lan:network } to { $hq_lan } scrub (no-df max-mss 1400)
> set skip on $if_pfsync

I have set this, though I don't really know how to verify whether it's
working as expected. If I tcpdump the internal carp interface and ping
through the tunnel to a device on the other side, I see the packets traverse
the link. If I increase the ping size (-s 1500) I see fragmentation, but that
doesn't really tell me that the rule is working, does it? Maybe I don't
understand what that is supposed to be doing.

> >>I also submitted some suggested modifications to /etc/rc.d/sasyncd and
> >>/etc/rc.d/isakmpd here in the past which make the setup and failover of
> >>VPNs much faster and more stable.
> >
> >I did see those scripts, though they seem to be more about solving the
> >startup time of the daemons. My issue is more keeping the service up than
> >start time.
>
> Yeah, they sort out the startup and shutdown, and also ensure a prompt
> failover. I wrote them during 5.2, so they may not be so important, but
> they add a level of failsafe otherwise.
> Keeping the tunnel up should simply be a case of making sure the backup
> *never* sends encaped packets itself.

Oh, I missed this on my last read-through. What does this do? Why? I assume
this is the reason for the route on the firewall machines that points the
remote network at the local internal carp address. I understand the reason
to route the packet to the internal carp interface on the backup, but on the
primary I am unsure.
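One way to sanity-check a max-mss value like the one in the scrub rule quoted above is to work out the ESP tunnel-mode overhead by hand. (Note that max-mss only rewrites the MSS option in TCP handshakes, so large ICMP pings don't exercise it.) The sketch below is a rough calculation under stated assumptions, not anything taken from PF or isakmpd: IPv4 outer and inner headers with no options, a 20-byte TCP header, AES-CBC (16-byte IV and block size) and HMAC-SHA2-256 truncated to a 16-byte ICV, matching `enc aes` / `auth hmac-sha2-256` in the ipsec.conf quoted further down, plus an optional 8-byte UDP header for NAT-T. The function names are hypothetical.

```python
# Rough ESP tunnel-mode overhead arithmetic (assumptions listed in the text above).
OUTER_IP = 20      # outer IPv4 header, no options
ESP_HDR = 8        # SPI (4) + sequence number (4)
IV = 16            # AES-CBC IV
BLOCK = 16         # AES block size; the encrypted part is padded to this
TRAILER = 2        # pad length (1) + next header (1)
ICV = 16           # HMAC-SHA2-256 truncated to 128 bits
NAT_T = 8          # UDP header, only when NAT traversal is in use
INNER_HDRS = 40    # inner IPv4 (20) + TCP (20), no options


def esp_packet_size(inner_len: int, nat_t: bool = False) -> int:
    """Bytes on the wire for an inner IP packet of inner_len bytes after ESP."""
    padded = -(-(inner_len + TRAILER) // BLOCK) * BLOCK  # round up to BLOCK
    return OUTER_IP + (NAT_T if nat_t else 0) + ESP_HDR + IV + padded + ICV


def max_mss(mtu: int = 1500, nat_t: bool = False) -> int:
    """Largest TCP MSS whose full-size segments still fit in one outer packet."""
    room = mtu - OUTER_IP - (NAT_T if nat_t else 0) - ESP_HDR - IV - ICV
    room -= room % BLOCK          # encrypted part must be whole blocks
    return room - TRAILER - INNER_HDRS


if __name__ == "__main__":
    for nat_t in (False, True):
        mss = max_mss(1500, nat_t)
        size = esp_packet_size(mss + INNER_HDRS, nat_t)
        print(f"nat_t={nat_t}: max MSS {mss}, outer packet {size} bytes")
```

Under these assumptions a 1500-byte path MTU leaves room for an MSS of 1398 (1382 with NAT-T). Compare that with the max-mss value in the rule, and redo the sums if the negotiated transforms or the path MTU differ.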
So, where 172.16.32.1 is the internal carp address, I have this route on
both the master and the backup (assuming that 172.16.0.0/24 is the remote
network):

route add 172.16.0.0/24 172.16.32.1

> >It sounds like your setup is similar to my own. You don't see these
> >kinds of instability using sasyncd? If you have a look at my OP, the
> >sasyncd.conf is in there. It's possible I have a configuration error,
> >but just reading over the manpage again, I don't know what it would be.
> >
> >This is really troubling me.
> >
>
> No, none at all. Our tunnels are *really* stable. I can reboot a firewall
> and the tunnel only stops for a few seconds before switching over
> gracefully.

I really want to be able to say the same for these.

> /etc/sasyncd.conf
> peer 192.168.30.253    <- The other IP on the PFSYNC interface (cable directly connected between firewalls)
> interface carp0
> group carp
> listen on 192.168.30.252 inet port 500    <- This PFSYNC IP, etc.
> sharedkey 0x<long-hash>
> flushmode startup
> control isakmpd

I now have exactly this, except where the options here are listed as
defaults in the manpage. I left off control, for example, as isakmpd is the
default, as is group carp.

> /etc/isakmpd.conf
> [general]
> listen-on=<physical IP>,<The CARP0 IP>

I've added this to my install and verified that isakmpd is listening only on
those addresses.

> /etc/ipsec.conf
> # Macros
> local_gw="<local-carp0-ip>"
> local_net="<local-LAN-network>"
> remote_gw="<remote-carp0-ip>"
> remote_net="<remote-LAN-network>"
>
> ike dynamic esp from $local_net to $remote_net \
>     local $local_gw peer $remote_gw \
>     main auth hmac-sha2-256 enc aes group modp1024 \
>     quick auth hmac-sha2-256 enc aes group modp1024 \
>     srcid $local_gw dstid $remote_gw \
>     psk <a loooonnng PSK>

The only thing I was missing here was the srcid and dstid, but adding them
didn't seem to make a difference.
So now I have sasyncd and pfsync both going over a directly connected link,
sasyncd listening only on that interface's address, and 'set skip' on that
interface in pf. All of this seems to match what you have.

Just for troubleshooting, I've only added sasyncd on one side: since the
setup is stable without HA, I'd like to introduce one change at a time for
testing. With sasyncd_flags="NO" and isakmpd restarted, things stay up for
hours and hours without dropping any packets. With sasyncd_flags="-vv" and
isakmpd restarted, I now have stability that looks like this:

Request timeout for icmp_seq 671
Request timeout for icmp_seq 672
Request timeout for icmp_seq 673
64 bytes from 172.16.32.1: icmp_seq=674 ttl=253 time=26.958 ms
64 bytes from 172.16.32.1: icmp_seq=675 ttl=253 time=26.386 ms
64 bytes from 172.16.32.1: icmp_seq=676 ttl=253 time=26.102 ms
64 bytes from 172.16.32.1: icmp_seq=677 ttl=253 time=26.104 ms
Request timeout for icmp_seq 678
Request timeout for icmp_seq 679
Request timeout for icmp_seq 680
Request timeout for icmp_seq 681
Request timeout for icmp_seq 682
Request timeout for icmp_seq 683
Request timeout for icmp_seq 684
Request timeout for icmp_seq 685
Request timeout for icmp_seq 686
Request timeout for icmp_seq 687
Request timeout for icmp_seq 688
Request timeout for icmp_seq 689
Request timeout for icmp_seq 690
Request timeout for icmp_seq 691
Request timeout for icmp_seq 692
Request timeout for icmp_seq 693
Request timeout for icmp_seq 694
Request timeout for icmp_seq 695
Request timeout for icmp_seq 696
Request timeout for icmp_seq 697
Request timeout for icmp_seq 698
Request timeout for icmp_seq 699
64 bytes from 172.16.32.1: icmp_seq=700 ttl=253 time=26.339 ms
64 bytes from 172.16.32.1: icmp_seq=701 ttl=253 time=26.620 ms

This happens pretty frequently. The IP above is the remote internal carp
address, and it is the same address I ping when sasyncd is disabled.
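To put numbers on drops like these, and to line them up with the sasyncd -vv output, a small helper can summarize runs of consecutive timeouts in saved ping output. This is a hypothetical script, not an existing tool. It only assumes the two line formats shown above ("Request timeout for icmp_seq N" and "... icmp_seq=N ...") and one probe per second, which is ping's default interval.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Line formats taken from the ping output above (BSD/macOS style).
TIMEOUT = re.compile(r"Request timeout for icmp_seq (\d+)")
REPLY = re.compile(r"icmp_seq=(\d+)")


def outage_runs(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (first_seq, last_seq) for each run of consecutive timeouts."""
    runs: List[Tuple[int, int]] = []
    start = prev = None
    for line in lines:
        lost = TIMEOUT.search(line)
        if lost:
            seq = int(lost.group(1))
            if start is not None and seq == prev + 1:
                prev = seq                      # the current run continues
            else:
                if start is not None:
                    runs.append((start, prev))  # gap in numbering: close run
                start = prev = seq
        elif REPLY.search(line) and start is not None:
            runs.append((start, prev))          # a reply ends the outage
            start = prev = None
    if start is not None:
        runs.append((start, prev))              # still down at end of log
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is only an example): python3 outages.py ping.log
    with open(sys.argv[1]) as log:
        for first, last in outage_runs(log):
            print(f"icmp_seq {first}-{last}: ~{last - first + 1} s without replies")
```

On the excerpt above this reports 671-673 (the start of that run is cut off) and 678-699, roughly 22 seconds without replies.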
It's down for even longer periods on occasion, with nothing in the logs. I
might need to start setting up OpenVPN this weekend.

Thanks again for looking at this. Any more insight would be much
appreciated. I don't understand why my config looks like yours, and like
other docs I've read, and yet I have these kinds of strange issues, but only
with sasyncd enabled.

-- 
Zach