> I had to disable monitoring of the internal interfaces of both remote
> firewalls, as it killed the VPN when you ping'ed the backup firewall. The
> packets get there, but the reply is sent back directly from the backup and
> not via the master.
> 
> To fix that I added a NAT rule, and could then monitor and connect to the
> internal interfaces of both remote firewalls again..
> (These pf.conf examples and files below are from our remote office
> firewalls. carp0 = external, carp1 = internal);
> match out on $if_lan from { $hq_lan } to ($if_lan:network) nat-to (carp1)
> 
> pass in quick on enc0 proto ipencap from { $ext_ip_hqfw } to { (carp0) } keep state (if-bound)
> pass in quick on enc0 from { $hq_lan } to { $if_lan:network } keep state (if-bound)
> 
> pass quick on $if_lan from { $hq_lan, (carp1) } to { $if_lan:network } queue (_wan_vpn,_wan_pri) set prio (2,5)

I currently don't have any queuing on this link, but otherwise my rules
look nearly the same.  tcpdump -nei pflog0 doesn't show any blocks for
my traffic with 'block in log' set, so I don't think PF is getting in
the way.

Though, as you mention above, if the tunnel drops *because* I am hitting
this internal address, this would be a problem.

> PS; Also don't forget to restrict the MTU of VPN traffic so it doesn't
> fragment (needed on both sides naturally);
> match in on $if_lan proto { tcp, udp, icmp } from { $if_lan:network } to { $hq_lan } scrub (no-df max-mss 1400)
> set skip on $if_pfsync

I have set this, though I don't really know how to verify whether it's
working as expected.  If I tcpdump the internal carp interface and ping
through the tunnel to a device on the other side, I see the packets
traverse the link.  If I increase the ping size (-s 1500) I see
fragmentation, but that doesn't really tell me that the rule is working,
does it?  Maybe I don't understand what the rule is supposed to be doing.
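
For reference, a rough sketch of one way to check this.  As far as I
can tell from pf.conf(5), max-mss only lowers the MSS option in TCP SYN
packets, and no-df clears the don't-fragment bit, so a large ICMP ping
fragmenting is expected and says nothing about the MSS clamp.  Watching
the MSS option in SYNs should be more telling.  The interface name
(enc0) and network (172.16.0.0/24) in the usage line are assumptions
taken from this thread, and the function names are made up for
illustration:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; adjust interface/network.

# Largest IPv4 TCP packet implied by an MSS value:
# MSS + 20-byte IP header + 20-byte TCP header (no IP or TCP options).
mss_to_ip_len() {
    echo $(( $1 + 40 ))
}

# Print TCP SYN packets to or from a network so the "mss" option is visible.
# "tcp[13] & 2 != 0" matches segments that have the SYN flag set.
# Usage (as root): show_syn_mss enc0 172.16.0.0/24
show_syn_mss() {
    tcpdump -n -v -i "$1" "tcp[13] & 2 != 0 and net $2"
}
```

If the scrub rule is matching, SYNs for new TCP connections heading into
the tunnel should advertise an mss of 1400 or less, which keeps full-size
TCP packets at or below the value printed by "mss_to_ip_len 1400".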

> >>I also submitted some suggested modifications to /etc/rc.d/sasyncd and 
> >>/etc/rc.d/isakmpd here in the past which makes the setup and failover of 
> >>VPNs much faster and more stable.
> >
> >I did see those scripts, though they seem to be more solving the startup
> >time of the daemons.  My issue is more keeping the service up than start
> >time.
> 
> Yea they sort the startup, shutdown and also ensure a prompt failover. I
> wrote them during 5.2 so may not be so important but they add a level of
> failsafe otherwise.
> Keeping the tunnel up should simply be a case of making sure the backup
> *never* sends encaped packets itself..

Oh, I missed this on my last read-through.  What does this do?  Why?  I
assume this is the reason for the route on the firewall machine to use
the local internal carp for the remote network.

I understand the reason to route the packet to the internal carp
interface on the backup, but on the primary I am unsure.  So where
172.16.32.1 is the internal carp, I have a route on both the master and
the backup.

route add 172.16.0.0/24 172.16.32.1

Assuming that 172.16.0.0/24 is the remote network.

> >It sounds like your setup is similar to my own.  You don't see these
> >kinds of instability using sasyncd?  If you have a look at my OP, the
> >sasyncd.conf is in there.  It's possible I have a configuration error,
> >but just reading over the manpage again, I don't know what it would be.
> >
> >This is really troubling me.
> >
> 
> No, none at all. Our tunnels are *really* stable. I can reboot a firewall
> and the tunnel only stops for a few seconds before switching over
> gracefully.

I really want to be able to say the same for these.

> /etc/sasyncd.conf
> peer 192.168.30.253 <- The other IP on the PFSYNC interface (cable directly
> connected between firewalls)
> interface carp0
> group carp
> listen on 192.168.30.252 inet port 500 <- This PFSYNC IP etc..
> sharedkey 0x<long-hash>
> flushmode startup
> control isakmpd

I now have exactly this, except that I omitted the options whose values
are the defaults given in the manpage.  I left off control, for example,
as isakmpd is the default, as is group carp.

> /etc/isakmpd.conf
> [general]
> listen-on=<physical IP>,<The CARP0 IP>

I've added this for my install and verified that ports are listening
only on those addresses.

> /etc/ipsec.conf
> # Macros
> local_gw="<local-carp0-ip>"
> local_net="<local-LAN-network>"
> remote_gw="<remote-carp0-ip>"
> remote_net="<remote-LAN-network>"
> 
> ike dynamic esp from $local_net to $remote_net \
> local $local_gw peer $remote_gw \
> main auth hmac-sha2-256 enc aes group modp1024 \
> quick auth hmac-sha2-256 enc aes group modp1024 \
> srcid $local_gw dstid $remote_gw \
> psk <a loooonnng PSK>

The only thing I was missing here was the srcid and dstid, but that
didn't seem to make a difference.

So now I have the sasyncd and pfsync both going over a directly
connected link, and sasyncd is only listening on that interface address,
as well as 'set skip' on the interface in pf.  All seems like what you
have.

Just for troubleshooting, I've only added sasyncd to one side.  Since
the tunnel is stable without HA, I'd like to introduce one change at a
time for testing.

With sasyncd_flags="NO" and isakmpd restarted, things stay up for hours
and hours without dropping any packets.  With sasyncd_flags="-vv" and
isakmpd restarted, I get stability that looks like this:

Request timeout for icmp_seq 671
Request timeout for icmp_seq 672
Request timeout for icmp_seq 673
64 bytes from 172.16.32.1: icmp_seq=674 ttl=253 time=26.958 ms
64 bytes from 172.16.32.1: icmp_seq=675 ttl=253 time=26.386 ms
64 bytes from 172.16.32.1: icmp_seq=676 ttl=253 time=26.102 ms
64 bytes from 172.16.32.1: icmp_seq=677 ttl=253 time=26.104 ms
Request timeout for icmp_seq 678
Request timeout for icmp_seq 679
Request timeout for icmp_seq 680
Request timeout for icmp_seq 681
Request timeout for icmp_seq 682
Request timeout for icmp_seq 683
Request timeout for icmp_seq 684
Request timeout for icmp_seq 685
Request timeout for icmp_seq 686
Request timeout for icmp_seq 687
Request timeout for icmp_seq 688
Request timeout for icmp_seq 689
Request timeout for icmp_seq 690
Request timeout for icmp_seq 691
Request timeout for icmp_seq 692
Request timeout for icmp_seq 693
Request timeout for icmp_seq 694
Request timeout for icmp_seq 695
Request timeout for icmp_seq 696
Request timeout for icmp_seq 697
Request timeout for icmp_seq 698
Request timeout for icmp_seq 699
64 bytes from 172.16.32.1: icmp_seq=700 ttl=253 time=26.339 ms
64 bytes from 172.16.32.1: icmp_seq=701 ttl=253 time=26.620 ms

This happens pretty frequently.  The IP above is the remote internal
carp address, and it is the same address I hit when sasyncd is
disabled.  It's down for even longer periods of time on occasion, with
nothing in the logs.
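
To put numbers on the drops, here is a small sketch that summarises runs
of lost pings from saved ping output.  It assumes the output format shown
above ("Request timeout for icmp_seq N" and "N bytes from ..." lines);
the function name and the ping.log file name are made up for
illustration:

```shell
#!/bin/sh
# Sketch only: summarise consecutive ping timeouts read from stdin.
# Prints one line per outage, then a final summary line.
outage_runs() {
    awk '
    /^Request timeout for icmp_seq/ {
        if (run == 0) start = $NF      # sequence number where the outage began
        run++; lost++
        next
    }
    /bytes from/ {
        ok++
        if (run > 0) {
            printf "outage: %d packets starting at seq %d\n", run, start
            if (run > max) max = run
            run = 0
        }
    }
    END {
        # Report an outage that was still in progress when the log ended.
        if (run > 0) {
            printf "outage: %d packets starting at seq %d\n", run, start
            if (run > max) max = run
        }
        printf "replies=%d lost=%d longest_outage=%d\n", ok, lost, max
    }'
}
```

Usage would be something like "outage_runs < ping.log".  On the excerpt
above it should report two outages, of 3 and 22 packets, which at one
ping per second is roughly how long the tunnel was unusable each time.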

I might need to start preparing some OpenVPN this weekend.

Thanks again for looking at this.  Any more insight would be much
appreciated.  I don't understand why my config seems to look like yours,
as well as other docs I've read, and yet I have these kinds of strange
issues, but only with sasyncd enabled.

-- 
Zach