I've recently deployed a set of OpenBSD firewalls and am nearing the time
when they need to go into production, but I've got an issue that I can't
pin down.

I've got a pair of OpenBSD 5.4 systems running on Soekris 6501 at each
location, for a total of four firewalls.  Each pair runs the sasyncd,
pfsync and carp combo, all of which seem to be working correctly.

OpenBSD 5.4 GENERIC#37 amd64

* pfctl -ss shows the states are making it between peers
* carp fails over nicely, with maybe 1 packet lost in my icmp testing
* ipsecctl -sa shows the associations are synced to the peer
* IPSec connections are established to the remote datacenter

All of the above looks to be working.  I initially had both sides of the
IPSec tunnel configured in active mode, but I suspected that simultaneous
initiation of the security associations was causing my packet loss, so I
set one of the locations to passive.

The issue I am seeing is that traffic periodically stops flowing
between the sites.  During these outages, ipsecctl -sa still shows the
associations, and netstat -rn shows the routes are up.

My next thought was that phase 1 key exchange might be happening too
frequently, so I increased its lifetime to 2 days, leaving phase 2 at the
default, but the problem still happens.  Besides, I expect a longer
lifetime would only mask the actual problem.

I'm using nagios to monitor the connection by pinging the internal IPs of
the firewalls, and from the nagios history it looks like I have
connection issues about every 15 minutes, and those are only the ones
that get detected.  Sometimes it's just a couple of packets; sometimes
it's down for a good 90 seconds.
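To put numbers on these outages independently of nagios, a rough Python sketch like the one below could group consecutive lost probes into outage windows. It assumes a hypothetical input of one (timestamp, reply_received) pair per probe, for example collected by pinging the remote firewall's internal IP once per second; the `Outage` class and `find_outages` function are made-up names for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Outage:
    start: float  # timestamp of the first lost probe
    end: float    # timestamp of the last lost probe
    lost: int     # number of consecutive probes lost

    @property
    def duration(self) -> float:
        # Time between the first and last lost probe in this window.
        return self.end - self.start


def find_outages(samples: Iterable[Tuple[float, bool]]) -> List[Outage]:
    """Group consecutive failed probes (hypothetical input format) into outages."""
    outages: List[Outage] = []
    current: Optional[Outage] = None
    for ts, ok in sorted(samples):
        if not ok:
            if current is None:
                current = Outage(start=ts, end=ts, lost=1)
            else:
                current.end = ts
                current.lost += 1
        elif current is not None:
            # A successful reply closes the open outage window.
            outages.append(current)
            current = None
    if current is not None:
        outages.append(current)
    return outages


if __name__ == "__main__":
    # Made-up demo samples: two lost probes, then a run of three.
    demo = [(0, True), (1, False), (2, False), (3, True),
            (10, False), (11, False), (12, False), (13, True)]
    for o in find_outages(demo):
        print(f"outage at t={o.start:.0f}s: {o.lost} probes lost, "
              f"{o.duration:.0f}s between first and last loss")
```

Lining the resulting start times up against the firewall logs is what would show whether the drops follow a pattern.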

Every five minutes or so I see this in the daemon log:
Mar  5 19:32:32 opdx-fw1 isakmpd[28109]: isakmpd: quick mode done (as responder): src: 1.2.3.5 dst: 66.77.88.10

I expect this, since the phase 2 lifetime is left at its default value.
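To check whether these rekeys line up with the outages, the quick-mode completions could be pulled out of the log and their spacing measured per src/dst pair. This is a minimal Python sketch that assumes every relevant line follows the isakmpd message format quoted above; since syslog timestamps carry no year, a placeholder year (hypothetical `ASSUMED_YEAR`) is used only so that `strptime` can parse them. The function names are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Matches the isakmpd "quick mode done" message format quoted above.
QM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ isakmpd\[\d+\]: "
    r"isakmpd: quick mode done \(as (?P<role>\w+)\): "
    r"src: (?P<src>\S+) dst: (?P<dst>\S+)"
)

# Hypothetical placeholder: syslog stamps omit the year.
ASSUMED_YEAR = 2014

Event = Tuple[datetime, str, str]


def quick_mode_events(lines: Iterable[str]) -> List[Event]:
    """Return (timestamp, src, dst) for every quick-mode completion line."""
    events: List[Event] = []
    for line in lines:
        m = QM_RE.search(line)
        if m:
            ts = datetime.strptime(f"{ASSUMED_YEAR} {m['stamp']}",
                                   "%Y %b %d %H:%M:%S")
            events.append((ts, m["src"], m["dst"]))
    return events


def intervals_by_pair(events: Iterable[Event]) -> Dict[Tuple[str, str], List[float]]:
    """Seconds between consecutive quick-mode completions for each src/dst pair."""
    by_pair: Dict[Tuple[str, str], List[datetime]] = defaultdict(list)
    for ts, src, dst in sorted(events):
        by_pair[(src, dst)].append(ts)
    return {
        pair: [(later - earlier).total_seconds()
               for earlier, later in zip(stamps, stamps[1:])]
        for pair, stamps in by_pair.items()
    }
```

Usage would be along the lines of `intervals_by_pair(quick_mode_events(open("/var/log/daemon")))`, assuming daemon messages land in that file; comparing the event timestamps with the times nagios flagged would show whether the drops coincide with phase 2 renegotiation.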

We deploy the configs using Puppet, so consistency across machines
should be solid, though that says nothing about errors in the configs we
deploy.

I also wondered whether the single-processor kernel might play a part
here, since CPU usage reaches 80% when Puppet runs.  Would booting the MP
kernel help?  The CPUs do support it.

In any case, I'm stuck.  My coworkers are all looking at me wondering
when they can purchase some shiny new commercial firewalls, and I'd
really like to have a success story here.  I could always switch to an
SSH tunnel, OpenVPN or some such, but since OpenBSD has IPSec built into
the base system, a) I'd like to use it, and b) I expect it to work.  I'm
hoping someone on the list can point out something I am doing wrong.

This is the first time I've run OpenBSD in production, so my methods may
not be conventional.

IPSec is configured to talk to the remote site's CARP address, so in the
configs below those are 1.2.3.5 and 66.77.88.10.

Here are (I think) the relevant configs.  Please help.

# SiteA ipsec.conf
ike esp from { 66.77.88.10 10.224.0.0/12 } to { 1.2.3.5 10.240.0.0/12 } \
	local 66.77.88.10 peer 1.2.3.5 \
	main auth hmac-sha2-256 enc blowfish lifetime 172800 \
	quick auth hmac-sha2-384 enc blowfish \
	psk "secret"

# SiteB ipsec.conf
ike passive esp from { 1.2.3.5 10.240.0.0/12 } to { 66.77.88.10 10.224.0.0/12 } \
	local 1.2.3.5 peer 66.77.88.10 \
	main auth hmac-sha2-256 enc blowfish lifetime 172800 \
	quick auth hmac-sha2-384 enc blowfish \
	psk "secret"
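Because both rules use brace lists, each one probably expands into more than one flow. The small Python sketch below illustrates that expansion, assuming ipsecctl pairs every source in the `from { ... }` list with every destination in the `to { ... }` list; `expand_flows` is a made-up name, and running `ipsecctl -nvf /etc/ipsec.conf` should print what ipsecctl actually generates without loading anything.

```python
from itertools import product
from typing import List, Tuple


def expand_flows(sources: List[str], destinations: List[str]) -> List[Tuple[str, str]]:
    """Pair every source with every destination (assumed cross-product expansion)."""
    return list(product(sources, destinations))


if __name__ == "__main__":
    # Address lists copied from the SiteA rule above.
    site_a_from = ["66.77.88.10", "10.224.0.0/12"]
    site_a_to = ["1.2.3.5", "10.240.0.0/12"]
    for src, dst in expand_flows(site_a_from, site_a_to):
        print(f"{src} -> {dst}")
```

If the assumption holds, each site ends up with four flows, each negotiating its own quick-mode SA, which may be relevant to how often the "quick mode done" message shows up.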

# rc.conf
ntpd_flags=""
isakmpd_flags="-K -S -v"
sasyncd_flags=""
ipsec=YES
syslogd_flags="-h"
snmpd_flags=YES

# sasyncd.conf on one of the systems
peer 1.2.3.7
interface carp1
sharedkey 0xsuperlonghex

If PF information is needed, I can provide it (obscured), but I didn't
expect it to be the issue.

What else can I use that would help me troubleshoot this?  What more
information can I provide that would help narrow this down?

Regards,

-- 
Zach
