I've recently deployed a set of OpenBSD firewalls, and they're nearing the point where they need to go into production, but I've got an issue that I can't nail down.

I've got a pair of OpenBSD 5.4 systems running on Soekris 6501 hardware at each location, for a total of four firewalls. Each pair is running the sasyncd, pfsync, and carp combo, all of which seems to be working correctly.

OpenBSD 5.4 GENERIC#37 amd64

* pfctl -ss shows the states are making it between peers
* carp fails over nicely, with maybe 1 packet lost in my icmp testing
* ipsecctl -sa shows associations sync to its peer
* IPSec connections are established to the remote datacenter

All of the above looks to be working. I initially had both sides of the IPSec configured with active mode, but I thought that was causing my packet loss, perhaps due to simultaneous initiation of the security association, so I set passive on one of the locations.

The issue I am seeing is that traffic will periodically stop flowing between the sites. During this time, ipsecctl -sa shows that there are associations, and netstat -rn says routes are up.

My next thought was that perhaps the phase 1 key exchange was happening too frequently, so I increased its lifetime to 2 days, leaving phase 2 at the default, but it still happens. Even so, I expect increasing the timeout only masks the actual problem.

I'm using nagios to monitor the connection, with the internal IPs of the firewalls as the addresses to ping. From the nagios history, it looks like I am having connection issues about every 15 minutes, and these are only the ones that are detected. Sometimes it's just a couple of packets. Sometimes it's down for a good 90 seconds.

Every five minutes or so I see this in daemon logs:

    Mar 5 19:32:32 opdx-fw1 isakmpd[28109]: isakmpd: quick mode done (as responder): src: 1.2.3.5 dst: 66.77.88.10

Which I expect, given that the phase 2 lifetime is set to its default value.

We deploy the configs using Puppet, so the consistency across machines should be solid, though that doesn't say much about errors in the configs we deploy.

I also wondered if the single threaded kernel might play a part here. I see 80% CPU when running Puppet.
Would booting the MP kernel help? The CPUs do support it.

In any case, I'm stuck. My coworkers are all looking at me wondering when they can purchase some shiny new commercial firewalls, and I'd really like to have a success story here. I can always switch to doing some SSH tunnel, or OpenVPN or some such, but since OpenBSD has IPSec built into core, a) I'd like to use it, and b) I expect it to work.

I'm hoping someone on the list can point out something I am doing wrong. This is the first time I've run OpenBSD in production, so my methods may not be conventional.

IPSec is configured to talk to the remote site's CARP address, so below that's 1.2.3.5 and 66.77.88.10.

Here are (I think) the relevant configs. Please help.

# SiteA ipsec.conf
ike esp from { 66.77.88.10 10.224.0.0/12 } to { 1.2.3.5 10.240.0.0/12 } \
    local 66.77.88.10 peer 1.2.3.5 \
    main auth hmac-sha2-256 enc blowfish lifetime 172800 \
    quick auth hmac-sha2-384 enc blowfish \
    psk "secret"

# SiteB ipsec.conf
ike passive esp from { 1.2.3.5 10.240.0.0/12 } to { 66.77.88.10 10.224.0.0/12 } \
    local 1.2.3.5 peer 66.77.88.10 \
    main auth hmac-sha2-256 enc blowfish lifetime 172800 \
    quick auth hmac-sha2-384 enc blowfish \
    psk "secret"

# rc.conf
ntpd_flags=""
isakmpd_flags="-K -S -v"
sasyncd_flags=""
ipsec=YES
syslogd_flags="-h"
snmpd_flags=YES

# sasyncd on one of the systems
peer 1.2.3.7
interface carp1
sharedkey 0xsuperlonghex

If PF information is needed, I can obscure and provide it, but I didn't expect it to be the issue.

What else can I use that would help me troubleshoot this? What more information can I provide that would help narrow this down?

Regards,
--
Zach
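
P.S. One thing I could do to narrow this down is check whether the drops line up with the quick mode rekeys. Below is a minimal sketch of that idea, not something I have run against the real logs. It assumes the daemon log lines look exactly like the isakmpd line quoted above; the function names, the 60 second window, and the default year are all hypothetical choices of mine (syslog timestamps carry no year, so one has to be supplied).

```python
import re
from datetime import datetime

# Matches syslog lines shaped like the isakmpd line quoted in this post.
# Assumption: "Mon D HH:MM:SS host isakmpd[pid]: ... quick mode done ..."
QUICK_MODE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+isakmpd\[\d+\]:"
    r".*quick mode done"
)


def quick_mode_times(lines, year=2014):
    """Return the timestamps of all 'quick mode done' log lines.

    The year is an assumption passed in by the caller, since syslog
    timestamps do not include one.
    """
    times = []
    for line in lines:
        m = QUICK_MODE.match(line)
        if m:
            # Collapse padding such as "Mar  5" into "Mar 5" before parsing.
            ts = " ".join(m.group("ts").split())
            times.append(datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S"))
    return times


def rekey_intervals(times):
    """Seconds between consecutive quick mode completions."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def near_rekey(outage, times, window_s=60):
    """True if an outage timestamp falls within window_s of any rekey."""
    return any(abs((outage - t).total_seconds()) <= window_s for t in times)
```

Feeding it the daemon log plus the outage times from the nagios history would show whether the outages cluster around rekeys or are unrelated to them.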