I would like to re-title this as something like "pf and iked instability on recent snapshots," but I don't know whether doing so would break the mailing list thread, so I left the subject unchanged...
> -----Original Message-----
> From: Theodore Wynnychenko [mailto:t...@uchicago.edu]
> Sent: Saturday, December 08, 2018 4:03 PM
> To: misc@openbsd.org
> Cc: 'Rachel Roch'
> Subject: RE: TLS suddenly not working over IKED site-to-site
>
> . . .
> I now find I can no longer connect to with TLS/SSL over the iked tunnel
> (the original behavior that seemed to have corrected itself). Also,
> icinga continues to be unable to verify the status of the remote hosts
> over port 5665.
>
> I don't have time right now to try using s_client and s_server and
> watching enc0 to see what is happening, but I will when I can.
>
> If anyone has an ideas on what may be happening, please let me know.
>
> Thanks
> Ted

Hello again;

So, I am at a complete loss to understand what is going on.

Today, I tried using openssl s_client and s_server to make a connection through the iked vpn (as I described in my last post). However, with NO changes to iked.conf or pf.conf, today I had several connection attempts that completed correctly. I have not included any output from those sporadic, completely functional connections.

Rather, today, most of the connections by s_client are not even acknowledged by the s_server on the other side of the iked vpn. For example:

On the s_client machine:

# openssl s_client -state -connect "remote.host":https
SSL_connect:before/connect initialization
SSL_connect:SSLv3 write client hello A
... and nothing more ...

But on the s_server machine today all I see is:

# openssl s_server -state -accept https ...certificate options...
Using auto DH parameters
Using default temp ECDH parameters
ACCEPT
... and no connection attempt is ever acknowledged ...

(Yesterday, at least this first part of the connection was received by the s_server:

Using auto DH parameters
Using default temp ECDH parameters
ACCEPT
SSL_accept:before/accept initialization
... and nothing more yesterday ...)

So, today using tcpdump on the outgoing interface of the s_client machine and the incoming interface of the "local" iked vpn endpoint shows:

16:43:05.107524 172.30.1.254.7305 > 172.30.7.205.443: S 1751796302:1751796302(0) win 16384 <mss 1460,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2698316052 0>
16:43:05.149146 172.30.1.254.7305 > 172.30.7.205.443: . ack 2119500805 win 256 <nop,nop,timestamp 2698316052 3536824996>
16:43:05.149895 172.30.1.254.7305 > 172.30.7.205.443: P 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996>
16:43:06.648487 172.30.1.254.7305 > 172.30.7.205.443: P 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316055 3536824996>
16:43:09.648557 172.30.1.254.7305 > 172.30.7.205.443: P 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996>
16:43:09.948433 172.30.1.254.7305 > 172.30.7.205.443: F 196:196(0) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996>
16:43:15.648712 172.30.1.254.7305 > 172.30.7.205.443: FP 0:196(196) ack 1 win 256 <nop,nop,timestamp 2698316073 3536825005>

And this traffic (incomplete though it may be for an ssl handshake) appears to be passed to enc0 intact:

16:43:05.105044 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: S 3570513915:3570513915(0) win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2698316052 0> (encap)
16:43:05.146122 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: S 1312941075:1312941075(0) ack 3570513916 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3536824996 2698316052> (encap)
16:43:05.146654 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996> (encap)
16:43:05.147365 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996> (encap)
16:43:06.645932 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316055 3536824996> (encap)
16:43:09.646049 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: P 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996> (encap)
16:43:09.945908 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996> (encap)
16:43:09.981966 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: . ack 1 win 261 <nop,nop,timestamp 3536825005 2698316052,nop,nop,sack 1 {197:197} > (encap)
16:43:15.646158 (unprotected): SPI 0x0000ef27: 172.30.1.254.7305 > 172.30.7.205.443: FP 1:197(196) ack 1 win 256 <nop,nop,timestamp 2698316073 3536825005> (encap)

BUT, at the other end of the VPN, on enc0, all that is seen leaving the iked VPN tunnel is:

16:43:05.130558 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: S 3570513915:3570513915(0) win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 2698316052 0> (encap)
16:43:05.131049 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: S 1312941075:1312941075(0) ack 3570513916 win 16384 <mss 1300,nop,nop,sackOK,nop,wscale 6,nop,nop,timestamp 3536824996 2698316052> (encap)
16:43:05.174802 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: . ack 1 win 256 <nop,nop,timestamp 2698316052 3536824996> (encap)
16:43:09.966420 (authentic,confidential): SPI 0x151333df: 172.30.1.254.7305 > 172.30.7.205.443: F 197:197(0) ack 1 win 256 <nop,nop,timestamp 2698316061 3536824996> (encap)
16:43:09.966853 (authentic,confidential): SPI 0xe1c30e4a: 172.30.7.205.443 > 172.30.1.254.7305: . ack 1 win 261 <nop,nop,timestamp 3536825005 2698316052,nop,nop,sack 1 {197:197} > (encap)

I have no idea what this all means, or what to do with it. But, I am following up in case anybody has any idea of what may be happening.

Also, yesterday I described how the local iked machine appeared to be blocking packets that were explicitly allowed by pf.conf. From my post yesterday:

(
For example, in the log I see:

Dec 8 15:50:01 ... pf: Dec 08 15:48:49.346816 rule 4/(match) block out on em0: 172.30.7.205.22112 > 172.30.2.99.5665: R 3963276584:3963276584(0) ack 252894831 win 0

But, pfctl is running with following:

# pfctl -s rules
match in all scrub (no-df random-id max-mss 1300)
pass in quick on em1 all flags S/SA
pass out quick on em1 all flags S/SA
block drop in log on em0 all
block drop out log on em0 all
...
pass quick inet proto tcp from 172.30.7.205 to 172.30.2.99 port = 5665 flags S/SA
... and on.
)

Well, whatever was happening appears to have been resolved, because at about midnight local time on Sunday morning, icinga2 declared that the host was back up.

To be clear, I have made no changes to either pf.conf or iked.conf on any of the machines involved in this testing from Saturday. Also, this had all been stable for the last (about) 2 years, until about two-three weeks ago.

I did have another post, where I discussed the fact the iked VPN had failed to be reestablished after an update about 3-4 snapshots back.
I got it working again by changing the local endpoint on the "remote" iked machine from the internal IP associated with the internal interface to an internal "alias" IP address associated with the outgoing/external interface of that machine. But, again, it had been working for 2 years until the recent update.

I don't have any idea of what may be helpful in figuring out what I am doing wrong, or what has changed, but I am happy to provide any information that may be of help. I don't believe I have the knowledge to do more on my own at this point.

Thanks for any advice.

Ted
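
P.S. In case it helps anyone line up the two enc0 captures above, here is a rough Python sketch that lists the packets seen on the local enc0 but never seen on the remote enc0. It is only a helper for reading the dumps, not a diagnosis. The names (Enc0Packet, parse_enc0, missing_on_remote) are made up for this example, and it assumes each packet sits on a single line in the "time (state): SPI 0x...: packet" layout that tcpdump printed here. Packets are matched on the text after the SPI field, since the capture times differ between the two machines.

```python
import re
from typing import List, NamedTuple


class Enc0Packet(NamedTuple):
    """One enc0 line from the tcpdump output quoted in this post (hypothetical helper type)."""
    time: str    # capture time on the machine that ran tcpdump
    state: str   # e.g. "authentic,confidential" or "unprotected"
    spi: str     # SPI as printed, e.g. "0x151333df"
    packet: str  # everything after the SPI field, whitespace-normalised


# Assumed layout: "<time> (<state>): SPI <hex>: <rest of packet line>"
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d+)\s+\(([^)]*)\):\s+SPI\s+(0x[0-9a-fA-F]+):\s+(.*)$"
)


def parse_enc0(text: str) -> List[Enc0Packet]:
    """Parse enc0 capture text, silently skipping lines that do not match the layout."""
    packets = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            time, state, spi, rest = match.groups()
            packets.append(Enc0Packet(time, state, spi, " ".join(rest.split())))
    return packets


def missing_on_remote(local_text: str, remote_text: str) -> List[Enc0Packet]:
    """Return local enc0 packets whose packet text never appears in the remote capture."""
    seen = {p.packet for p in parse_enc0(remote_text)}
    return [p for p in parse_enc0(local_text) if p.packet not in seen]


if __name__ == "__main__":
    # Assumed file names: save each enc0 capture to its own text file first.
    with open("local_enc0.txt") as f_local, open("remote_enc0.txt") as f_remote:
        for p in missing_on_remote(f_local.read(), f_remote.read()):
            print(p.time, p.state, p.spi, p.packet)
```

Run against the two enc0 captures in this message, the packets it should flag are the 196-byte data segments, which are the ones tagged (unprotected) with SPI 0x0000ef27 on the local side.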