> I can think of several possibilities as to why some negotiations are taking > more than 60 seconds. For instance: > > (1) The Cisco 7301 may be slow to respond. It does have a VAM2+ crypto > accelerator installed, but I don't know if it's used for isakmp exchanges, > or just for symmetric encryption/decryption. (However, 'show proc cpu > history' suggests CPU load is no more than about 25%) > > (2) There may be packet loss and retransmissions, maybe due to some network > buffer overflowing, either on OpenBSD or Cisco. > > The OpenBSD box is using a nasty rl0 card, because that's the only spare > interface I had available to go into the test LAN. Having said that, > watching with 'top' I don't see the interrupt load go above 10%. > > I'm not sure how to probe deeper to get a handle on what's actually > happening though. Perhaps isakmpd -L logging might shed some light, although > I don't fancy decoding QM exchanges by hand :-(
As a starting point, I tried capturing packets with both isakmpd -L and tcpdump at the same time. This is for a config which creates 1000 SAs. Then I wrote a little Perl script to take the output from tcpdump -r and count the number of outbound packets (10.1.1.6.500 > 10.1.1.1.500) and inbound (reverse) in each second. Here are the counts from tcpdump: (out) (in) 09:47:37 3 3 09:47:38 1043 43 09:47:39 107 107 09:47:45 878 28 09:47:46 136 136 09:47:47 35 35 09:47:54 671 20 09:47:55 109 109 09:47:56 90 90 09:48:05 445 13 09:48:06 95 95 09:48:07 50 50 09:48:08 1 2 09:48:15 1 2 09:48:18 282 8 09:48:19 78 78 09:48:20 90 90 09:48:21 5 5 09:48:24 2 4 09:48:33 98 5 09:48:34 49 49 09:48:35 2 4 09:48:37 10 5 09:48:38 0 1 09:48:45 0 1 09:48:48 1 2 09:48:50 44 5 09:48:51 34 34 09:48:54 0 2 09:49:03 1 2 09:49:05 0 2 09:49:07 0 3 09:49:18 0 1 09:49:27 1004 0 09:49:33 0 1 TOTAL 5364 1035 And here are the counts from the pcap file generated by isakmpd -L: 09:47:37 192 3 09:47:38 853 43 09:47:39 107 107 09:47:45 28 28 09:47:46 136 136 09:47:47 35 35 09:47:54 20 20 09:47:55 109 109 09:47:56 90 90 09:48:05 13 13 09:48:06 95 95 09:48:07 50 50 09:48:08 1 2 09:48:15 1 2 09:48:18 8 8 09:48:19 78 78 09:48:20 90 90 09:48:21 5 5 09:48:24 2 4 09:48:33 5 5 09:48:34 49 49 09:48:35 2 4 09:48:37 10 5 09:48:38 0 1 09:48:45 0 1 09:48:48 1 2 09:48:50 5 5 09:48:51 34 34 09:48:54 0 2 09:49:03 1 2 09:49:05 0 2 09:49:07 0 3 09:49:18 0 1 09:49:27 1004 0 TOTAL 3024 1034 Now, whilst the number of input packets tally almost exactly, strangely the number of outbound packets recorded by isakmpd -L is much lower than the actual number of packets seen going out by tcpdump. I can only guess that packet retransmits are not being recorded by isakmpd -L? Anyway, looking at the packet capture on the wire, I interpret it as follows: 09:47:37 3 3 >> main mode exchange (3 out, 3 in) 09:47:38 1043 43 >> 1000 QM requests, plus 43 QM responses triggering 43 QM completions 09:47:39 107 107 >> 107 more QM responses triggering 107 QM completions So 150 QM exchanges have completed. Then nothing more happens until: 09:47:45 878 28 That's 850 QM retransmits, plus 28 QM responses, triggering 28 QM completions. Looking at this, it seems like either 850 of the first burst of 1000 packets were not delivered, or arrived at the Cisco and could not be handled; or they arrived successfully, but the responses didn't arrive. So now I need to establish whether those original 1,000 sent packets were actually arriving at the Cisco or not, which perhaps careful use of interface counters might reveal, or else I need to dig out a switch with port mirroring. However, I see that isakmpd checks the result of the sendmsg() call, and it wasn't failing. Also netstat -ni on the OpenBSD side shows no IErrs or OErrs (although I don't see counters specifically for buffer overflows). Ah, but show int g0/0 on the Cisco says: ... Input queue: 1/75/17872/12785 (size/max/drops/flushes); Total output drops: 0 Queueing strategy: fifo Output queue: 0/40 (size/max) 5 minute input rate 1000 bits/sec, 1 packets/sec 5 minute output rate 1000 bits/sec, 1 packets/sec 468241 packets input, 81098292 bytes, 383 no buffer Received 1897 broadcasts, 0 runts, 0 giants, 937 throttles 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored 0 watchdog, 40 multicast, 0 pause input 0 input packets with dribble condition detected 98197 packets output, 16557357 bytes, 0 underruns 2 output errors, 0 collisions, 1 interface resets 0 babbles, 0 late collision, 0 deferred 2 lost carrier, 0 no carrier, 0 pause output 0 output buffer failures, 0 output buffers swapped out Hmm - queue depth of 75 - and we got 150 QM exchanges in the first run. So a quick config change: ciscolns(config-if)#hold-queue 4000 in ciscolns(config-if)#hold-queue 4000 out Unfortunately that made no difference - still 150 QM exchanges in the first batch. But maybe there's some other queue in the Cisco that needs tweaking (e.g. for outstanding IPSEC requests) At least no packets are being dropped now: Last clearing of "show interface" counters 00:01:28 Input queue: 1/4000/0/353 (size/max/drops/flushes); Total output drops: 0 Queueing strategy: fifo Output queue: 0/4000 (size/max) 5 minute input rate 14000 bits/sec, 12 packets/sec 5 minute output rate 5000 bits/sec, 4 packets/sec 3990 packets input, 637690 bytes, 0 no buffer Received 0 broadcasts, 0 runts, 0 giants, 0 throttles 0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored 0 watchdog, 0 multicast, 0 pause input 0 input packets with dribble condition detected 1015 packets output, 221409 bytes, 0 underruns 0 output errors, 0 collisions, 0 interface resets 0 babbles, 0 late collision, 0 deferred 0 lost carrier, 0 no carrier, 0 pause output 0 output buffer failures, 0 output buffers swapped out Regards, Brian.