> I can think of several possibilities as to why some negotiations are taking
> more than 60 seconds. For instance:
> 
> (1) The Cisco 7301 may be slow to respond. It does have a VAM2+ crypto
> accelerator installed, but I don't know if it's used for isakmp exchanges,
> or just for symmetric encryption/decryption. (However, 'show proc cpu
> history' suggests CPU load is no more than about 25%)
> 
> (2) There may be packet loss and retransmissions, maybe due to some network
> buffer overflowing, either on OpenBSD or Cisco.
> 
> The OpenBSD box is using a nasty rl0 card, because that's the only spare
> interface I had available to go into the test LAN. Having said that,
> watching with 'top' I don't see the interrupt load go above 10%.
> 
> I'm not sure how to probe deeper to get a handle on what's actually
> happening though. Perhaps isakmpd -L logging might shed some light, although
> I don't fancy decoding QM exchanges by hand :-(

As a starting point, I tried capturing packets with both isakmpd -L and
tcpdump at the same time. This is for a config which creates 1000 SAs. Then
I wrote a little Perl script to take the output from tcpdump -r and count
the number of outbound packets (10.1.1.6.500 > 10.1.1.1.500) and inbound
(reverse) in each second.

Here are the counts from tcpdump:

                (out)   (in)
09:47:37        3       3
09:47:38        1043    43
09:47:39        107     107
09:47:45        878     28
09:47:46        136     136
09:47:47        35      35
09:47:54        671     20
09:47:55        109     109
09:47:56        90      90
09:48:05        445     13
09:48:06        95      95
09:48:07        50      50
09:48:08        1       2
09:48:15        1       2
09:48:18        282     8
09:48:19        78      78
09:48:20        90      90
09:48:21        5       5
09:48:24        2       4
09:48:33        98      5
09:48:34        49      49
09:48:35        2       4
09:48:37        10      5
09:48:38        0       1
09:48:45        0       1
09:48:48        1       2
09:48:50        44      5
09:48:51        34      34
09:48:54        0       2
09:49:03        1       2
09:49:05        0       2
09:49:07        0       3
09:49:18        0       1
09:49:27        1004    0
09:49:33        0       1
TOTAL           5364    1035

And here are the counts from the pcap file generated by isakmpd -L:

09:47:37        192     3
09:47:38        853     43
09:47:39        107     107
09:47:45        28      28
09:47:46        136     136
09:47:47        35      35
09:47:54        20      20
09:47:55        109     109
09:47:56        90      90
09:48:05        13      13
09:48:06        95      95
09:48:07        50      50
09:48:08        1       2
09:48:15        1       2
09:48:18        8       8
09:48:19        78      78
09:48:20        90      90
09:48:21        5       5
09:48:24        2       4
09:48:33        5       5
09:48:34        49      49
09:48:35        2       4
09:48:37        10      5
09:48:38        0       1
09:48:45        0       1
09:48:48        1       2
09:48:50        5       5
09:48:51        34      34
09:48:54        0       2
09:49:03        1       2
09:49:05        0       2
09:49:07        0       3
09:49:18        0       1
09:49:27        1004    0
TOTAL           3024    1034

Now, whilst the number of input packets tally almost exactly, strangely the
number of outbound packets recorded by isakmpd -L is much lower than the
actual number of packets seen going out by tcpdump. I can only guess that
packet retransmits are not being recorded by isakmpd -L?

Anyway, looking at the packet capture on the wire, I interpret it as follows:

09:47:37        3       3
  >> main mode exchange (3 out, 3 in)
09:47:38        1043    43
  >> 1000 QM requests, plus 43 QM responses triggering 43 QM completions
09:47:39        107     107
  >> 107 more QM responses triggering 107 QM completions

So 150 QM exchanges have completed. Then nothing more happens until:

09:47:45        878     28

That's 850 QM retransmits, plus 28 QM responses, triggering 28 QM
completions. Looking at this, it seems like either 850 of the first burst of
1000 packets were not delivered, or arrived at the Cisco and could not be
handled; or they arrived successfully, but the responses didn't arrive.

So now I need to establish whether those original 1,000 sent packets were
actually arriving at the Cisco or not, which perhaps careful use of
interface counters might reveal, or else I need to dig out a switch with
port mirroring.

However, I see that isakmpd checks the result of the sendmsg() call, and it
wasn't failing.

Also netstat -ni on the OpenBSD side shows no IErrs or OErrs (although I
don't see counters specifically for buffer overflows). Ah, but show int g0/0
on the Cisco says:

  ...
  Input queue: 1/75/17872/12785 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  5 minute input rate 1000 bits/sec, 1 packets/sec
  5 minute output rate 1000 bits/sec, 1 packets/sec
     468241 packets input, 81098292 bytes, 383 no buffer
     Received 1897 broadcasts, 0 runts, 0 giants, 937 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 40 multicast, 0 pause input
     0 input packets with dribble condition detected
     98197 packets output, 16557357 bytes, 0 underruns
     2 output errors, 0 collisions, 1 interface resets
     0 babbles, 0 late collision, 0 deferred
     2 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

Hmm - queue depth of 75 - and we got 150 QM exchanges in the first run. So a
quick config change:

ciscolns(config-if)#hold-queue 4000 in
ciscolns(config-if)#hold-queue 4000 out

Unfortunately that made no difference - still 150 QM exchanges in the first
batch. But maybe there's some other queue in the Cisco that needs tweaking
(e.g. for outstanding IPSEC requests)

At least no packets are being dropped now:

  Last clearing of "show interface" counters 00:01:28
  Input queue: 1/4000/0/353 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: fifo
  Output queue: 0/4000 (size/max)
  5 minute input rate 14000 bits/sec, 12 packets/sec
  5 minute output rate 5000 bits/sec, 4 packets/sec
     3990 packets input, 637690 bytes, 0 no buffer
     Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
     1015 packets output, 221409 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 pause output
     0 output buffer failures, 0 output buffers swapped out

Regards,

Brian.

Reply via email to