Hi,

while performing stress tests, we ran into a problem related to the way
cookies are handled in RFC7296. If network conditions are bad (a high
probability of packets being delayed, reordered, or lost), some SAs fail to
be established, with an AUTHENTICATION_FAILED error. Consider the following
scenario:

         Initiator                               Responder
 1. HDR, SAi, KEi, Ni  -->
 2.                                         -->  HDR, SAi, KEi, Ni
                                                 (the server has a large number
                                                 of half-open SAs, so it replies
                                                 with a cookie generated using
                                                 cookie secret K1)
 2. (message is delayed in the network) ... <--  HDR, N(COOKIE1)
 3. (client retransmits its original message)
    HDR, SAi, KEi, Ni  -->
 4.                                         -->  HDR, SAi, KEi, Ni
                                                 (the server still has a large
                                                 number of half-open SAs, so it
                                                 replies with a cookie, but the
                                                 cookie generation secret has
                                                 already changed to K2, so a new
                                                 cookie, COOKIE2, is generated
                                                 for the same request)
 5.                                         <--  HDR, N(COOKIE2)
 6. HDR, N(COOKIE2)  <--
 7. (client receives the message and retransmits its request with COOKIE2)
    HDR, N(COOKIE2), SAi, KEi, Ni  -->
 8.                                         -->  HDR, N(COOKIE2), SAi, KEi, Ni
                                                 (the server verifies COOKIE2;
                                                 it is valid, since K2 is still
                                                 the current secret)
 9. (message is delayed in the network) ... <--  HDR, SAr, KEr, Nr
10. HDR, N(COOKIE1)  <--
11. (eventually the delayed first message from the server, carrying COOKIE1,
    reaches the client. Since the client doesn't know that COOKIE1 is stale,
    it decides that COOKIE1 is fresher than COOKIE2 because it was received
    later, so the client replaces the cookie and resends its initial request
    with COOKIE1)
    HDR, N(COOKIE1), SAi, KEi, Ni  -->   (this message gets lost)
12. HDR, SAr, KEr, Nr  <--
    (eventually the delayed server response reaches the client. At this
    point the client thinks that IKE_SA_INIT is complete and starts
    IKE_AUTH)
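To make the cookie change between steps 2 and 4 concrete, here is a minimal Python sketch of the stateless cookie construction suggested in RFC7296 (Section 2.6), Cookie = <VersionIDofSecret> | Hash(Ni | IPi | SPIi | <secret>). The choice of SHA-256, the 1-byte version ID, and the helper names make_cookie/verify_cookie are assumptions for illustration only. It shows that an unchanged, retransmitted request yields a different cookie once the secret has rotated from K1 to K2.

```python
import hashlib
import hmac
import os

# Sketch of the stateless cookie scheme suggested in RFC 7296, Section 2.6:
#   Cookie = <VersionIDofSecret> | Hash(Ni | IPi | SPIi | <secret>)
# Assumptions: SHA-256 as the hash, a 1-byte secret version ID, and the
# hypothetical helper names make_cookie / verify_cookie.

def make_cookie(version: int, secret: bytes, ni: bytes, ipi: bytes, spii: bytes) -> bytes:
    digest = hashlib.sha256(ni + ipi + spii + secret).digest()
    return bytes([version]) + digest

def verify_cookie(cookie: bytes, secrets: dict[int, bytes],
                  ni: bytes, ipi: bytes, spii: bytes) -> bool:
    if not cookie:
        return False
    secret = secrets.get(cookie[0])
    if secret is None:
        return False  # this secret version is no longer honoured
    expected = make_cookie(cookie[0], secret, ni, ipi, spii)
    return hmac.compare_digest(cookie, expected)

# One initiator request, retransmitted unchanged (steps 1 and 3).
ni, ipi, spii = os.urandom(32), bytes([192, 0, 2, 1]), os.urandom(8)

k1, k2 = os.urandom(32), os.urandom(32)
cookie1 = make_cookie(1, k1, ni, ipi, spii)  # step 2: secret K1
cookie2 = make_cookie(2, k2, ni, ipi, spii)  # step 4: secret rotated to K2

current_secrets = {2: k2}  # assumption: only K2 is honoured by step 8
print(cookie1 == cookie2)
print(verify_cookie(cookie2, current_secrets, ni, ipi, spii))
print(verify_cookie(cookie1, current_secrets, ni, ipi, spii))
```

The sketch assumes that by step 8 the responder honours only K2. A real responder may keep accepting cookies made with the previous secret for a short while; that does not change the outcome here, because the request carrying COOKIE1 never reaches the server.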

What is interesting in the above diagram is that both the client and the
server eventually completed IKE_SA_INIT, but they disagree on what the
IKE_SA_INIT request from the initiator to the responder contained. The client
thinks that the server responded to its most recently sent message,
HDR, N(COOKIE1), SAi, KEi, Ni, while the server never received that message
and in fact responded to HDR, N(COOKIE2), SAi, KEi, Ni. Since the initiator's
IKE_SA_INIT request is part of the data covered by the AUTH payload, the two
peers use different inputs when calculating AUTH, and authentication fails.
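The mismatch can be sketched with the shared-key formula from RFC7296 (Section 2.15): AUTH = prf(prf(Shared Secret, "Key Pad for IKEv2"), InitiatorSignedOctets), where InitiatorSignedOctets = RealMessage1 | NonceRData | MACedIDForI. This minimal Python sketch assumes HMAC-SHA256 as the prf and uses hypothetical placeholder byte strings instead of real IKE encodings; only RealMessage1 differs between the two peers' views.

```python
import hashlib
import hmac

# Sketch of the shared-key AUTH computation from RFC 7296, Section 2.15:
#   AUTH = prf(prf(Shared Secret, "Key Pad for IKEv2"), InitiatorSignedOctets)
#   InitiatorSignedOctets = RealMessage1 | NonceRData | MACedIDForI
# Assumptions: prf is HMAC-SHA256, and all byte strings below are
# hypothetical placeholders for real IKE encodings.

def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def initiator_auth(shared_secret: bytes, real_message1: bytes,
                   nonce_r: bytes, sk_pi: bytes, id_i_payload: bytes) -> bytes:
    maced_id = prf(sk_pi, id_i_payload)                 # MACedIDForI
    signed_octets = real_message1 + nonce_r + maced_id  # InitiatorSignedOctets
    return prf(prf(shared_secret, b"Key Pad for IKEv2"), signed_octets)

psk = b"valid-credentials"  # both peers hold the same, valid credentials
nonce_r = b"Nr"
sk_pi = b"SK_pi"
id_i = b"IDi"

# Client's view: it believes its last request (with COOKIE1) was answered.
client_msg1 = b"HDR|N(COOKIE1)|SAi|KEi|Ni"
# Server's view: it actually answered the request carrying COOKIE2.
server_msg1 = b"HDR|N(COOKIE2)|SAi|KEi|Ni"

client_auth = initiator_auth(psk, client_msg1, nonce_r, sk_pi, id_i)
expected_by_server = initiator_auth(psk, server_msg1, nonce_r, sk_pi, id_i)
print(client_auth == expected_by_server)
```

Everything except RealMessage1 is identical on both sides (the IKE SA keys do not depend on the cookie), yet the AUTH value the client sends does not match the one the server expects.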

Although this diagram may look artificial, we observed a noticeable number of
these errors during real stress tests (up to 5% of SAs failed with this error
under bad network conditions).

What's particularly unfortunate about this:
1. Bad network conditions may be the result of a DDoS attack, which may also
   trigger the cookie logic on the server. So the two preconditions, a bad
   network and a server under attack, are coupled.
2. The most disappointing thing for me is that, despite the bad network
   conditions, the peers did manage to complete the initial IKE exchanges,
   only to get an "authentication failed" result.
3. For customers this looks surprising: they have valid credentials, but from
   time to time they receive "authentication failed" diagnostics without any
   clue as to why this happens.

The root of the problem is that an IKE_SA_INIT request may be retransmitted
with different content (different cookies), and the peers have no means of
making sure that they sent and received identical messages. Later, these
possibly different messages are used in the AUTH payload calculation.

I believe that the proper solution would be to exclude the cookie from the
AUTH payload calculation. The cookie is verified by the responder using the
cookie generation secret, and it does not concern the client (the client did
not generate it, it only echoes it back). However, this solution is obviously
incompatible with RFC7296, so it is not an option.
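The idea of excluding the cookie, although incompatible with RFC7296, can be illustrated the same way. In this hypothetical Python sketch, messages are modelled as lists of payload labels (an assumption, not the wire format), HMAC-SHA256 stands in for the prf, and the COOKIE notification is dropped from the signed octets before AUTH is computed.

```python
import hashlib
import hmac

# Illustration only: excluding the COOKIE notification from the AUTH input.
# This is NOT what RFC 7296 specifies. Messages are modelled as lists of
# payload labels, and HMAC-SHA256 stands in for the prf (both assumptions).

def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def signed_message1(payloads: list[str], exclude_cookie: bool) -> bytes:
    if exclude_cookie:
        # Drop the echoed cookie; the initiator never generated it.
        payloads = [p for p in payloads if not p.startswith("N(COOKIE")]
    return "|".join(payloads).encode()

def auth(psk: bytes, payloads: list[str], exclude_cookie: bool) -> bytes:
    # Placeholders for NonceRData ("Nr") and MACedIDForI (prf of "IDi").
    octets = signed_message1(payloads, exclude_cookie) + b"Nr" + prf(b"SK_pi", b"IDi")
    return prf(prf(psk, b"Key Pad for IKEv2"), octets)

psk = b"valid-credentials"
client_view = ["HDR", "N(COOKIE1)", "SAi", "KEi", "Ni"]
server_view = ["HDR", "N(COOKIE2)", "SAi", "KEi", "Ni"]

for exclude in (False, True):
    match = auth(psk, client_view, exclude) == auth(psk, server_view, exclude)
    print(f"exclude_cookie={exclude}: AUTH values match -> {match}")
```

With the cookie excluded, both views yield the same AUTH value. A real design would also have to deal with the IKE header fields that change when the notification is removed (Next Payload and Length), since RFC7296 requires the COOKIE notification to be the first payload of the retried request.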

Any opinions? Should this problem be addressed by the WG or ignored?

Regards,
Valery.


_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec