Hi,

while performing stress tests, we ran into a problem concerning the way cookies are handled in RFC 7296. The problem is that if network conditions are bad (a high probability of packets being delayed, reordered or lost), then some SAs fail to be established with an AUTHENTICATION_FAILED error. Consider the following scenario:
    Initiator                          Responder
    ---------                          ---------
 1. HDR, SAi, KEi, Ni -->
 2.                       --> HDR, SAi, KEi, Ni
                              (the server has a large number of
                              half-open SAs, so it responds with a
                              cookie generated using cookie secret K1)
                          <-- HDR, N(COOKIE1)
                              (this message is delayed in the
                              network)
 3. (the client retransmits its original message)
    HDR, SAi, KEi, Ni -->
 4.                       --> HDR, SAi, KEi, Ni
                              (the server still has a large number
                              of half-open SAs, so it again responds
                              with a cookie; however, the cookie
                              generation secret has meanwhile been
                              changed to K2, so a new cookie, COOKIE2,
                              is generated for the same request)
 5.                       <-- HDR, N(COOKIE2)
 6. HDR, N(COOKIE2) <--
 7. (the client receives the message and retransmits its request
    with COOKIE2)
    HDR, N(COOKIE2), SAi, KEi, Ni -->
 8.                       --> HDR, N(COOKIE2), SAi, KEi, Ni
                              (the server verifies COOKIE2; it is
                              valid, since K2 is still the current
                              secret)
 9.                       <-- HDR, SAr, KEr, Nr
                              (this message is delayed in the
                              network)
10. HDR, N(COOKIE1) <--
11. (the delayed first response carrying COOKIE1 finally reaches the
    client. Since the client cannot know that COOKIE1 is stale, it
    assumes COOKIE1 is fresher than COOKIE2 because it arrived later,
    so the client replaces the cookie and resends its initial request
    with COOKIE1)
    HDR, N(COOKIE1), SAi, KEi, Ni -->
    (this message is lost)
12. HDR, SAr, KEr, Nr <--
    (the delayed response from the server finally reaches the client.
    At this point the client considers IKE_SA_INIT complete and
    starts IKE_AUTH)

What is interesting in the above diagram is that both the client and the server eventually complete IKE_SA_INIT, but they have different views of what the IKE_SA_INIT request from the initiator contained. The client thinks that the server responded to its most recently sent message, HDR, N(COOKIE1), SAi, KEi, Ni, whereas the server never received that message and in fact responded to HDR, N(COOKIE2), SAi, KEi, Ni. Since the complete IKE_SA_INIT request is part of the data covered by the AUTH payload, the peers use different inputs when calculating AUTH, and authentication fails.
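To make the failure mechanism more concrete, here is a minimal Python sketch of the scenario above. It is a simplified model, not an implementation: it assumes SHA-256 as the cookie hash and HMAC-SHA-256 as the prf, and it uses a made-up flat byte encoding for messages (`ike_sa_init_request` and all field values are hypothetical, not the RFC 7296 wire format). The cookie construction follows the one suggested in RFC 7296, Section 2.6, and the pre-shared-key AUTH computation follows Section 2.15, where the signed octets begin with the whole IKE_SA_INIT request (RealMessage1). The cookie does not enter key derivation (SKEYSEED and the SK_* keys depend on the nonces, the Diffie-Hellman shared secret and the SPIs), which is why both peers complete IKE_SA_INIT and only disagree at AUTH time.

```python
import hashlib
import hmac
from typing import Optional

# Simplified model of the cookie/AUTH mismatch. The byte encodings and the
# choice of SHA-256 / HMAC-SHA-256 are illustrative assumptions only.


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in for the negotiated IKE prf (assumed HMAC-SHA-256)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def make_cookie(version_id: bytes, secret: bytes, ni: bytes,
                ip_i: bytes, spi_i: bytes) -> bytes:
    """Cookie = <VersionIDofSecret> | Hash(Ni | IPi | SPIi | <secret>)."""
    return version_id + hashlib.sha256(ni + ip_i + spi_i + secret).digest()


def ike_sa_init_request(cookie: Optional[bytes], payloads: bytes) -> bytes:
    """Hypothetical flat encoding of HDR, [N(COOKIE)], SAi, KEi, Ni."""
    notify = (b"N(COOKIE)=" + cookie + b"|") if cookie is not None else b""
    return b"HDR|" + notify + payloads


def psk_auth(psk: bytes, real_message1: bytes, nonce_r: bytes,
             maced_id_i: bytes) -> bytes:
    """AUTH = prf(prf(PSK, "Key Pad for IKEv2"), InitiatorSignedOctets)."""
    signed_octets = real_message1 + nonce_r + maced_id_i  # RFC 7296, 2.15
    return prf(prf(psk, b"Key Pad for IKEv2"), signed_octets)


# Hypothetical values for one initiator.
ni, ip_i, spi_i = b"Ni-nonce", b"192.0.2.1", b"SPIi-0001"
payloads = b"SAi|KEi|" + ni
cookie1 = make_cookie(b"\x01", b"secret-K1", ni, ip_i, spi_i)  # step 2
cookie2 = make_cookie(b"\x02", b"secret-K2", ni, ip_i, spi_i)  # step 4, after rotation

# The responder answered the request carrying COOKIE2 (step 8) ...
received_by_responder = ike_sa_init_request(cookie2, payloads)
# ... but the initiator last sent (and remembers) the request with COOKIE1 (step 11).
remembered_by_initiator = ike_sa_init_request(cookie1, payloads)

# PSK, responder nonce and keys are identical on both sides,
# because the cookie is not part of key derivation.
psk, nonce_r = b"valid-shared-secret", b"Nr-nonce"
maced_id_i = prf(b"SK_pi", b"ID_i")  # stand-in for MACedIDForI

auth_sent = psk_auth(psk, remembered_by_initiator, nonce_r, maced_id_i)
auth_expected = psk_auth(psk, received_by_responder, nonce_r, maced_id_i)

print(cookie1 != cookie2)                             # True
print(hmac.compare_digest(auth_sent, auth_expected))  # False: AUTHENTICATION_FAILED
```

If the initiator computed AUTH over `received_by_responder` instead, the two values would be identical; the only thing that differs between the peers is which copy of the IKE_SA_INIT request each one considers to be RealMessage1.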
Even though this diagram looks artificial, we did observe a noticeable number of these errors during real stress tests (up to 5% of SAs failed with this error under bad network conditions). What is particularly unfortunate about this:

1. Bad network conditions may be the result of a DDoS attack, which may also trigger the cookie logic on the server. So the two preconditions, a bad network and a server under attack, are coupled.
2. The most disappointing thing for me is that, despite the bad network conditions, the peers did manage to complete the initial IKE exchange, only to get an "authentication failed" result.
3. For customers this is surprising: they have valid credentials, yet from time to time they receive "authentication failed" diagnostics without any clue why this happens.

The root of the problem is that the IKE_SA_INIT request may be retransmitted with different content (different cookies), and the peers have no means to be sure that they have sent and received identical messages. Later, these possibly different messages are used in the AUTH payload calculation.

I believe that the proper solution would be to exclude the cookie from the AUTH payload calculation. The cookie is verified by the responder using its cookie generation secret and is of no concern to the client (the client did not generate it; it just echoes it back). However, this solution is obviously incompatible with RFC 7296, so it is not an option.

Any opinions? Should this problem be addressed by the WG or ignored?

Regards,
Valery.

_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec