I reviewed draft-ietf-ipsecme-failure-detection before the WG meeting. Some of the comments I have here already have tickets, so there is no need to add them a second time:

----------------------------------------------------------------------
Comments to draft-ietf-ipsecme-failure-detection:

Section 1:

   "However, in many cases the rebooted peer is a VPN gateway that protects only servers, "

What is that supposed to mean?

Section 2:

   "Those "at least several minutes" are a time during part of which both peers are active, but IPsec cannot be used."

Not true! It is the time during which one of the peers is active and the other one is rebooting, and the rebooting device might even come up before the time runs out, as is described in the next few paragraphs. I suggest removing the whole sentence.

Section 2:

   "[RFC5996] does not mandate any time limits, but it is possible that the peer will start liveness checks even before the other end is sending INVALID_SPI notification, as it detected that the other end is not sending any packets anymore while it is still rebooting or recovering from the situation."

I think "but it is possible that the peer will start ..." is wrong; it should be more like "a good implementation will start ...". If an implementation supports black hole detection, there is no point in doing that with long timeouts. As I said, in our implementation that specific timeout is 10 seconds (i.e. around 20 times the RTT, which means that with normal TCP etc. traffic it never triggers, but it will trigger very quickly after the other end goes silent).

Section 3:

I still think the protocol would be much easier to implement if we limit the QCD token taker role to the initiator and the token maker role to the responder. There is no point in making the protocol very generic, as implementations are not going to implement features before there is a real use scenario for them. This means that even if the document describes how it can be done, it does not help, as implementations will not support it. If someone finds a real use scenario where the responder needs to be the token taker, then writing a new specification for that is way faster than getting the implementations modified.
I have not yet seen a use scenario for that where QCD would help (meaning there are other, already standardized ways in IKEv2 which are faster and more efficient, and which implementations already support).

Section 4.2:

   "The QCD_TOKEN notification is related to the IKE SA and MUST follow the AUTH payload and precede the Configuration payload and all payloads related to the child SA."

RFC5996 removed payload ordering restrictions, so why are we adding them back here? I suggest removing the whole paragraph.

Section 5.2:

I would remove this whole section.

Section 7:

I would remove this whole section. It was good to have it there, but I do not think we need it anymore. At least section 7.4 is still completely wrong and is already covered by section 2.

Section 8:

   "Before establishing a new IKE SA using Session Resumption, a client should ascertain that the gateway has indeed failed. This could be done using either a liveness check (as in RFC 5996) or using the QCD tokens described in this document."

How do you use QCD tokens to ascertain that the gateway has indeed failed? If you receive a QCD token, then you know that the other end is dead, but the active operation you perform in order to receive a QCD token is to send a liveness check. I think this sentence requires some rewriting.

Section 8:

The example is wrong. The

   HDR, {} -->
                       <-- HDR, N(QCD_TOKEN)

should be

   HDR, SK{} -->
                       <-- HDR, N(INVALID_IKE_SPI), N(QCD_TOKEN)

Section 9.1:

   "Implementing the "token maker" side of QCD makes sense for IKE implementation where protected connections originate from the peer, such as inter-domain VPNs and remote access gateways. Implementing the "token taker" side of QCD makes sense for IKE implementations where protected connections originate, such as inter-domain VPNs and remote access clients."

So token maker and token taker are both used "where protected connections originate"? What is the difference? This text requires clarification.
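As a rough illustration of the Section 8 point (a sketch only, not taken from the draft): the token taker has no QCD-specific way to probe the gateway. It sends an ordinary liveness check, and only the unprotected reply to that check can carry a token. The names IkeSa and handle_unprotected_reply and the dict-of-notifies representation are hypothetical, and the acceptance rule assumes the reply format suggested for the Section 8 example.

```python
import hmac
from dataclasses import dataclass


@dataclass
class IkeSa:
    """Hypothetical taker-side state for one IKE SA."""
    spi_i: bytes
    spi_r: bytes
    qcd_token: bytes      # token the maker sent during IKE_AUTH
    alive: bool = True


def handle_unprotected_reply(sa: IkeSa, notifies: dict) -> str:
    """Process the unprotected reply to a liveness check (HDR, SK{}).

    `notifies` maps notify names to their data (assumed representation).
    Only a reply carrying both N(INVALID_IKE_SPI) and a matching
    N(QCD_TOKEN) is treated as proof that the peer lost the SA.
    """
    token = notifies.get("QCD_TOKEN")
    if "INVALID_IKE_SPI" not in notifies or token is None:
        return "ignore"
    # Constant-time comparison against the token stored for this SA.
    if hmac.compare_digest(token, sa.qcd_token):
        sa.alive = False
        return "delete_sa"
    return "ignore"
```

A reply with a wrong or missing token leaves the SA untouched, so the taker keeps running its normal liveness check retransmissions.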
Section 9.1:

   "To clarify the this discussion:"
               ^^^^^^^^

Section 9.1:

   "o For inter-domain VPN gateway it makes sense to implement both roles, because it can't be known in advance where the traffic originates."

I do not really see that. For inter-domain VPN gateways there are two possibilities: symmetric or asymmetric initiation. In the asymmetric situation only one end can initiate connections (for example because it is behind NAT or similar, or because the HQ VPN server is always configured to be the responder). In that case the inter-domain VPN case is similar to the remote-access client / gateway case, i.e. the "initiator end of Inter-Domain VPN gateway" is the same as "remote-access client" and the "Responder end of the Inter-Domain VPN Gateway" is the same as "remote-access server". For symmetric situations where either end can initiate connections there are better and faster ways to handle things, as I have already described earlier.

Section 10.1:

   "Specifically, if one taker does not properly secure the QCD tokens and an attacker gains access to them, this attacker MUST NOT be able to guess other tokens generated by the same maker."

This is a bit misleading, as it is trivial for an attacker to get a large number of tokens. It just needs to send one faked IKE SA packet with random IKE SPIs to the token maker to get a valid token for that IKE SPI pair.

Section 10.3:

   "An attacker may try to attack QCD if the generation algorithm described in Section 5.1 is used."

I do not think there is that big a difference between 5.1 and 5.2 here. 5.2 will limit the dictionary to one IP address, but as it is already impossibly large, it does not matter. I would suggest removing the reference to 5.1 in the first sentence.

Section 10.4:

This also needs a comment that the load balancer's demuxing MUST stay stable, i.e. it can never change. In particular, it cannot change even when one device goes off-line.
Also there MUST NOT be a way to bypass the load balancer by any method whatsoever (including tunneling packets in some other tunneling protocol, adding routing headers, etc.). I would add an even stronger warning that this setup is extremely dangerous. Luckily section 10.2 already forbids this:

   "This document does not specify how a load sharing configuration of IPsec gateways would work, but in order to support this specification, all members MUST be able to tell whether a particular IKE SA is active anywhere in the cluster. One way to do it is to synchronize a list of active IKE SPIs among all the cluster members."

-- 
kivi...@iki.fi
_______________________________________________
IPsec mailing list
IPsec@ietf.org
https://www.ietf.org/mailman/listinfo/ipsec