Hi all. Tero Kivinen sent the message included below to the mailing list on September 8th.
I am fine with this text. Please read it thoroughly, and if there are no objections, I will incorporate it into the next version of the draft (which I intend to publish at the last possible moment on Monday).

Yoav

Begin forwarded message:

> From: Tero Kivinen <kivi...@iki.fi>
> Date: September 8, 2010 1:07:03 PM GMT+03:00
> To: "ipsec@ietf.org" <ipsec@ietf.org>
> Subject: [IPsec] Comments to draft-ietf-ipsecme-failure-detection-00
>
> Section 2, which describes RFC 4306 crash recovery, is not complete. It
> does not include the normal processing happening on the peer that is
> not rebooting.
>
> I suggest adding the following text:
> ----------------------------------------------------------------------
> When one peer loses state or reboots, it might not be able to
> recover immediately (especially in the case of a reboot). This means that
> at first the peer just goes silent, i.e., it does not send or respond to
> any messages. A conforming IKEv2 implementation will detect this
> situation and follow the rules given in Section 2.4:
>
>    "If there has only been outgoing traffic on all of
>    the SAs associated with an IKE SA, it is essential to confirm
>    liveness of the other endpoint to avoid black holes. If no
>    cryptographically protected messages have been received on an IKE SA
>    or any of its Child SAs recently, the system needs to perform a
>    liveness check in order to prevent sending messages to a dead peer."
>
> I.e., the peer will usually start liveness checks even before the other
> end sends an INVALID_SPI notification, as it detects that the other
> end has stopped sending packets while it is still rebooting or
> recovering.
>
> This means that the several-minute recovery period overlaps the
> actual recovery time of the other peer, i.e., if the security gateway
> requires several minutes to boot up after the crash, then the other
> peers have already finished their liveness checks before the crashing
> peer even has a chance to send INVALID_SPI notifications.
>
> There are cases where the peer loses state and is able to recover
> immediately; in those cases recovery might take several minutes.
>
> Note that the IKEv2 specification deliberately leaves the number of
> retries and the lengths of timeouts out of the specification, as they
> do not affect interoperability. This means that implementations are
> allowed to use INVALID_SPI messages as hints to shorten those timeouts
> (i.e., different environments and situations require different rules).
>
> Good existing IKEv2 implementations already do that (i.e., they shorten
> timeouts or limit the number of retries) based on that kind of hint, and
> they also start liveness checks quickly after the other end goes silent.
> ----------------------------------------------------------------------
>
> The final paragraph, saying:
>
>    Those "at least several minutes" are a time during which both peers
>    are active, but IPsec cannot be used.
>
> is incorrect, as it is only true when the crashed peer recovers
> instantaneously. In the normal case, most of that time actually
> overlaps the recovery time of the peer.
>
> --
>
> The protocol currently says that:
>
>    Supporting implementations will send a notification, called a "QCD
>    token", as described in Section 4.1 in the last IKE_AUTH exchange
>    messages. These are the final IKE_AUTH request and final IKE_AUTH
>    response that contain the AUTH payloads.
>
> This is very different from all other processing; usually this kind of
> payload is put in the same packet that contains the traffic selectors,
> etc. Is there some reason why this is done this way?
>
> --
>
> Also, do we really need the QCD token for the initiator too? The
> initiator has already proven that it is able to create the IKE SA on
> its own, and it will have enough information to recreate the IKE SA
> after the boot. The responder usually does not have enough information
> to recreate the IKE SA on its own after a reboot: when it just has an IP
> packet it needs to forward to that peer, it might, for example, no
> longer know the peer address the IKE SA was connected to. The initiator
> must already have that information, as it was able to trigger IPsec SA
> creation in the first place based on the IP packet.
>
> I think it would simplify the implementations and the protocol to limit
> the token maker role to responders only, without losing any of the
> functionality.
>
> --
>
> Section 7.4 is mostly wrong.
>
> The default retransmission policy needed for MOBIKE cases is much,
> much longer than what is needed in the normal case. When MOBIKE switches
> from one interface to another, there might be very long delays (for
> example, the device first needs to notice that the old interface does
> not work anymore, and then it may need to run DHCP and other
> link-related protocols on the new interface before it can even try it,
> and all of this takes a long time).
>
> For example, in our implementation MOBIKE uses MUCH longer timeouts
> just to make sure we do not time out the IKE exchanges while we are
> trying to go through all possible interfaces, etc. Because of those
> longer timeouts, there is a very good reason to shorten them if we get
> any feedback from the other end (i.e., INVALID_SPI notifications).
>
> The timeouts used in different situations, even in the same
> implementation, need to be different. In our case, when you enable
> MOBIKE, the number of retries used is more than 2 times what it is if
> you do not turn MOBIKE on.
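A minimal Python sketch of the retransmission argument above, for illustration only. Every name and number in it (`RetransmitPolicy`, `select_policy`, the 1-second initial timeout, the 32-second cap, 6 normal retries, 14 with MOBIKE, 3 after an INVALID_SPI hint) is a hypothetical choice, not taken from any specification or implementation; IKEv2 leaves these values to implementers. It shows a MOBIKE policy with more than twice the retries, and an INVALID_SPI hint cutting any policy short.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetransmitPolicy:
    """Hypothetical IKEv2 retransmission policy; all numbers are illustrative."""
    initial_timeout: float = 1.0  # seconds before the first retransmission
    backoff: float = 2.0          # timeout multiplier after each attempt
    max_timeout: float = 32.0     # cap on any single wait
    max_retries: int = 6          # retransmissions before the peer is declared dead

    def schedule(self) -> list[float]:
        """Wait before each retransmission (exponential backoff, capped)."""
        return [min(self.initial_timeout * self.backoff ** i, self.max_timeout)
                for i in range(self.max_retries)]

    def total_wait(self) -> float:
        """Total time spent retransmitting before giving up."""
        return sum(self.schedule())


# Hypothetical knobs; IKEv2 deliberately leaves these to the implementation.
MOBIKE_RETRIES = 14  # "more than 2 times" the non-MOBIKE retry count
HINTED_RETRIES = 3   # cap applied once an INVALID_SPI hint arrives


def select_policy(mobike: bool, invalid_spi_hint: bool) -> RetransmitPolicy:
    """Choose a retransmission policy for the current situation."""
    policy = RetransmitPolicy()
    if mobike:
        # Switching interfaces (noticing the dead link, DHCP, ...) is slow,
        # so MOBIKE needs far more patience than the normal case.
        policy = replace(policy, max_retries=MOBIKE_RETRIES)
    if invalid_spi_hint:
        # After a reboot the sender no longer has the IKE SA, so the
        # notification cannot be cryptographically protected. Treat it only
        # as a hint: it does not delete anything, it limits further retries.
        policy = replace(policy, max_retries=min(policy.max_retries, HINTED_RETRIES))
    return policy


for mobike in (False, True):
    for hint in (False, True):
        p = select_policy(mobike, hint)
        print(f"mobike={mobike!s:5} hint={hint!s:5} total={p.total_wait():.0f}s")
```

With these made-up values the normal policy gives up after 63 seconds and the MOBIKE policy after 319 seconds, while either one stops after 7 seconds once a hint arrives. The hint shortens the wait but never replaces the liveness check itself.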
>
> Also, the last paragraph again assumes that the peer staying up did not
> start a liveness check almost immediately after the crashing peer
> crashed. This is already part of the standard IKEv2 specification, so
> implementations need to do it. This means the timeout starts from the
> time of the crash, not from the time when the gateway is up again.
>
> Anyway, as all this is already standard IKEv2, it does not belong in
> the alternative solutions section; it belongs in Section 2.
>
> --
>
> Section 8 again ignores the IKEv2 text saying:
>
>    "If there has only been outgoing traffic on all of
>    the SAs associated with an IKE SA, it is essential to confirm
>    liveness of the other endpoint to avoid black holes. If no
>    cryptographically protected messages have been received on an IKE SA
>    or any of its Child SAs recently, the system needs to perform a
>    liveness check in order to prevent sending messages to a dead peer."
>
> Especially the text:
>
>    A failed gateway may go undetected
>    for as long as the lifetime of a child SA, because IPsec does not
>    have packet acknowledgement, and applications cannot signal the IPsec
>    layer that the tunnel "does not work".
>
> If the gateway has failed, then ANY traffic on any of the IPsec SAs is,
> from the other peer's point of view, outgoing-only traffic, so that peer
> needs to do a liveness check to verify that the other end is alive.
> Thus the failed gateway cannot really go undetected for as long as the
> lifetime of a Child SA, unless the lifetime is on the order of a few
> minutes :-)
>
> I know there are implementations that do not implement that part of the
> IKEv2 specification, but that does not mean that the part is not
> there. We should not write our specifications to cover broken
> implementations, but should assume that implementations follow the
> IKEv2 specification.
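The liveness rule quoted above, and the point that the timeout runs from the crash rather than from the reboot, can be sketched in Python as follows. This is a hypothetical model, not any implementation's logic: `SAActivity`, `needs_liveness_check`, `detection_time`, the 10-second idle threshold, the 63-second retransmission total and the 300-second boot time are all assumptions, and "only outgoing traffic" is simplified to comparing the last send time with the last protected receive time.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SAActivity:
    """Hypothetical bookkeeping for one IKE SA and its Child SAs (seconds)."""
    last_protected_inbound: float  # last cryptographically protected message received
    last_outbound: float           # last packet sent on the IKE SA or any Child SA


def needs_liveness_check(sa: SAActivity, now: float, idle_threshold: float) -> bool:
    """Simplified Section 2.4 rule: traffic has only been outgoing and nothing
    protected has arrived recently, so the peer's liveness must be confirmed."""
    only_outgoing = sa.last_outbound > sa.last_protected_inbound
    quiet = now - sa.last_protected_inbound >= idle_threshold
    return only_outgoing and quiet


def detection_time(crash: float, idle_threshold: float, retransmit_total: float) -> float:
    """When the surviving peer gives up on a crashed peer, assuming it keeps
    sending traffic: the liveness check starts idle_threshold after the crash
    and fails once the whole retransmission schedule has run."""
    return crash + idle_threshold + retransmit_total


BOOT_TIME = 300.0  # hypothetical: the crashed gateway needs 5 minutes to boot

# The gateway crashes at t=0; the survivor is still sending at t=12.
sa = SAActivity(last_protected_inbound=0.0, last_outbound=12.0)
print(needs_liveness_check(sa, now=12.0, idle_threshold=10.0))  # True

detected = detection_time(crash=0.0, idle_threshold=10.0, retransmit_total=63.0)
print(detected)  # 73.0
# Liveness-check time still pending when the gateway is back up:
print(max(0.0, detected - BOOT_TIME))  # 0.0
```

Under these assumptions the surviving peer has finished its liveness check long before the gateway has rebooted, and far sooner than a typical Child SA lifetime, which is the point made in the message.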
>
> Note that the IKEv2 text does not have any conditionals there; it says
> that "...the system needs to perform a liveness check...". It does not
> say it may, or even should, do it; it says it needs to be done.
>
> Also, I think the picture itself is a bit incorrect; the exchange after
> the reboot should probably be:
>
>    ---- Reboot -----
>
>    HDR, SK {}  -->
>
>                <--  HDR, N(INVALID_IKE_SPI), N(QCD_TOKEN)
>
>
> I.e., I assume the first packet is a normal liveness check, and the
> reply is a normal INVALID_IKE_SPI with QCD_TOKEN.
>
> --
>
> Section 9.1 says that inter-domain VPN gateways should do both, but I
> think that inter-domain VPN gateways do not really need this
> specification at all: by configuration they know the other end's IP
> addresses, etc., so when an inter-domain VPN gateway comes up, it can
> immediately create the needed IKE SAs based on the configuration. This
> is the case where either inter-domain VPN gateway can act as an
> initiator, i.e., no EAP is used and neither is behind a NAT.
>
> If one of the inter-domain VPN gateways is behind a restricted NAT, then
> it is more or less similar to the remote-access client case (i.e., only
> that end can initiate connections), and as the other peer cannot
> initiate connections to the gateway behind the NAT, there is no point in
> supporting the token taker role on that end.
> --
> kivi...@iki.fi
> _______________________________________________
> IPsec mailing list
> IPsec@ietf.org
> https://www.ietf.org/mailman/listinfo/ipsec