Hi Roman,
thank you for the review. Please see my comments inline.
> Hi!
>
> I performed a AD review of draft-ietf-ipsecme-rfc8229bis-05. Thanks for
> revising RFC8229 with this new
> guidance. Comments are below:
>
> ** The abstract notes that many of the document updates came from deployment
> experience. I'm hoping to
> incorporate that feedback on a particular issue. There are a number places
> in this document where
> qualitative recommendations are made about various network stack timers. Can
> quantitative
> recommendations be made in any of the following:
Traditionally, IPsec specifications contain very few quantitative
recommendations concerning timings. This is due to the belief that concrete
timeout values don't affect interoperability. Instead, only very generic
recommendations are usually given.
See, for example, Section 2.4 of RFC 7296:
   The number of retries and length of timeouts are not covered in this
   specification because they do not affect interoperability.  It is
   suggested that messages be retransmitted at least a dozen times over
   a period of at least several minutes before giving up on an SA, but
   different environments may require different rules.
> -- Section 7.1 "If the TCP connection is no more associated with any active
> IKE SA, the TCP Responder MAY
> close the connection to clean up resources if TCP Originator didn't close it
> within some reasonable period of
> time."
I don't think we should prescribe a concrete time to wait (since it is up to
the Responder when to free up its resources), but we can add a recommendation.
How about:
   If the TCP connection is no more associated with any active IKE SA,
   the TCP Responder MAY close the connection to clean up resources if
   TCP Originator didn't close it within some reasonable period of time
   (e.g. a few seconds).
The reason for keeping the orphaned TCP connection open for a short time is to
allow the Initiator to re-use it where possible. For example, suppose the
Responder returns an error notification and deletes the IKE SA, but the
Initiator is able to recover (e.g. after a COOKIE request or
INVALID_KE_PAYLOAD). If the Responder closes the TCP connection immediately,
the Initiator will have to re-establish it, thus wasting 2 RTT.
So this is just an optimization; nothing fatal happens if the Responder closes
the orphaned TCP connection immediately.
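To illustrate the grace period described above, here is a minimal Python
sketch. It is only an assumption about how a Responder might track an orphaned
connection; the class name, method names and the 3-second default are
hypothetical and come from neither the draft nor any implementation.

```python
class OrphanedConnectionTracker:
    """Tracks how long a TCP connection has had no active IKE SA.

    Hypothetical helper: 'grace_s' stands in for the "few seconds"
    suggested above; the actual value is a local policy matter.
    """

    def __init__(self, grace_s=3.0):
        self.grace_s = grace_s
        self.orphaned_at = None  # None while some IKE SA uses the connection

    def on_last_sa_deleted(self, now):
        # The connection has just lost its last associated IKE SA.
        self.orphaned_at = now

    def on_sa_associated(self):
        # The Initiator re-used the connection (e.g. after recovering
        # from an error notification), so it is no longer orphaned.
        self.orphaned_at = None

    def should_close(self, now):
        # Close only if the connection stayed orphaned for the whole
        # grace period and the Originator has not closed it itself.
        if self.orphaned_at is None:
            return False
        return now - self.orphaned_at >= self.grace_s
```

Setting grace_s to 0 gives the "close immediately" behaviour, which is still
correct and only costs the Initiator a new TCP handshake.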
> -- Section 7.4. "In particular, it is advised that the Initiator should not
> act immediately after receiving error
> notification and should instead wait some time for valid response, ..."
That text just repeats what RFC 7296 says (Section 2.21.1).
This specification recommends not following RFC 7296 in this situation and
instead acting immediately when an error notification is received.
> -- Section 8.1. "If no response is received within a certain period of time
> after several retransmissions ..."
It's hard to give any concrete recommendations here. If the initiator switches
to TCP too quickly, it may end up using TCP transport while UDP is available
on this path, which is suboptimal. On the other hand, if it waits too long
before switching to TCP when UDP doesn't work, it makes the connection outage
longer. How about adding the following sentence:
   The value of timeout and the number of retransmissions may vary
   depending on the initiator's configuration, but it is expected that
   initiators would try to get a response over UDP for at least half a
   minute, sending at least a dozen retransmissions, before switching
   to TCP.
What do WG members think about these values?
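To make the proposed numbers (half a minute, a dozen retransmissions) easier
to judge, here is a rough Python sketch of one possible retransmission
schedule. The initial timeout, back-off factor and cap are assumptions chosen
for illustration only; neither RFC 7296 nor the draft defines them, and the
function names are hypothetical.

```python
def udp_wait_schedule(initial_timeout=1.0, backoff=1.5,
                      max_timeout=5.0, retransmissions=12):
    """Return the waits (in seconds) after the first send and after
    each retransmission, using capped exponential back-off."""
    waits = []
    timeout = initial_timeout
    for _ in range(retransmissions + 1):  # first send + N retransmissions
        waits.append(min(timeout, max_timeout))
        timeout *= backoff
    return waits


def may_switch_to_tcp(elapsed_s, retransmissions_sent,
                      min_elapsed_s=30.0, min_retransmissions=12):
    """True once both proposed lower bounds have been reached."""
    return (elapsed_s >= min_elapsed_s
            and retransmissions_sent >= min_retransmissions)


waits = udp_wait_schedule()
total_s = sum(waits)  # time spent on UDP before giving up
print(len(waits) - 1, "retransmissions over", total_s, "seconds")
print("switch to TCP:", may_switch_to_tcp(total_s, len(waits) - 1))
```

With these assumed parameters the initiator spends a little under a minute on
UDP, so both lower bounds are met; a shorter cap or fewer retransmissions
could satisfy one bound but not the other.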
> -- Section 8.4. "For the client, the cluster failover event may remain
> undetected for long time if it has no IKE
> or ESP traffic to send. "
Hm, I'm a bit confused about what quantitative guidance you want to see here.
This is just a statement of fact: as long as the client sends no traffic to
the cluster, it will not learn that a failover has taken place (in the case
of TCP).
> -- Section 8.4. "if support for High Availability in IKEv2 is negotiated and
> TCP transport is used, a client that is
> a TCP Originator SHOULD periodically send IKEv2 messages (e.g. by
> initiating liveness check exchange)
> whenever thereis no IKEv2 or ESP traffic."
Again, it's hard to give concrete recommendations. It all depends on the
client's policy. If it wants to minimize the delay before it detects a cluster
failover, it would send liveness check messages more frequently. On the other
hand, if it wants to save resources, it would send them less frequently.
I don't think any "one size fits all" recommendation can be given.
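The trade-off above can be shown with a little arithmetic. The Python sketch
below is an illustration under simplifying assumptions: the client is
otherwise idle, the failover can happen right after a liveness check, and
detect_time_s (a hypothetical parameter) covers the time the client needs to
notice that a check failed.

```python
def liveness_tradeoff(interval_s, detect_time_s=0.0):
    """Return (worst-case failover detection delay in seconds,
    liveness checks sent per idle hour) for a given check interval."""
    # Worst case: failover happens just after a check, so the client
    # waits a full interval before the next check reveals it.
    worst_case_delay_s = interval_s + detect_time_s
    checks_per_hour = 3600.0 / interval_s
    return worst_case_delay_s, checks_per_hour


for interval in (10, 60, 300):
    delay, checks = liveness_tradeoff(interval, detect_time_s=5)
    print(f"interval={interval}s delay<={delay}s checks/hour={checks}")
```

Shorter intervals bound the detection delay more tightly but cost
proportionally more messages, which is why the choice is left to local policy.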
> The only place I found quantitative guidance was in Section 7.3.1.
>
> ** Section 6.1. Editorial. s/with new Initiators's SPI/with the Initiator's
> new SPI/
>
> ** Section 7.1 Editorial.
>
> OLD
> If the TCP connection