Hi,

I've reviewed the keepalive draft. Generally, I like the concept a lot because a) there's certainly a use case that covers real operational issues, b) it's space-efficient, and c) it's unobtrusive. However, I think the document could improve on clarity in certain aspects; see below for details. I've tagged my feedback with "NIT", "EDITORIAL", and "PROTOCOL" to indicate the "severity" level of each comment.

* EDITORIAL: I think the Abstract could be cut down to the first sentence of the second paragraph. Everything else should go into (or is already a copy of) the Introduction section. Personal taste, I know, but I like short, to-the-point abstracts.

* NIT: API is not expanded on first use.

* EDITORIAL: The second paragraph on page 4 lists DNSSEC and crypto-related RRTypes as the culprits for the prevalence of truncated responses. However, it misses the main point that increased response sizes are the primary problem. So changing the text to something like "The increasing size of response packets (for example, due to deployment of DNSSEC and crypto-related RRTypes)..." would be better.

* EDITORIAL: Mention somewhere that re-using TCP connections to nameservers would be even more beneficial if DNS over TLS were introduced. (Sidenote: from an architectural perspective, I think a TLS-DNS spec should actually REQUIRE a TLS-enabled DNS client to support the Keepalive option.)

* PROTOCOL: I'm missing a clear, normative definition of how to interpret "TIMEOUT" values. The Option format says "a timeout value for the TCP connection" (which is way underspecified, given the various timeouts in TCP). Section 3.2.1, on the other hand, says it's "representative of the minimum expected time an individual session should remain established for it to be used..." (which I interpret as the absolute session duration from the time of establishment). Other sections of the document make me believe it's the maximum interval between two subsequent queries. That needs to be fixed, because it will cause interop problems if not clearly defined. My proposal would be that the TIMEOUT is the maximum interval for which a client can keep using that TCP connection after it received the last DNS message from the other end (I don't consider 1/2 RTT relevant here).
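
To make the proposed interpretation concrete, here is a minimal Python sketch of a client-side idle timer. It is only an illustration of the proposal, not text from the draft: the class and method names are hypothetical, TIMEOUT is assumed to be in seconds and to fit an unsigned 16-bit field, and any special meaning of particular values (such as 0) is deliberately not modelled.

```python
import time


class IdleTimer:
    """Hypothetical helper: tracks whether a TCP DNS session is still usable.

    Assumed rule: the session may be used for up to `timeout_s` seconds
    after the last DNS message was received from the other end.
    """

    MAX_TIMEOUT = 0xFFFF  # assumption: TIMEOUT is an unsigned 16-bit value

    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self.timeout_s = self._validate(timeout_s)
        self._last_rx = clock()

    @classmethod
    def _validate(cls, timeout_s):
        # Reject negative or oversized values (TIMEOUT is unsigned).
        if not 0 <= timeout_s <= cls.MAX_TIMEOUT:
            raise ValueError("TIMEOUT must be in the range 0..65535")
        return timeout_s

    def on_message_received(self, timeout_s=None):
        """Restart the interval; adopt a new TIMEOUT if the message carried one."""
        if timeout_s is not None:
            self.timeout_s = self._validate(timeout_s)
        self._last_rx = self._clock()

    def usable(self):
        """True while the interval since the last received message has not expired."""
        return self._clock() - self._last_rx <= self.timeout_s
```

With this reading, every received DNS message restarts the interval, so the total session lifetime is unbounded as long as traffic keeps flowing. That is quite different from an absolute session duration, which is why the definition matters for interop.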

The document should also clarify the relationship between this application-level keepalive mechanism and TCP-level keepalives. As I understand it, there is no such relationship, which should be mentioned. Also, please clarify that TIMEOUT is unsigned.

* PROTOCOL: Are the TIMEOUT values (whatever time interval they define, see above) negotiated for a single session only, or do they affect *all* TCP DNS sessions to a specific IP address? Since there is no "session" for the UDP part of the negotiation, the client's first assumption would be that it's per IP address. However, as soon as the TCP sessions are established, the TIMEOUT (re-)negotiation could differ for each individual session? A short clarification would be good. I think the TIMEOUT value should be independent for each individual TCP session.

* EDITORIAL: 3.2.2, second paragraph: The main point is missing in the second sentence. I suggest adding "... MAY keep the existing TCP session open, *up to the duration indicated in the TIMEOUT value of the response.*"

* PROTOCOL: Is there any way to signal to a client that it should stop using the session as soon as possible, because the server wants to tear it down immediately? Since "0" is currently reserved for infinitely long sessions, that is not an option. A value of "1" would allow the client to continue using the session for another second, which is suboptimal (and could affect several tens of thousands of packets on busy servers). So I suggest reconsidering the semantics of "0": maybe use 65535 as "infinity" and add text that a value of "0" indicates "tear down immediately". That would be more logical to me.

* PROTOCOL: Is the expected behaviour (MUST) from both client and server that they add the Option to every single request/response during a keepalive session? Please clarify the intended behaviour.

  - If Yes: what is the expected behaviour in case a subsequent packet does not include that option?
    Keep going until TIMEOUT, or assume that the server suddenly doesn't want to do keepalive anymore, and revert to "dumb" behaviour?

  - If No: My assumption would be that after the TIMEOUT is initially negotiated, client and server would keep counting, no matter whether subsequent messages contain the option. Only once the TIMEOUT approaches would a single packet "refresh" the TIMEOUT?

  Personally, I tend towards "No", because sending the information in each and every message (and updating session timers on every single packet) seems redundant to me.

Feedback appreciated :)

tia,
Alex

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop