Spencer Dawkins has entered the following ballot position for draft-ietf-dnsop-session-signal-12: Discuss
When responding, please keep the subject line intact and reply to all email addresses included in the To and CC lines. (Feel free to cut this introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-dnsop-session-signal/

----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I really like this document, and think it's headed in the right direction. Of course I have four pages of comments, because reasons, but the only part I'm really confused about is this one ...

I would have thought that if you end up with a different endpoint because your anycast address now resolves differently, the new endpoint would have to have shared a lot of state with the previous endpoint, for this to work:

   When an anycast service is configured on a particular IP address and port, it must be the case that although there is more than one physical server responding on that IP address, each such server can be treated as equivalent. If a change in network topology causes packets in a particular TCP connection to be sent to an anycast server instance that does not know about the connection, the normal keepalive and TCP connection timeout process will allow for recovery.

What I would have expected to happen is that the new endpoint sees a packet arrive that's not on a synchronized TCP connection, and immediately responds with a RST (reset), rather than the normal keepalive and TCP connection timeout process happening. That's also the way I'm reading https://tools.ietf.org/html/rfc7828#section-3.6.

Is that not the way it's working for anycast these days?
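To make the expectation above concrete, here is a toy model of the rerouting case. It is only a sketch, not a real TCP stack: the class and field names are hypothetical, the handshake is collapsed into a single step, and it assumes the anycast instances share no connection state. The one behaviour it relies on is the standard TCP rule that a non-RST segment arriving for a connection the endpoint does not know about is answered with a RST.

```python
from dataclasses import dataclass, field


@dataclass
class AnycastInstance:
    """Toy model of one server instance behind an anycast address."""
    name: str
    # Connections this instance believes are established, keyed by
    # (client_ip, client_port). Hypothetical structure, for illustration.
    established: set = field(default_factory=set)

    def receive(self, client, syn=False, rst=False):
        """Return the kind of segment this instance would send back."""
        if rst:
            return None  # a RST is never answered with another RST
        if client in self.established:
            return "ACK"  # known connection: process normally
        if syn:
            # Simplification: treat the connection as established as soon
            # as the SYN is answered (the final handshake ACK is skipped).
            self.established.add(client)
            return "SYN-ACK"
        # Segment for a connection this instance has never seen.
        return "RST"


client = ("192.0.2.10", 53000)
a = AnycastInstance("instance-a")
b = AnycastInstance("instance-b")

first = a.receive(client, syn=True)  # handshake lands on instance-a
rerouted = b.receive(client)         # topology change: data reaches instance-b
```

In this model the client learns about the broken connection as soon as its next segment reaches the new instance, rather than after a timeout, which is the behaviour the question above is asking about.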
----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Everything else is a comment, so non-blocking, and please do the right thing.

This is a nit, and your answer could be "no", and that's fine, but in some places this document uses "DSO keepalive", and in other places, "keepalive" with no qualifier. It's likely that less confusion would result if you could consistently call this "DSO keepalive", so that it is clearly NOT a TCP keepalive. Do the right thing, of course.

Is the expectation that DSO would also be used in DNS over HTTP? I'm reading

   At the time of publication, DSO is specified only for DNS over TCP [RFC1035] [RFC7766], and for DNS over TLS over TCP [RFC7858]. Any use of DSO over some other connection technology needs to be specified in an appropriate future document.

and noticing that https://tools.ietf.org/html/draft-ietf-doh-dns-over-https-12 is currently in IETF Last Call.

This next one is well within the "Spencer wouldn't have done it this way, but Spencer's not the working group, or the IETF" range, but

   However, in the typical case a server will not know in advance whether a client supports DSO, so in general, unless it is known in advance by other means that a client does support DSO, a server MUST NOT initiate DSO request messages or DSO unacknowledged messages until a DSO Session has been mutually established by at least one successful DSO request/response exchange initiated by the client, as described below. Similarly, unless it is known in advance by other means that a server does support DSO, a client MUST NOT initiate DSO unacknowledged messages until after a DSO Session has been mutually established.
seems fragile, especially in environments where clients can come and go, and servers may be addressed using anycast (so I knew in advance that the four servers at that anycast address supported DSO, but somebody installed a fifth server that does not). Is that unlikely to be a problem?

I'm sure

   A single server may support multiple services, including DNS Updates [RFC2136], DNS Push Notifications [I-D.ietf-dnssd-push], and other services, for one or more DNS zones. When a client discovers that the target server for several different operations is the same target hostname and port, the client SHOULD use a single shared DSO Session for all those operations. A client SHOULD NOT open multiple connections to the same target host and port just because the names being operated on are different or happen to fall within different zones. This requirement is to reduce unnecessary connection load on the DNS server.

is correct from the server side, but perhaps it's also worth noting that using multiple TCP connections unnecessarily increases the chances that data transfers happen during TCP slow start. If only one or two packets are being exchanged, that doesn't matter, but as more packets are exchanged, the difference increases, because congestion windows will grow more rapidly if fewer connections are used.

I appreciate the inclusion of

   5.4. DSO Response Generation

But I've gotta ask. In the last paragraph of that section, I see

   o Use a networking API that lets the receiver signal to the TCP implementation that the receiver has received and processed a client request for which it will not be generating any immediate response.
     This allows the TCP implementation to operate efficiently in both cases; for requests that generate a response, the TCP ACK, window update, and DSO response are transmitted together in a single TCP segment, and for requests that do not generate a response, the application-layer software informs the TCP implementation that it should go ahead and send the TCP ACK and window update immediately, without waiting for the Delayed ACK timer. Unfortunately it is not known at this time which (if any) of the widely-available networking APIs currently include this capability.

I would love to know if there are any widely-available network APIs that include this capability, before including this text in a standards-track RFC. Do you need help chasing this down?

The text in

   6.1. DSO Session Initiation

seems rough to me, for a couple of reasons.

   The client may perform as many DNS operations as it wishes using the newly created DSO Session. Operations SHOULD be pipelined (i.e., the

I don't understand why this would be a SHOULD. At least from the client's perspective, it's not needed for interoperation.

   client doesn't need wait for a response before sending the next message). The server MUST act on messages in the order they are transmitted, but responses to those messages SHOULD be sent out of order when appropriate.

Is it correct to say that "responses to those messages SHOULD be sent when they become available, even if the responses are sent out of order"? If not, I'm probably missing what "when appropriate" means.

I'm a bit mystified by this text in

   6.2. DSO Session Timeouts

   In the usual case where the inactivity timeout is shorter than the keepalive interval, it is only when a client has a very long-lived, low-traffic, operation that the keepalive interval comes into play, to ensure that a sufficient residual amount of traffic is generated to maintain NAT and firewall state and to assure client and server that they still have connectivity to each other.
I think the basics are correct - the inactivity timer and (DSO) keepalive interval are independent - but I'm struggling to think of a reason to send (DSO) keepalives that's NOT tied to maintaining NAT/firewall state, and there's a lot of text before the paragraph that mentions NAT/firewall, that talks about why either interval might be longer or shorter than the other, without considering NAT/firewall. Am I missing something here?

.... and, now that I keep reading,

   6.5.2. Values for the Keepalive Interval

does a much better job of explaining how a (DSO) keepalive interval should be selected - I think you could reasonably delete most of the text about (DSO) keepalive intervals in section 6.2, and at most provide a forward pointer to 6.5.2.

(As an aside, I think you probably want to cite https://tools.ietf.org/html/bcp142 as the operative recommendation for NAT behaviour toward TCP, since https://tools.ietf.org/html/rfc5382 has been updated)

I found this text

   For long-lived DNS Stateful operations (such as a Push Notification subscription [I-D.ietf-dnssd-push] or a Discovery Relay interface subscription [I-D.ietf-dnssd-mdns-relay]), an operation is considered in progress for as long as the operation is active, until it is cancelled. This means that a DSO Session can exist, with active operations, with no messages flowing in either direction, for far longer than the inactivity timeout, and this is not an error. This is why there are two separate timers: the inactivity timeout, and the keepalive interval. Just because a DSO Session has no traffic for an extended period of time does not automatically make that DSO Session "inactive", if it has an active operation that is awaiting events.

to be extremely helpful, but it's 28 pages into the document. Is there a place earlier in the document that describes these timers, where you could place this text? Maybe section 3/Terminology isn't the right place, but maybe there is a right place toward the front of the document.
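The independence of the two timers described in the quoted text can be sketched as below. This is one reading of that text, not the draft's algorithm: the class, method, and parameter names are hypothetical, and the timer values (15 and 60 seconds) are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass
class DsoSessionTimers:
    """Hypothetical holder for the two independent DSO Session timers."""
    inactivity_timeout: float  # seconds; illustrative value supplied by caller
    keepalive_interval: float  # seconds; illustrative value supplied by caller

    def is_inactive(self, seconds_without_operations, active_operations):
        # A session with an active operation (for example a push
        # subscription awaiting events) is never "inactive", no matter
        # how long it has carried no traffic.
        if active_operations > 0:
            return False
        return seconds_without_operations >= self.inactivity_timeout

    def keepalive_due(self, seconds_without_traffic):
        # The keepalive interval looks only at traffic on the connection,
        # regardless of whether operations are active.
        return seconds_without_traffic >= self.keepalive_interval


timers = DsoSessionTimers(inactivity_timeout=15.0, keepalive_interval=60.0)

# Long-lived subscription, silent for five minutes: still active,
# but a DSO keepalive is due.
quiet_subscription = (
    timers.is_inactive(300.0, active_operations=1),
    timers.keepalive_due(300.0),
)

# No operations for 20 seconds: inactive, keepalive not yet due.
idle_session = (
    timers.is_inactive(20.0, active_operations=0),
    timers.keepalive_due(20.0),
)
```

With these assumed values, the quiet subscription is not inactive even though it has been silent for far longer than the inactivity timeout, which is the case the quoted paragraph is describing.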
I'm not understanding why the SHOULDs are not MUSTs in this text:

   If, at any time during the life of the DSO Session, twice the inactivity timeout value (i.e., 30 seconds by default), or five seconds, if twice the inactivity timeout value is less than five seconds, elapses without there being any operation active on the DSO Session, the server SHOULD consider the client delinquent, and SHOULD forcibly abort the DSO Session.

Perhaps part of my confusion is that I'm not sure what it means to "consider the client delinquent", but NOT to "forcibly abort the DSO Session". But there are several "will forcibly abort"s in section 6.4.2, that sound more like MUST than SHOULD.

I don't think the MUST NOT in

   Normally a server MUST NOT close a DSO Session with a client. A server only causes a DSO Session to be ended in the exceptional circumstances outlined below.

is quite right. Given that you have a bulleted list of reasons why a server would violate the MUST NOT immediately following this sentence, you might want to say "Normally a server does not close" here.

_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop