the idea is, in my opinion, relevant and worthy of pursuit. i have some
questions of clarification.

> Mukund Sivaraman <mailto:m...@isc.org>
> Monday, December 08, 2014 12:32 AM
> ...
>
> When a server determines that the response doesn't fit into a single
> datagram (512 or the client's message size), the server splits the reply
> into multiple fragment datagrams (512 or some discovered PMTU that
> works) such that:

there is no reason to support this in non-EDNS. if someone won't upgrade
to EDNS, then (1) we have no responsibility toward improving their DNS
experience, and (2) they probably will not upgrade to this multi-message
proposal either. (arguments of the form, "we want this to work when both
endpoints can do EDNS but the middlebox forbids EDNS", are answered by
noting that middleboxes probably would not permit multi-message DNS,
either.)

i think there is no PMTU that works. marka has fought this battle for a
long time, and he's currently suggesting 1280-(headersize) for IPv6 and
1500-(headersize) for IPv4, period. my hope is that any recommendation
for application-level fragmentation for DNS on UDP/53 would say
"MAX(reliably determined PMTU, MIN(client's offered buffer size, 1500 or
1280 depending on transport protocol))."

you might also allow the initiator to offer an ideal local-interface
maximum fragment size and MIN() against that also.
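
a minimal sketch of the sizing rule above, following the MAX/MIN formula exactly as written. the function and parameter names are hypothetical, and it assumes every quantity is converted to DNS payload octets by subtracting fixed IP+UDP header sizes (no IP options or extension headers).

```python
# hypothetical helper, not taken from any draft or implementation.
# assumption: header sizes are plain IP + UDP with no options/extension headers.
IPV4_UDP_HEADERS = 20 + 8
IPV6_UDP_HEADERS = 40 + 8


def max_fragment_payload(offered_bufsize, ipv6, pmtu=None, initiator_ideal=None):
    """Return the largest DNS payload (octets) to put in one fragment.

    offered_bufsize: the client's offered buffer size
    ipv6:            True for IPv6 transport, False for IPv4
    pmtu:            a reliably determined path MTU, or None if unknown
    initiator_ideal: optional initiator-offered maximum fragment size
    """
    headers = IPV6_UDP_HEADERS if ipv6 else IPV4_UDP_HEADERS
    fallback = (1280 if ipv6 else 1500) - headers

    # MIN(client's offered buffer size, 1500 or 1280 minus headers)
    size = min(offered_bufsize, fallback)

    # MAX(reliably determined PMTU, ...) applies only when a PMTU is known
    if pmtu is not None:
        size = max(pmtu - headers, size)

    # optional further MIN() against the initiator's ideal fragment size
    if initiator_ideal is not None:
        size = min(size, initiator_ideal)

    return size
```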
>
> 1. Each datagram is a DNS reply message with identical header field
> values (except for section counts) and TC=1 in each of them. The ID
> field has the same value among all reply fragments.
>
> 2. Each datagram contains part of the RRs that form the complete reply,
> split on RR boundaries. The DNS header contains the appropriate section
> counts for that datagram. The datagrams need not be equal in size.

splitting an RR-set across messages makes my skin itch. i know it's the
right thing to do and i'm not objecting. just letting you know, somebody
will some day not recognize the OPT code that describes this as a
multi-message transaction, and cache a partial RR-set, and we'll google
the message i am now typing to show them the error of their ways.
>
> 3. An additional RR (plain DNS) or pseudo RR (inside OPT) called
> FRAGMENT is present in every datagram with 2 16-bit fields containing
> the count of fragments, and current fragment. (Though a DNS message is
> limited to 1<<16 octets and a DNS datagram can be at least 512 octets
> long, 16-bit fields are better for fragment count as the datagrams can
> be of different sizes.)

i think the absence of ACK-based timing means that packet trains longer
than 256 packets are too dangerous to contemplate. even with some kind
of application-layer inter-record-gap that's a lot of packets to inject
without needing to hear an OK signal from the remote end. therefore i
suggest two 8-bit fields.
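
to make the two-8-bit-field suggestion concrete, here is a sketch of packing and unpacking such a payload. the layout (count first, then index), the zero-based index, and the function names are all assumptions for illustration; nothing here is a defined wire format. (0, 0) is accepted because the quoted proposal uses "FRAGMENT 0 0" as a client signal.

```python
import struct


def pack_fragment(count, index):
    """Pack a hypothetical FRAGMENT payload as two unsigned 8-bit fields.

    Assumed layout: fragment count, then zero-based current fragment index.
    """
    if (count, index) == (0, 0):
        # "FRAGMENT 0 0": client signalling support, per the quoted proposal
        return struct.pack("!BB", 0, 0)
    if not 1 <= count <= 255:
        raise ValueError("fragment count must fit in 8 bits and be nonzero")
    if not 0 <= index < count:
        raise ValueError("fragment index must be in range(count)")
    return struct.pack("!BB", count, index)


def unpack_fragment(data):
    """Return (count, index) from a two-octet payload."""
    count, index = struct.unpack("!BB", data)
    return count, index
```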
>
> 4. A client that doesn't know about this scheme notices TC=1 and retries
> with TCP. Datagrams other than the first one should be ignored as they
> are duplicate replies with the same message ID.

i think that wastes end-to-end bandwidth, and should be avoided, by
having the initiator solicit (for QUERY) or probe (for UPDATE) using an
EDNS OPT, rather than letting the responder just spew.
>
> 5. A client that is aware of this scheme finds TC=1 and the FRAGMENT RR
> and does reassembly (similar to IP fragment reassembly such as RFC 815),
> DNS messages being limited to 1<<16 octets too.

referencing your later message on this thread, i don't think compression
pointers can be allowed to point out-of-message. so, each message will
form its own string dictionary. if that's what you meant to say then i'm
sorry for misunderstanding you.
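
one way to picture reassembly when each message is its own string dictionary: every fragment is fully decoded on its own (names expanded against that fragment's bytes only), and only the decoded RRs are merged. the class below is a hypothetical sketch under that assumption. it takes already-decoded RRs and a zero-based fragment index, and hands nothing back until every fragment has arrived.

```python
class Reassembler:
    """Collect decoded RRs per fragment; hypothetical, for illustration only."""

    def __init__(self):
        self.count = None   # total fragments, learned from the first one seen
        self.parts = {}     # fragment index -> list of decoded RRs

    def add(self, count, index, rrs):
        """Record one fragment's RRs, already decoded within that fragment."""
        if self.count is None:
            self.count = count
        elif count != self.count:
            raise ValueError("fragment count changed mid-transaction")
        if not 0 <= index < count:
            raise ValueError("fragment index out of range")
        # keep the first copy of a fragment; ignore duplicates
        self.parts.setdefault(index, list(rrs))

    def result(self):
        """Return all RRs in fragment order, or None while incomplete."""
        if self.count is None or len(self.parts) != self.count:
            return None
        merged = []
        for i in range(self.count):
            merged.extend(self.parts[i])
        return merged
```

returning None until the set is complete is a deliberate choice here, so that a caller has nothing partial to cache.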
>
> This scheme still restricts the size of a single RR to the datagram
> size. Reassembly (unlike IP fragments) doesn't require offsets such as
> used in RFC 815 as RRs are wholly contained inside one datagram.
>
> TSIG can also be made to work with such a scheme on fragment by fragment
> basis.
>
> ----
>
> This scheme is not for replacing TCP. As mentioned above, if a TXT RR
> containing multiple character-strings doesn't fit in a single datagram
> for example, and truncation happens, it'll require TCP. It's not for
> replacing EDNS's large datagram sizes too. But it is possible for EDNS
> replies to overflow path MTU causing loss of replies, and when loss is
> noted, on second attempt, truncation could occur as the message no
> longer fits in reduced datagram size.
>
> Some things can still be served by UDP where possible (without involving
> all the baggage of TCP.. roundtrips for starting SYN/ACK, for most DNS
> requests having the connection remain in slow-start phase, etc.) As an
> example, with a fragment datagram max size of 512, replies could
> traverse a firewall that blocked large replies.
>
> This scheme should be backwards compatible with (ignored by) existing
> implementations. Client implementations of this scheme can also signal
> support with FRAGMENT 0 0.

i'd like to see this coupled to the cookie proposal, so that if cookies
aren't used, then this option is not available.

i'm in moderate support.


-- 
Paul Vixie
_______________________________________________
DNSOP mailing list
DNSOP@ietf.org
https://www.ietf.org/mailman/listinfo/dnsop
