[DNSOP]Re: draft-ietf-dnsop-avoid-fragmentation-17.txt - implementer notes

Paul Wouters Tue, 21 May 2024 10:51:05 -0700

On Mon, 6 May 2024, Petr Špaček wrote:

 R1. UDP responders SHOULD NOT use IPv6 fragmentation [RFC8200].


Operational impact of this recommendation is unclear.

Why? Because clients belong to several sets:
- One set of clients cannot receive fragmented answers,


Good because it has been proven to be very insecure.

- another set of clients cannot use TCP to overcome unfragmented UDP size limitations,


TCP is a mandatory part of DNS now, so I'm not sure how much sympathy I
would have. If I were a flagday person, I'd call a flagday for this :P

- yet another set of clients actually depend on large answers to function (say because they DNSSEC validate, or need to follow huge NS sets generated by MS AD, or they need large RRs to deliver e-mail, or ...).


You mean, those exact records with value to attack using DNS fragments.
Is the right operational concern to keep them vulnerable instead of
breaking them to fix it to avoid a security issue? Why wait for a
specific attack to come out before giving up on these dangerously broken
clients?

It's unclear what proportion of clients belong to the intersection of these three sets. Banning fragmentation on the **outgoing** side might break these clients, and it's extremely hard to measure and debug from the server side.


Breaking them _also_ ensures they can't be victims of fragmentation attacks.

 R2. Where supported, UDP responders SHOULD set IP "Don't Fragment flag
 (DF) bit" [RFC0791] on IPv4. At the time of writing, most DNS server
 software did not set the DF bit for IPv4, and many operating systems'
 kernel constraints make it difficult to set the DF bit in all cases.
E.g. on Linux, the socket API does not expose the DF bit directly. An application can request the DF bit to be turned on in outgoing packets, but at the same time this implicitly enables receipt and processing of unauthenticated ICMP messages. These messages can be used to manipulate Path MTU records in the kernel and mount attacks misusing this technique.


That's clear, and someone should take this up with the linux-net people?
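
For readers unfamiliar with the Linux behaviour discussed here, a minimal sketch of how an application requests DF on outgoing UDP packets. The numeric constants are the Linux values from <linux/in.h>, hard-coded as an assumption because not every Python build exports them from the socket module; the function name is hypothetical. As noted above, this mode also makes the kernel act on incoming ICMP messages for Path MTU discovery, which is exactly the side effect being complained about.

```python
import socket

# Linux-specific values from <linux/in.h>. Hard-coded here as an assumption,
# since the Python socket module does not reliably export them.
IP_MTU_DISCOVER = 10
IP_PMTUDISC_DONT = 0  # never set DF on outgoing packets
IP_PMTUDISC_DO = 2    # always set DF on outgoing packets


def open_udp_socket_with_df() -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets carry the DF bit.

    Linux only. There is no separate "DF bit" option: DF is requested
    indirectly by choosing a Path MTU discovery mode for the socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    return sock
```

With this mode set, sending a datagram larger than the kernel's current path MTU estimate fails with an EMSGSIZE error instead of being fragmented locally.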

 R3. UDP responders SHOULD compose response packets that fit in the minimum
 of the offered requestor's maximum UDP payload size [RFC6891], the
 interface MTU, the network MTU value configured by the knowledge of the
 network operators, and the RECOMMENDED maximum DNS/UDP payload size 1400.
 (See Appendix A for more information.)
In practice, doing a syscall to determine an MTU _estimate_ for every single peer address is impractical, and in most cases the value exposed by the kernel is just garbage anyway. It's more practical to assume that the outgoing EDNS buffer size is configured to a reasonable lower bound by the system admin.


I don't think it is asking for a syscall here, is it? It is saying the
minimum of:

1) EDNS0 option value received
2) interface MTU
3) Preset network MTU by admin in config
4) 1400

Only 2) would require some syscalls but those are per interface so not
per packet, and one could listen for interface changes to reread these.

What syscalls do you think are impractical?
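
The four-way minimum above can be sketched as a pure function with no per-packet syscalls. All names are hypothetical. Subtracting IP and UDP header sizes to turn an MTU into a payload limit is an assumption of this sketch, as are the fixed header sizes (no IPv4 options, no IPv6 extension headers); the 512 floor follows RFC 6891, which says smaller advertised values are treated as 512.

```python
from typing import Optional

RECOMMENDED_MAX_PAYLOAD = 1400  # value named in R3
MIN_EDNS_PAYLOAD = 512          # RFC 6891: lower values are treated as 512
UDP_HEADER = 8
IPV4_HEADER = 20                # assumption: no IP options
IPV6_HEADER = 40                # assumption: no extension headers


def max_udp_payload(edns_size: int,
                    interface_mtu: int,
                    configured_network_mtu: Optional[int] = None,
                    ipv6: bool = False) -> int:
    """Return the largest DNS/UDP payload the responder should send.

    interface_mtu is expected to be read once per interface (and re-read on
    interface changes), not looked up for every packet.
    """
    overhead = (IPV6_HEADER if ipv6 else IPV4_HEADER) + UDP_HEADER
    candidates = [
        max(edns_size, MIN_EDNS_PAYLOAD),  # 1) EDNS0 value from the request
        interface_mtu - overhead,          # 2) interface MTU
        RECOMMENDED_MAX_PAYLOAD,           # 4) recommended ceiling
    ]
    if configured_network_mtu is not None:
        # 3) network MTU preset by the admin in config
        candidates.append(configured_network_mtu - overhead)
    return min(candidates)
```

For example, a request advertising 4096 on a 1500-byte Ethernet interface is capped at 1400, while a configured IPv6 network MTU of 1280 brings the limit down to 1232.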

 R4. If the UDP responder detects an immediate error indicating that the
 UDP packet cannot be sent beyond the path MTU size, the UDP responder MAY
 recreate response packets fit in the path MTU size, or with the TC bit
 set.
Same note about MTU determination applies here. TC=1 sounds reasonable and does not require more guesswork or reconstructing and recompressing the answer packet.


Once you did the above calculation, wouldn't you just use that result?

I think the two of you are not saying very different things? E.g. you are
building the packet, know the max size (from above) and start adding
additional records, until you run out of space?
Or if you are still writing mandatory data (eg Answer or Authority
Section), you set TC=1 ?
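
The build-until-full approach described above can be sketched as follows. This is a simplification under stated assumptions: records are represented only by their on-the-wire sizes after compression, RRset atomicity and the OPT record are ignored, and the function and field names are hypothetical. What a real server keeps in a truncated response varies between implementations.

```python
DNS_HEADER = 12  # fixed DNS header size in bytes


def plan_response(question_len: int,
                  mandatory_sizes: list,
                  additional_sizes: list,
                  limit: int) -> dict:
    """Decide which records fit within `limit` bytes and whether to set TC.

    mandatory_sizes:  wire sizes of Answer/Authority records, in order.
    additional_sizes: wire sizes of optional Additional records, in order.
    """
    used = DNS_HEADER + question_len
    plan = {"mandatory": [], "additional": [], "tc": False}

    for size in mandatory_sizes:
        if used + size > limit:
            # Ran out of space while still writing mandatory data: TC=1.
            plan["tc"] = True
            return plan
        used += size
        plan["mandatory"].append(size)

    for size in additional_sizes:
        if used + size > limit:
            break  # optional data is dropped silently, no TC needed
        used += size
        plan["additional"].append(size)

    return plan
```

The limit passed in would be the result of the earlier minimum calculation, so no further MTU guesswork happens at this stage.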

 R5. UDP requestors SHOULD limit the requestor's maximum UDP payload size.
 It SHOULD use a limit of 1400 bytes, but a smaller limit MAY be used. (See
 Appendix A for more information.)
Some operators have better experience with 1400, others with other values. We at ISC go with the lower value of 1232 because it's easier to have a conservative value which is more likely to work. Debugging this in production is a total pain, and using a slightly smaller value has, in our limited experience, not caused new issues. That's why we went with lower values.


Let the implementers pick the value. They have the most experience
dealing with support calls. I was assuming the WG discussed this at
length, but perhaps it didn't :)

 R6. UDP requestors SHOULD drop fragmented DNS/UDP responses without IP
 reassembly to avoid cache poisoning attacks.
AFAIK this is impossible to do using the normal socket API. The application has no access to information about UDP reassembly.


I imagine some userland stacks like DPDK could possibly enforce this.

Having said that, even if it was implementable, it's IMHO not the best advice for a requestor.
IF the requestor is able to detect that a fragment was received, then it would be MUCH better to trigger a retry using a different protocol right away. Just dropping the packet:
a] causes timeouts
b] leaves a time window open for another attack attempt


It could drop all but the initial fragment as a signal to the
application? How about:

        If the UDP requester is protected by a packet filter capability
        in the application, on the host or on the network, that packet
        filter capability SHOULD drop all but the initial fragment for
        DNS/UDP responses and not perform any IP reassembly to avoid
        cache poisoning attacks. UDP requesters receiving an initial
        fragment SHOULD immediately retry other methods to obtain the
        full data, for example using TCP.
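
A minimal sketch of the classification such a packet filter would need, for IPv4 only, assuming the filter sees the raw IP header. The function names and action labels are hypothetical; IPv6, where fragmentation is signalled by a Fragment extension header, is not covered.

```python
import struct

MF_FLAG = 0x2000      # "More Fragments" bit in the flags/offset field
OFFSET_MASK = 0x1FFF  # 13-bit fragment offset, in units of 8 bytes


def classify_ipv4_fragment(header: bytes) -> str:
    """Classify an IPv4 packet from its header bytes.

    Returns "unfragmented", "initial" or "non-initial".
    """
    if len(header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    # Bytes 6-7 hold the 3 flag bits followed by the fragment offset.
    (flags_and_offset,) = struct.unpack_from("!H", header, 6)
    if flags_and_offset & OFFSET_MASK:
        return "non-initial"
    if flags_and_offset & MF_FLAG:
        return "initial"
    return "unfragmented"


def filter_action(header: bytes) -> str:
    """Map the classification onto the behaviour proposed in the text."""
    return {
        "unfragmented": "pass",
        "initial": "pass-and-signal-retry",  # requester should retry, e.g. over TCP
        "non-initial": "drop",
    }[classify_ipv4_fragment(header)]
```

Passing only the initial fragment gives the requester a positive signal that fragmentation occurred, without any reassembled data ever reaching the cache.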

 R7. DNS responses may be dropped by IP fragmentation. Upon a timeout, to
 avoid resolution failures, UDP requestors SHOULD retry using TCP or UDP
 with a smaller EDNS requestor's maximum UDP payload size per local policy.
 UDP requestors SHOULD observe [RFC8961] in setting their timeout.
Problem:
There is no indication if the timeout was caused by fragmentation - it might have been caused by other factors. The server might be simply dead.
The server selection algorithm in DNS is currently undefined and each implementation has its own retry strategy. TCP might or might not be the first choice. I don't see a compelling reason why this should be prescribed.


See above? If you let the initial fragment through, there is an
indication?
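
To make the difference between the two signals concrete, here is a sketch of one possible local retry policy. It is not a prescribed algorithm: the function name, the event labels and the single fallback size of 1232 (the value mentioned earlier in this thread) are all assumptions chosen for illustration.

```python
from typing import Optional, Tuple

SMALLER_EDNS_SIZE = 1232  # assumption: one conservative fallback size


def next_attempt(transport: str,
                 edns_size: Optional[int],
                 event: str) -> Optional[Tuple[str, Optional[int]]]:
    """Pick the next (transport, edns_size) after a failed query.

    event is "initial-fragment" (a fragment was seen, so fragmentation is
    known to be the cause) or "timeout" (cause unknown).
    Returns None when this sketch has nothing further to try.
    """
    if transport == "tcp":
        return None
    if event == "initial-fragment":
        # Explicit indication of fragmentation: go straight to TCP.
        return ("tcp", None)
    if event == "timeout":
        # No indication of the cause: try a smaller UDP size first, then TCP.
        if edns_size is not None and edns_size > SMALLER_EDNS_SIZE:
            return ("udp", SMALLER_EDNS_SIZE)
        return ("tcp", None)
    raise ValueError(f"unknown event: {event}")
```

An implementation that prefers TCP first, or that simply moves on to another server, would replace the timeout branch with its own strategy.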

Paul

_______________________________________________
DNSOP mailing list -- dnsop@ietf.org
To unsubscribe send an email to dnsop-le...@ietf.org
