thx for such a comprehensive review! That is the most in-depth critique we've had in a long time.
Some comments inline. On Wed, Mar 9, 2016 at 9:12 AM, Elwyn Davies <[email protected]> wrote: > I am the assigned Gen-ART reviewer for this draft. The General Area > Review Team (Gen-ART) reviews all IETF documents being processed > by the IESG for the IETF Chair. Please treat these comments just > like any other last call comments. > > For more information, please see the FAQ at > > <http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq>. > > Document: draft-ietf-aqm-fq-codel-05.txt > Reviewer: Elwyn Davies > Review Date: 2016/03/06 > IETF LC End Date: 2016/03/17 > IESG Telechat date: (if known) 2016/03/17 > > Summary: > Almost ready. There are a couple of minor issues that are barely above the > level of nits plus a fair bit of editorial work. > > I notice that the draft is on the IESG agenda on the day that last call > ends. If the authors wish to sort out the nits before the end of last call, > I will be happy to work with them on this. Would be good. > Major issues: > None. > > Minor issues: > Treatment of packets that don't fit into the hashing classification scheme: > The default FQ-CoDel hashing mechanism uses the protocol/addresses/ports > 5-tuple, but there will be packets that don't fit this scheme (especially > ICMP). There is no mention of what the classification would do with these > packets. How about? The default hashing algorithm SHOULD make a best effort attempt at consistently decoding a flow for any given protocol. It MUST fall back to a 3 tuple for anything not specifically recognized (srcip,dstip,protocol), by default. ... Arp is essentially the only exception to this. (LLC or other obscure wire protocols? IPX?) Due to the permutation in the hash ARP will always end up in a single, consistent bucket (not necessarily 0) - it is so small and so infrequent that normally fq_codel will attempt to push it out first except in the case of a hash collision. If you are in a situation where you are sending a ton of arp requests, and codel is dropping them... you've got other problems. > I guess that one extra queue would probably suffice to hold any > such outliers, but it would be wise to say something about how the packets > from this/these queue(s) would be treated by the scheduler. It might also > be useful to say something about treatment of outliers in other > classification schemes, if only to say that the scheme used needs to think > about any such outliers. OK, but see above. > > s6.2, last para: The proposal to add flow related marking to encapsulated > packets needs to come with a very well exposed security and privacy health > warning.. [It occurs to me that it might be possible to confine these > markings to out of band notifications within the originating host which > would allow fq_codel or similar to Flow Queue the packets getting them into > a sensible order on the outgoing link. Once the packets had been ordered in > this way, a subsequent box (maybe an ADSL modem or suchlike) linking a home > environment to the outside world could work purely on source address, > thereby interspersing the packets from different hosts onto the external > link but not needing to reorder the packets from each individual host. Not > sure if this is a sensible idea but it looks like a way to avoid the privacy > issues.] Obscuration happens a couple ways at the gateway - in linux the hashing happens after the nat step in ipv4 typically, so the src ip is the constant natted address, and we rely on the different ports, protocol, and dest addr to distinguish the hash. In ipv6, it is end to end, but in both cases the actual hash is permuted by a random number. You can certainly see, from a packet capture off of a saturated link that the traffic is being FQ'd. (and if ecn is enabled on the flows and codel, when that is being applied) but I am not sure what this actually "leaks" in terms of information. Note that saturation does happen on tiny scales also - microbursts will get broken up also on a nonsaturated link (example a typical TCP dumps a whole IW10 via TSO on a wire at a gbit, then sent at 5mbit over a cable modem) I think, in trying to read what you wrote above, you are talking about applying additional classification on a multi-band system like what sqm-scripts does? Or applying (as many products do) some sort of classification to raise the prio of voip or vpn packets and lower it for things like torrent and backups? This can get done several ways - We used dscp markings in cerowrt, as that research was driven by trying to find rational e2e classifications for multiple kinds of traffic (something that the dart wg took on also, and cake has tried to implement), but aside from a very few exceptions (vpn,dns) we found just fq'ing with short queue to be superior approach. A method more common (and lightweight) than touching the packet's dscp is actually to use the native priorities built into the packet classification system in linux to vector things into the right tier. > > s11: Internet Drafts are not scientific papers as such and do not usually > have a Conclusions section. I think it would be useful to move the > paragraph in s11 as is to s1. Since this draft is targeted for Experimental > RFC status, it would be useful to suggest (maybe in s7) that experiments > along the lines noted in s7 might be carried out and there results reported > (where? AQM WG?). > Given the developments with Cake, I am not sure whether > the authors expect this version of FQ-CoDel to make it onto standards track > or BCP, but to set out some sort of expected trajectory is desirable. Darned if I know stds track or bcp. ask again in 3 years? fq_codel's achieved wide deployment in Linux with no major issues reported, the minor ones fixed over time.... the BSD implementation is landing, which exposed some new potential issues (granularity of timestamps) The same ideas are being tried in wifi. (the differences between switched and shared mediums is proving important). Other AQMs (docsis-pie) are entering the field. I do not expect to see cake enter ietf processes, except as an experimental results to iccrg or the aqm wgs. (and if we want to re-open the wounds of dart, dart) The code is gpl/bsd dual licensed and sufficiently complex to not want to reduce it to plain text. And it's still out of tree, which means it will take years to see deployment. I've long thought that summoning the energy to ietf-standardize BQL ( https://lwn.net/Articles/469652/ ) with some document that also talks about the negative effects of tcp packet offloads would be a good idea. > > Nits/editorial comments: > General: s/i.e./i.e.,/, s/e.g./e.g.,/ > > Draft Title: The acronym CoDel with an uppercase D is used everywhere else > in the document - Suggest the title should be FlowQueue-CoDel K. > Abstract: Need to expand AQM. [DNS and and CPU are 'well-known' - > http://www.rfc-editor.org/materials/abbrev.expansion.txt] > > General: It would be helpful to capitalize Quantum throughout (or at least > from s3 onwards) to emphasise that it is a configured value. Likewise > Interval and Target parameters. Maybe also Flow and Queue as they a defined > terms. We'd used _param_ fairly consistently throughout, I thought. Not the right thing? > > s1, para 1: It would be helpful to give the full title (FlowQueue-CoDel), > expand AQM (again), provide a refernece explaining the term bufferbloat and > give references for AQM, basic CoDel, DRR and the ns2/ns3 network > simulators. Oy. > I would think one or both of the following would be useful (long term > stable) refs for bufferbloat (the first is useful because the article is > publically available rather than needing a sub to IEEE or ACM): > > Jim Gettys. 2011. Bufferbloat: Dark buffers in the Internet. IEEE Internet > Comput. 15, 3 (2011), 95–96. DOI:http://dx.doi.org/10.1109/MIC.2011.56 > (freely available at > http://www.bufferbloat.net/attachments/27/IC-15-03-Backspace.pdf) > > and/or > Jim Gettys and Kathleen Nichols. 2012. Bufferbloat:Dark buffers in the > Internet. Commun. ACM 55, 1 (2012), 1–15. > DOI:http://dx.doi.org/10.1145/2063176.2063196 k. > > Suggest: > OLD: > The FQ-CoDel algorithm is a combined packet scheduler and AQM > developed as part of the bufferbloat-fighting community effort. It > is based on a modified Deficit Round Robin (DRR) queue scheduler, > with the CoDel AQM algorithm operating on each queue. This document > describes the combined algorithm; reference implementations are > available for ns2 and ns3 and it is included in the mainline Linux > kernel as the fq_codel queueing discipline. > > NEW: > The FlowQueue-CoDel (FQ-CoDel) algorithm is a combined packet scheduler > and > Active Queue Management (AQM) [RFC 3168] algorithm > developed as part of the bufferbloat-fighting community effort (see > http://www.bufferbloat.net). It > is based on a modified Deficit Round Robin (DRR) queue scheduler > [DRR][DRRPP], > with the CoDel AQM If we are going to cite these other things, cite [CODELDRAFT] here? > algorithm operating on each queue. This document > describes the combined algorithm; reference implementations are > available for the ns2 (http://nsnam.sourceforge.net/wiki) and > ns3 (https://www.nsnam.org/wiki) network simulators, and it is > included in the mainline Linux kernel as the fq_codel queueing > discipline. > END Sure. I note (sadly) that the ns3 version is still not in the mainline ns3 code due to issues with exposing the header properly and other integration issues. But shouldn't the refs above be [NS2], [NS3], and then referred to as links in the normative references ratehr than inline? > > s1.2, definition of Flow: s/protocol/protocol number/; also mac is ambiguous > - s/mac address/media access control (MAC) address/ K. > s1.2, definitions of Flow and Queue: To make the mentions of hash in the > definition of Queue comprehensible, it would be sensible to mention 'hash' > in the definition of flow. Suggest adding: > The identifier(s) of a flow are mapped to a hash code used internally in > FQ-CoDel to organize the packets into Queues. K. > > s1.2: Probably worthwhile adding the references for CoDel and DRR to the > definitions. > > s1.2: It would help with subsequent understanding to provide definitions of > the Interval and Target parameters of the CoD > el scheme. Suggest: > ADD: > Interval: Characteristic time period used by the control loop of CoDel to > age the value of the minimum sojourn time of packets in a Queue (i.e., the > time spent by the packet in the local Queue) that is used to indicate when a > persistent Queue is developing (see Section 4.3 of [CODELDRAFT]). An issue with final acceptance is that the codel draft is not yet approved, either. > Target: Setpoint value of the minimum sojourn time of packets in a Queue > used as the target of the control loop in CoDel (see Section 4.4 of > [CODELDRAFT]). > END > > s1.3, para 1: It would be helpful to expand SQF: s/SQF/Shortest Queue First > (SQF)/ > > s1.3, para 1: s/differently than/differently from/ (the latter is good for > both US and British English) > > s1.3, para 2: Suggest starting the para with 'By default,' so that the > 'although' bit is not a surprise. > > s1.3, para 2: Adding a reference for the CoDel algorithm would help. In > particular, linking to http://queue.acm.org/appendices/codel.html as well as > the basic article would be helpful - I am not sure if the CoDel article is > available for free to all but the appendix seems to be. > > s1.3, para 4: s/current implementation/Linux implementation at the time of > writing/ > > s1.3, para 5: s/rather than a packet-based./rather than packet-based./ > > s1.3, para 6 (next to last para): K. > However, in practice many things that are > beneficial to have prioritised for typical internet use (ACKs, DNS > lookups, interactive SSH, HTTP requests, ARP, RA, ICMP, VoIP) _tend_ > to fall in this category, which is why FQ-CoDel performs so well for > many practical applications. > > I am doubtful as whether mentioning ARP, RA and ICMP in this statement is > appropriate since they don't fit into the default flow classification > scheme, and are not really flows as such. RA and ICMP are hashed on the 3-tuple. Don't have a problem with dropping these from the text, although frequently icmp could be considered as a flow. The point tho was that arp,ra,icmp do land generally in their own buckets and thus are "prioritized". The language here was also intended to deprecate the word "priority" per se' as it's more commonly used for diffserv - sparseness, fq, short queues are what works here. > As mentioned in Minor Issues > above some discussion of non-flow packets is needed IMO. They could be left > out here without problem I think. > > s3.1, para 1: s/buckets are configurable/buckets is configurable/ > > s3.1, para 3: s/This number is is/This number is/ > > s3.1, para 3: > OLD: > until the number of credits runs into the negative > NEW: > until the value of _byte credit_ becomes negative I tend to prefer the language matches the code. > END > What about _byte_credit_ == 0? (Just asking! Did possibly think that if > the queue has the same number of bytes as the credit, it might be possible > for the queue to jam in the active list while empty.) You worry me. I'll go look. > s3.1,para 4: s/fairness queueing/fairness queueing scheme/ > > s3.1, last para: s/flow-building queues/queues that build a backlog/ > > s4.1, para 2: A reference for Jenkins hash functions would be desirable. It > seems the best we can probably do is Jenkins' web site: > http://www.burtleburtle.net/bob/hash/doobs.html k. > s4.1: The basic Jenkins has used in the Linux scheduler code appears to be > targeted at IPv4 32 bit addresses. Technically, the abstract definition > doesn't need to worry about the lengths of addresses, but does anything > special need to be said about IPv6? I couldn't see in the Linux code where > IPv6 is dealt with. Furthermore, looking at the code a bit more closely, I > am not sure how the 112 bits (32 + 32 + 16 + 16 + 16) of the 5-tuple are > pushed into the hashing algorithm which has an input 'bandwidth' of 64 bits. > Again this doesn't desperately matter here but, if I understand correctly, > the calculations in s5.3 of hash collision probabilities reported in para 2 > of s5.3 relate to the Jenkins update3 has algorithm used in Linux. This > isn't explicitly stated. I am not sure if the probabilities would be > different if the number of bits in the 5-tuple was greater - presumably this > depends to some extent on how the Jenkins algorithms are used. The core > Linux algorithm just uses the 'final' part of the algorithm. According to > the Jenkins comments, one probably ought to use the 'mix' part to combine > additional bits. Thoughts? I will talk to the hashing issue in another message. It's evolved quite a bit, please take a look at the new hashing api that landed in linux 4.3 and later. > > s4.1, para 2: I am not sure that 'permuted by a random value' is very > clear, and it doesn't explain why this is done. > How about 'salted by modular addition of a random value'? It would also be > helpful to note here that this is done to mitigate a possible DoS attack > that could occur if the hash is externally predictable and point to the note > about this in the Security Considerations (s8). k. > s4.1.1, para 1: s/mac address/MAC address/; s/diffserv/diffserve codepoint > values/ > [I am not sure what firewall specific markings might be - any references?] > > s4.1.1, para 2: Suggest: > OLD: > An alternative to changing the classification scheme is to perform > decapsulation prior to hashing. > NEW: > An addition to the default classification scheme is to perform > decapsulation of tunnelled packets prior to hashing on the 5-tuple in the > encapsulated. > END > > Figure in s4.2: Needs a caption and a Figure number. It is also somewhat > incomplete as a state diagram. Both New and Old states have actions that > result from both Arrival/Enqueue and Dequeue events, and there are > 'loopback' actions when there are Arrival/Enqueue and Dequeue events that > occur in both New and Old states when the conditions are not satisfied. You are the first person to comment on that diagram ever!!! > s5.2.1, para 1: > > to ensure that the measured minimum delay does not become too stale. > > Without detailed knowledge of CoDel this didn't read very easily - it is > taken out of context from the CoDel draft. > Suggest: > to ensure that the minimum sojourn time of packets in a queue used as an > estimator by the CoDel control algorithm is a relatively up-to-date value. > > s5.2.1, para 1: s/It SHOULD be set on the order/It SHOULD be set to be on > the order/ "the order" implies a much less tight relationship that what I'd like, btw. > s5.2.3, para 2: s/10GigE/10 Gigabit Ethernet/ > > s5.2.4, para 2: Expand TSO (TCP Segmentation Offload). TSO probably could do > with a reference (Freimuth below should work). > > s5.2.4, para 2: I think GRO-enabled should probably be GSO-enabled - > whichever it is, it needs to be expanded (Generic Segmentation Offload?). > There is also a web page for GSO (https://lwn.net/Articles/188489/) GRO traffic is actually been one of our bigger headaches on several of the deployed routers out there - the edgerouter, the latest linksys 1200ac (armada 385) (gro peeling is solved in cake) Your typical cheapo router recieves traffic in a burst, GRO's it, then tries to send it as one packet. 64k is often seen... dropping the whole thing is catastrophic, splitting it back up GSO/TSO are endpoint generated problems, GRO a router specific one. I strongly suspect some CMTSes and other high end gear are also cheating by bunching up packets into larger entities. > s5.2.5, para 2 and s4.1, para 2: It would be sensible to use a common term > for 'initialisation time' and 'load time'. > > s5.2.6, title and body: Need to expand ECN on first use. Also s/thus > unresponsive/thus the number of unresponsive/ > > s5.2.7: Expand CE (Congestion Encountered) on first occurrence. > > s5.2.7, para 1: Expand DCTCP acronym and provide reference [Alizadeh] below > and/or [draft-ietf-tcpm-dctcp]....thus.. > OLD: > > This parameter enables DCTCP-like processing to enable CE marking ECT > packets at a lower setpoint than the default codel target. > > NEW: > > This parameter enables Date Centre TCP (DCTCP)-like processing resulting > in > CE (Congestion Encountered) marking on ECN-Capable Transport (ECT) > packets [RFC3168] starting at a lower sojourn delay setpoint than the > default CoDel > Target. Details of DCTCP can be found in [Alizadeh2011] and > [I-D.draft-ietf-tcpm-dctcp]. > > END > > s5.3, para 2: see the comments on s4.1 above regarding the link between > probabilities and address lengths. > > s5.3, para 3: Need to expand Cake acronym and providepointer to Section 7 > which expalisn about Cake. > > s5.4: Is it possible to add some comments to the codel_vars structure? > > s5.5, para 2: s/resolution below the target/resolution significantly finer > than the CoDel Target setting/ > > s5.6: Need to expand SFQ, WFQ and QFQ. There is a reference for SFQ below > [McKenney] We definately talk to ourselves too much. At the same time, the acronym expansion is adding pages to this doc. > s5.7, last para: s/doesn't miss a beat/reacts almost immediately/ (Original > is rather too colloquial!) yes. > s6.1, last para: s/classification/service classification/ > > s7: I might add 'Checking behaviour when two or more instantiations of > FQ-CoDel are used in series.' > > s12.2: Various of the informative references could do with completion. I > think the referecnes below are correct: > > Kathleen Nichols and Van Jacobson. 2012. "Controlling Queue Delay". Queue > 10, 5 (May 2012), 20. DOI:http://dx.doi.org/10.1145/2208917.2209336 > > M. Shreedhar and George Varghese. 1996. "Efficient fair queuing using > deficit round-robin". IEEE/ACM Trans. Netw. 4, 3 (June 1996), 375–385. > DOI:http://dx.doi.org/10.1109/90.502236 > > M.H. MacGregor and W. Shi. 2000. "Deficits for bursty latency-critical > flows: DRR++". In Proceedings IEEE International Conference on Networks 2000 > (ICON 2000). Networking Trends and Challenges in the New Millennium. IEEE > Comput. Soc, 287–293. DOI:http://dx.doi.org/10.1109/ICON.2000.875803 > > Y. Gong, D. Rossi, C. Testa, S. Valenti, and M.D. Taht. 2013. "Fighting the > bufferbloat: On the coexistence of AQM and low priority congestion control". > In 2013 IEEE Conference on Computer Communications Workshops (INFOCOM > WKSHPS). IEEE, 411–416. DOI:http://dx.doi.org/10.1109/INFCOMW.2013.6562885 > > Giovanna Carofiglio and Luca Muscariello. 2012. "On the Impact of TCP and > Per-Flow Scheduling on Internet Performance". IEEE/ACM Trans. Netw. 20, 2 > (April 2012), 620–633. DOI:http://dx.doi.org/10.1109/TNET.2011.2164553 > > The following are extra references that I suggested above: > > P.E. McKenney. 1990. Stochastic fairness queueing. In Proceedings. IEEE > INFOCOM ’90: Ninth Annual Joint Conference of the IEEE Computer and > Communications Societies@m_The Multiple Facets of Integration. IEEE Comput. > Soc. Press, 733–740. DOI:http://dx.doi.org/10.1109/INFCOM.1990.91316 I think this was the short version. We reference the long version as that's where a ton of new ideas came from. > Doug Freimuth et al. 2005. Server network scalability and TCP offload. In > USENIX Annual Technical Conference. USENIX Association, 209–222. > > Mohammad Alizadeh et al. 2011. Data center TCP (DCTCP). ACM SIGCOMM Comput. > Commun. Rev. 41, 4 (October 2011), 63–74. > DOI:http://dx.doi.org/10.1145/2043164.1851192 > > _______________________________________________ Gen-art mailing list [email protected] https://www.ietf.org/mailman/listinfo/gen-art
