Re: DCTCP for FreeBSD

George Neville-Neil Tue, 18 Mar 2014 20:31:26 -0700

On Feb 19, 2014, at 4:18 , Eggert, Lars <l...@netapp.com> wrote:

> Hi,
> 
> Midori Kato has implemented Microsoft's/Stanford's Datacenter TCP (DCTCP) for 
> FreeBSD as part of her MS thesis with me. Find a patch attached.
>


Thanks!  Any hints on how best to test this code?

Best,
George

> Also note that we're documenting a specification for DCTCP in an IETF draft: 
> http://tools.ietf.org/html/draft-bensley-tcpm-dctcp
> 
> Microsoft has made a licensing statement (RAND-Z) on the technology to the 
> IETF: https://datatracker.ietf.org/ipr/2319/ (I'm not sure what this means 
> for an eventual inclusion in FreeBSD.)
> 
> Roughly, Midori's patch consists of an extension of the modular congestion 
> control framework to expose ECN information to the modules, a module to 
> implement DCTCP, and a few experimental variants. See Midori's explanation:
> 
>> [1] A change for the modular congestion control framework (See Section 4.1 
>> if needed)
>> DCTCP processes ECN differently from RFC3168. We need three functions to 
>> do the following ECN processing:
>> a) The kernel decides whether an ECE flag should be set in the next outgoing 
>> TCP segment by snooping reserved bits in IP and TCP headers. (tcp_input.c)
>> b) The kernel reacts to congestion if an ECE flag is set in an arriving TCP 
>> segment. (tcp_input.c)
>> c) After the outgoing TCP segment is generated, the kernel decides whether 
>> an ECT bit should be set in the ECN field of the IP header of the outgoing 
>> packet. (tcp_output.c)
>> The current framework has no housekeeping functions for (a) and (b). 
>> Therefore, I add two functions to the modular cc framework: 
>> ecnpkt_handler() and ect_handler().
>> 
>> - ecnpkt_handler() allows the kernel to do additional ECN processing by 
>> snooping the ECN field in the IP and TCP headers. As an option, this 
>> function takes a flag that tells whether the call is made in the delayed 
>> ACK path. This function returns an integer value. When the return value is 
>> set, the kernel is forced to disable delayed ACK.
>> - ect_handler() allows the kernel to use a different rule from RFC3168 for 
>> ECT marking of the outgoing segment. This function returns an integer 
>> value. If the value is set, an ECT bit is set in the outgoing segment.
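The two hooks described above can be pictured with a minimal sketch. Only 
the names ecnpkt_handler() and ect_handler() come from the text; the 
argument lists, struct cc_state, struct cc_hooks and the sketch_* functions 
are hypothetical and are not taken from the patch. The "immediate ACK when 
the CE state changes" rule follows the DCTCP draft's receiver behaviour, and 
"always set ECT" is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECN codepoints of the IP header ECN field (values from RFC 3168). */
enum ecn_codepoint { ECN_NOT_ECT = 0, ECN_ECT1 = 1, ECN_ECT0 = 2, ECN_CE = 3 };

/* Hypothetical per-connection state kept by a congestion control module. */
struct cc_state {
    bool ce_seen; /* was the previously received segment CE-marked? */
};

/* Hypothetical hook table; the real framework struct is not shown here. */
struct cc_hooks {
    /* Non-zero return: the caller should not delay the ACK. */
    int (*ecnpkt_handler)(struct cc_state *st, enum ecn_codepoint ip_ecn,
                          bool tcp_cwr, bool in_delayed_ack);
    /* Non-zero return: the caller should set ECT on the outgoing segment. */
    int (*ect_handler)(struct cc_state *st);
};

/* Ask for an immediate ACK whenever the CE state flips. */
int sketch_ecnpkt_handler(struct cc_state *st, enum ecn_codepoint ip_ecn,
                          bool tcp_cwr, bool in_delayed_ack)
{
    bool ce_now = (ip_ecn == ECN_CE);
    bool changed = (ce_now != st->ce_seen);

    (void)tcp_cwr;        /* unused in this sketch */
    (void)in_delayed_ack; /* unused in this sketch */
    st->ce_seen = ce_now;
    return changed ? 1 : 0;
}

/* Assumption for illustration: request ECT on every outgoing segment. */
int sketch_ect_handler(struct cc_state *st)
{
    (void)st;
    return 1;
}

const struct cc_hooks sketch_hooks = {
    sketch_ecnpkt_handler,
    sketch_ect_handler,
};
```

In this sketch the input path would call ecnpkt_handler() once per received 
segment and skip the delayed-ACK timer when it returns non-zero; the output 
path would consult ect_handler() before filling in the IP ECN field.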
>> 
>> 
>> [2] Five changes from the original DCTCP algorithm
>> In order to reflect the DCTCP motivation, I modified the following 
>> processing. The first four modifications are for senders and the last 
>> modification is for receivers.
>> 
>> (1) no congestion recovery on receipt of ECE flags (See section 4.2.1 if 
>> needed)
>> FreeBSD handles ECN as a congestion event, but this is not appropriate for 
>> DCTCP senders. A DCTCP sender uses ECN as a means to understand the extent 
>> of congestion. Therefore, I remove congestion recovery mode in all 
>> situations for DCTCP senders.
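To make the contrast concrete: a standard ECN sender halves CWND on an ECE, 
while DCTCP scales the reduction by alpha, cwnd = cwnd * (1 - alpha / 2), as 
given in the DCTCP draft. The sketch below uses fixed-point arithmetic; the 
function name, the 10-bit scaling of alpha and the one-segment floor are 
assumptions for illustration, not details of the patch.

```c
#include <stdint.h>

#define ALPHA_SHIFT 10
#define ALPHA_MAX (1u << ALPHA_SHIFT) /* alpha == 1.0 in fixed point */

/*
 * Reduce cwnd (bytes) by the fraction alpha / 2, where alpha is a
 * fixed-point value in [0, ALPHA_MAX]. The result never drops below one
 * segment (mss); that floor is an assumption of this sketch.
 */
uint32_t sketch_dctcp_reduce_cwnd(uint32_t cwnd, uint32_t alpha, uint32_t mss)
{
    uint64_t cut;
    uint32_t next;

    if (alpha > ALPHA_MAX)
        alpha = ALPHA_MAX;
    /* cut = cwnd * alpha / 2, with alpha scaled by 2^ALPHA_SHIFT */
    cut = ((uint64_t)cwnd * alpha) >> (ALPHA_SHIFT + 1);
    next = cwnd - (uint32_t)cut;
    return next < mss ? mss : next;
}
```

With alpha at its maximum the result matches the RFC3168 halving; with alpha 
at zero the window is left untouched, which is why a DCTCP sender does not 
need to enter congestion recovery for every ECE.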
>> 
>> (2) selective initial alpha value (See section 4.2.2 if needed) 
>> DCTCP defines alpha as a parameter that estimates the depth of congestion. 
>> When the alpha value is large, it allows a saw-toothed CWND behavior for a 
>> DCTCP sender.
>> A problem is that the alpha value is not reliable during the first dozen or 
>> so RTTs because there is no way to identify the depth of congestion in a 
>> network from the beginning. Considering the alpha reliability, I think the 
>> initial alpha should be selectable by users per application. When a user 
>> chooses DCTCP for latency-sensitive applications, the initial alpha is 
>> preferred. Otherwise, DCTCP senders had better set the initial alpha 
>> value to zero, according to my experimental results (See section 7.2 of the 
>> attached file).
>> The default alpha value is set to zero in my implementation.
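For reference, the DCTCP draft updates alpha once per window of data as 
alpha = (1 - g) * alpha + g * F, where F is the fraction of bytes acked with 
ECE in that window and g is a small gain. The sketch below shows that 
estimator with a caller-chosen initial alpha. All names, the fixed-point 
scaling and g = 1/16 are assumptions for illustration, not values taken from 
the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define ALPHA_SHIFT 10
#define ALPHA_MAX (1u << ALPHA_SHIFT) /* alpha == 1.0 in fixed point */
#define G_SHIFT 4                     /* gain g = 1/16 (assumed) */

struct sketch_alpha {
    uint32_t alpha;       /* fixed point, 0..ALPHA_MAX */
    uint32_t bytes_total; /* bytes acked in the current window */
    uint32_t bytes_ecn;   /* of those, bytes acked with ECE set */
};

/* Start from a caller-selected initial alpha (zero or a larger value). */
void sketch_alpha_init(struct sketch_alpha *a, uint32_t initial_alpha)
{
    a->alpha = initial_alpha > ALPHA_MAX ? ALPHA_MAX : initial_alpha;
    a->bytes_total = 0;
    a->bytes_ecn = 0;
}

/* Account for one ACK covering 'acked' bytes. */
void sketch_alpha_on_ack(struct sketch_alpha *a, uint32_t acked, bool ece)
{
    a->bytes_total += acked;
    if (ece)
        a->bytes_ecn += acked;
}

/* Once per window: alpha = (1 - g) * alpha + g * F, then reset counters. */
void sketch_alpha_update(struct sketch_alpha *a)
{
    uint32_t frac;

    if (a->bytes_total == 0)
        return; /* nothing acked, keep the old estimate */
    frac = (uint32_t)(((uint64_t)a->bytes_ecn << ALPHA_SHIFT) / a->bytes_total);
    a->alpha = a->alpha - (a->alpha >> G_SHIFT) + (frac >> G_SHIFT);
    a->bytes_total = 0;
    a->bytes_ecn = 0;
}
```

Starting from zero, a fully marked window moves alpha by only one sixteenth 
of full scale in this sketch, which illustrates why the estimate needs many 
RTTs to become meaningful and why the starting value matters.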
>> 
>> (3) alpha value initialization after an idle period (See section 4.2.3 if 
>> needed)
>> How long an idle period lasts is not predictable. Therefore, for a DCTCP 
>> sender, using the outdated alpha after an idle period is not a good idea. A 
>> DCTCP sender resets alpha to the initial value when an idle period occurs.
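A minimal sketch of the idle reset follows. The struct, the function name 
and the choice of "idle for at least one RTO" as the threshold are 
assumptions for illustration; the text above does not say how the patch 
detects an idle period.

```c
#include <stdint.h>

struct sketch_idle {
    uint32_t alpha;         /* current estimate */
    uint32_t initial_alpha; /* value configured for new connections */
    uint64_t last_send_us;  /* time of the previous transmission */
};

/*
 * Call before sending. If the connection has been idle for at least
 * idle_threshold_us (assumed here to be one RTO), discard the stale alpha.
 */
void sketch_idle_check(struct sketch_idle *c, uint64_t now_us,
                       uint64_t idle_threshold_us)
{
    if (now_us - c->last_send_us >= idle_threshold_us)
        c->alpha = c->initial_alpha;
    c->last_send_us = now_us;
}
```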
>> 
>> The following changes are applied to eliminate compatibility issues with 
>> standard ECN as defined in RFC3168. DCTCP and standard ECN servers have no 
>> way to identify which mechanism is in use on the peer. Thus, we need to 
>> avoid the worst case in a network mixing DCTCP senders/receivers and 
>> standard ECN senders/receivers.
>> (4) using the CWR flag when the ECE flag is found for an RTT (See section 
>> 5.1 if needed)
>> This change applies to the situation where a sender uses DCTCP and a 
>> receiver uses standard ECN. 
>> In this situation, I find that a DCTCP sender minimizes CWND. The detailed 
>> technical reason is described in section 4.2 of my attached file. 
>> Fortunately, the current tcp_input() function already implements this 
>> change; thus, there is no modification in my patch.
>> 
>> (5) enabling delayed ACK on receipt of the CWR flag (See section 5.2 if 
>> needed)
>> This change applies to the situation where a sender uses standard ECN and a 
>> receiver uses DCTCP. In this situation, I find that, without this change, a 
>> standard ECN sender grows CWND more slowly than expected. The detailed 
>> technical reason is described in section 5.2 of my attached file.
> 
> 
> The patch is attached and should apply to a recent -CURRENT. Midori's thesis 
> (which she refers to in the quoted text above) is at 
> https://eggert.org/students/kato-thesis.pdf
> 
> Lars
> 
> <dctcp.patch>
