Hi Folks, I wanted to check back in on this for another update and to solicit some more suggestions. I did a bit more digging to try an isolate the problem.
As I explained earlier, the log generated by tcp_probe indicates that the snd_cwnd is set to 1 just before the end host receives an ECN marked ACK and unexpectedly enters a timeout ( https://drive.google.com/open?id=1iyt8PvBxQga2jpRpBJ8KdQw3Q_mPTzZF ). I was trying to track down where this is happening, but the only place I could find that might be setting the snd_cwnd to 1 is in the tcp_enter_loss() function. I inserted a printk() call in this function to see when it is being invoked and it looks like it is only called by the tcp_retransmit_timer() function after the timer expires. I decided to try recording the snd_cwnd, ss-thresh, and icsk_ca_state inside the tcp_fastretrans_alert() function whenever it processes an ECN marked ACK ( https://drive.google.com/open?id=17GD77lb9lkCSu0_s9p40GZ5r4EU8B4VB ) This plot also shows when the tcp_retransmit_timer() and tcp_enter_loss() functions are invoked (red and purple dots respectively). And I see that the ACK state machine is always either in the TCP_CA_Open or TCP_CA_CWR state whenever the tcp_fastretrans_alert() function processes ECN marked ACKs ( https://drive.google.com/open?id=1xwuPxjgwriT9DSblFx2uILfQ95Fy-Eqq ). So I'm not sure where the snd_cwnd is being set to 1 (or possibly 0 as Neal suggested) just before entering a timeout. Any suggestions here? In order to do a bit of profiling of the tcp_dctcp code I added support into tcp_probe for recording the dctcp alpha parameter. I see that alpha oscillates around about 0.1 when the flow rates have converged, it goes to zero when the other host enters a timeout, and I don't see any unexpected behavior just before the timeout ( https://drive.google.com/open?id=1zPdyS57TrUYZIekbid9p1UNyraLYrdw7 ). So I haven't had much luck yet trying to track down where the problem is. If you have any suggestions that would help me to focus my search efforts, I would appreciate the comments. Thanks! -Steve On Mon, Nov 6, 2017 at 3:31 PM, Steve Ibanez <siba...@stanford.edu> wrote: > Hi Daniel, > > Apologies for the delay. I tried out Neal's suggestion to printk the > cwnd and ss-thresh in the tcp_init_cwnd_reduction(), > tcp_cwnd_reduction(), and tcp_end_cwnd_reduction() functions in > tcp_input.c. From what I can tell, none of these functions are setting > the cwnd to 0. > > Here is the kernel log with the cwnd and ss-thresh print statements: > https://drive.google.com/open?id=1LEWIkz64NuZN3yuDpBOAXbUfJfiju55O > And here is the corresponding packet trace at this end host: > https://drive.google.com/open?id=1qf4cSW3wzsiwPngcYpZY-AoBspuqONLH > (The kernel log buffer was not large enough to capture the full > 3-second experiment, so there's only about a second of data for the > log buffer and about 3 seconds for the packet trace.) > > Here are is a plot of the cwnd and ss-thresh from each of the three functions: > - tcp_init_cwnd_reduction: > https://drive.google.com/open?id=1KOEXG2ISJQMi9c6KyPOQ6rpVUVsQwtWU > - tcp_cwnd_reduction: > https://drive.google.com/open?id=1awoPWC3hi4CGZt7HyuI4aAaLG1LPLwJE > - tcp_end_cwnd_reduction: > https://drive.google.com/open?id=1G7XUSnkX8tP7Z5XdY2O97OWj6jguQHO5 > > Here is a plot of the measured flow rates: > https://drive.google.com/open?id=1XwmGve10J4qa1nPE3LustK8NbvhZscac > > The kernel log and packet trace data was collected on the 10.0.0.3 > host. The cwnd and ss-thresh plots are from the final second or so of > the experiment and they show two timeout events. In the first event, > the 10.0.0.1 host times out allowing 10.0.0.3 to increase it's cwnd. > And in the second event, the 10.0.0.3 host times out causing the cwnd > to decrease from ~100 to about ~10. The cwnd samples from tcp_probe ( > https://drive.google.com/open?id=1QCuPspLqbGoA68MKTaAh7rx2wCv3Cr_e ) > indicate that the cwnd is 1 MSS just before the timeout event, but I > don't see that in the data collected from the tcp_*_cwnd_reduction > functions. > > Here is a diff of the changes that I applied to the tcp_input.c file: > https://drive.google.com/open?id=1k5x3AkfTr3tJhohSIcmQp-3g2yTVNMWm > > Are there other places in the code that you would suggest I check for > how the cwnd and ss-thresh are changing? > > Thanks, > -Steve > > > On Mon, Nov 6, 2017 at 6:08 AM, Daniel Borkmann <dan...@iogearbox.net> wrote: >> On 10/24/2017 03:11 AM, Neal Cardwell wrote: >>> >>> On Mon, Oct 23, 2017 at 6:15 PM, Steve Ibanez <siba...@stanford.edu> >>> wrote: >>>> >>>> Hi All, >>>> >>>> I upgraded the kernel on all of our machines to Linux >>>> 4.13.8-041308-lowlatency. However, I'm still observing the same >>>> behavior where the source enters a timeout when the CWND=1MSS and it >>>> receives ECN marks. >>>> >>>> Here are the measured flow rates: >>>> >>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONT0VXMUt6WHhKREE/view?usp=sharing> >>>> >>>> Here are snapshots of the packet traces at the sources when they both >>>> enter a timeout at t=1.6sec: >>>> >>>> 10.0.0.1 timeout event: >>>> >>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONcl9WRnRPazg2ems/view?usp=sharing> >>>> >>>> 10.0.0.3 timeout event: >>>> >>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONeDlxRjNXa0VzWm8/view?usp=sharing> >>>> >>>> Both still essentially follow the same sequence of events that I >>>> mentioned earlier: >>>> (1) receives an ACK for byte XYZ with the ECN flag set >>>> (2) stops sending for RTO_min=300ms >>>> (3) sends a retransmission for byte XYZ >>>> >>>> The cwnd samples reported by tcp_probe still indicate that the sources >>>> are reacting to the ECN marks more than once per window. Here are the >>>> cwnd samples at the same timeout event mentioned above: >>>> >>>> <https://drive.google.com/file/d/0B-bt9QS-C3ONdEZQdktpaW5JUm8/view?usp=sharing> >>>> >>>> Let me know if there is anything else you think I should try. >>> >>> >>> Sounds like perhaps cwnd is being set to 0 somewhere in this DCTCP >>> scenario. Would you be able to add printk statements in >>> tcp_init_cwnd_reduction(), tcp_cwnd_reduction(), and >>> tcp_end_cwnd_reduction(), printing the IP:port, tp->snd_cwnd, and >>> tp->snd_ssthresh? >>> >>> Based on the output you may be able to figure out where cwnd is being >>> set to zero. If not, could you please post the printk output and >>> tcpdump traces (.pcap, headers-only is fine) from your tests? >> >> >> Hi Steve, do you have any updates on your debugging? >> >>> thanks, >>> neal >>> >>