On Tue, 29 May 2007, Stephen Hemminger wrote: > On Tue, 29 May 2007 20:23:45 +0200 > "Lior Dotan" <[EMAIL PROTECTED]> wrote: > > > NTP was not running. I'm not sure what do you mean by fixing the -1. > > The trace shows that vegas_cong_avoid() is called with -1, and the > > only way it can happen is from tcp_clean_rtx_queue() and the patch > > should eliminate this. Another way of solving this is by checking > > vegas_rtt_calc() and see if it gets -1 and handle it there. > > Another thing that I don't understand is that some places like > > tcp_ack() declare seq_rtt as signed and some vegas_cong_avoid() > > declare it as unsigned. Shouldn't it be declared always as signed? > > > > On 5/29/07, Stephen Hemminger <[EMAIL PROTECTED]> wrote: > > > On Tue, 29 May 2007 12:18:19 +0300 > > > "Lior Dotan" <[EMAIL PROTECTED]> wrote: > > > > > > > Hi, > > > > > > > > I had a divide by zero on kernel 2.4.33 running with Vegas enabled. > > I don't think anyone has backported the other vegas fixes from 2.6.21 > to 2.4. > > > For 2.6.22, rtt_calc doesn't exist, instead pkts_acked is called.
I tried to figure this one out yesterday, and it seems to me that divide by zero should not occur with 2.6.22 (nor with 2.6.21 code), no matter how I try to approach vegas... > The following should be added to avoid getting bogus timestamp values > from retransmits. ...but this is unreliable timestamps problem is a real one that originates from API merge commit 164891aadf1721fca4dce473bb0e0998181537c6 of yours (see some though about it below). I suppose there are now similar concerns about timestamp validity in other cc modules than vegas too. > --- a/net/ipv4/tcp_vegas.c 2007-05-02 12:26:35.000000000 -0700 > +++ b/net/ipv4/tcp_vegas.c 2007-05-29 14:06:26.000000000 -0700 > @@ -117,6 +117,10 @@ void tcp_vegas_pkts_acked(struct sock *s > struct vegas *vegas = inet_csk_ca(sk); > u32 vrtt; > > + /* Ignore retransmits */ > + if (unlikely(cnt == 0)) > + return; > + > /* Never allow zero rtt or baseRTT */ > vrtt = ktime_to_us(net_timedelta(last)) + 1; ...I don't think this works, because cnt won't be zero ever because to make the call some skbs were cleaned from the queue (isn't that what pkts_acked stands for?). There is as if (acked&FLAG_ACKED) guard before making the pkts_acked call to cc modules, thus delta in packets_out will always be greater than 0. Hmm, now that I realize this, I would say that checks for > 0 in pkts_acked are entirely bogus (in the current code), hmm, that means I have more things to cleanup in tcp-2.6, at least, bic, cubic and westwood do them... :-). I think the code did a right thing before your api merge, since it called rtt callback only if FLAG_RETRANS_DATA_ACKED was not set (and pkts_acked always), i.e., when TCP cannot know if it was the rexmission that triggered the cumulative ACK or any original transmission, no new RTT measurement should be made. Consider also the fact that, there might be a non rexmitted skb after rexmitted one, which Lior's patch doesn't do right way at all since the FLAG_DATA_ACKED would again get set (no div-by-zero would have occured then but an unreliable timestamp would have been givento vegas in 2.4). The number of packets that got cumulatively acked is never a right metric to measure whether the cumulative ACK originated from rexmitted skbs or not. ...Perhaps the FLAG_RETRANS_DATA_ACKED flag needs to be passed somehow to cc modules? -- i. - To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html