Re: [9fans] fossil+venti performance question
> Looking at the first few bytes in each dir of the initial TCP
> handshake (with tcpdump) I see:
>
> 0x: 4500 0030 24da       <= from plan9 to freebsd
> 0x: 4500 0030 d249 4000  <= from freebsd to plan9
>
> Looks like FreeBSD always sets the DF (don't fragment) bit
> (0x40 in byte 6), while plan9 doesn't (byte 6 is 0x00).
>
> Maybe plan9 should set the DF (don't fragment) bit in the IP
> header and try to do path MTU discovery?  Either by default or
> under some ctl option.

easy enough until one encounters devices that don't send icmp
responses because it's not implemented, or somehow considered
"secure" that way.

- erik
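for reference, stacks that do always set DF usually pair it with a blackhole-detection fallback instead of trusting ICMP alone.  the sketch below is only that generic idea, not Plan 9 code; the names and constants (Fallbackmss, Maxrxt, effectivemss) are made up for illustration.

/*
 * blackhole-detection sketch: if a full-sized DF segment times out
 * several times and no ICMP "fragmentation needed" ever arrived,
 * assume a smaller path MTU and retry with smaller segments.
 */
#include <stdio.h>

enum {
	Defmss		= 1460,	/* mss derived from the local interface */
	Fallbackmss	= 536,	/* conservative mss to fall back to */
	Maxrxt		= 3,	/* timeouts before suspecting a black hole */
};

static int
effectivemss(int rxtshift, int curmss)
{
	/* too many timeouts with DF set: assume a black hole on the path */
	if(rxtshift >= Maxrxt && curmss > Fallbackmss)
		return Fallbackmss;
	return curmss;
}

int
main(void)
{
	int i;

	for(i = 0; i <= 4; i++)
		printf("timeouts=%d mss=%d\n", i, effectivemss(i, Defmss));
	return 0;
}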
Re: [9fans] fossil+venti performance question
yes, but i was not referring to the adjusting, which isn't changed here,
only the tcpmtu() call that got added.  yes, it *should* not make any
difference, but maybe we're missing something.  at worst it makes the
code more confusing and causes bugs in the future, because one of the
initializations of mss is a lie without any effect.

--
cinap
Re: [9fans] fossil+venti performance question
On Fri May 8 20:12:57 PDT 2015, cinap_len...@felloff.net wrote:
> do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
> as i see it, procsyn() is called only when tcb->state is Syn_sent,
> which only should happen for client connections doing a connect, in
> which case tcpsndsyn() would have initialized tcb->mss already no?

i think there was a subtle reason for this, but i don't recall.  a real
reason for setting it here is that it makes the code easier to reason
about, imo.

there are a couple of problems with the patch as it stands.  they are
inherited from previous mistakes.

* the setting of tpriv->stats[Mss] is bogus.  it's not shared between
  connections.  it is also v4 only.
* so, mss should be added to each tcp connection's status file.
* the setting of tcb->mss in tcpincoming is not correct.  tcb->mss is
  set by the SYN, not by the ACK, and may not be reset.  (see the
  snoopy trace below.)
* the SYN-ACK needs to send the local mss, not echo the remote mss.
  asymmetry is "fine" on the other side, even if ip/tcp.c isn't smart
  enough to keep tx and rx mss separate.  (scare quotes = untested;
  there may be some performance niggles if the sender is sending legal
  packets larger than tcb->mss.)

my patch to nix is below.  i haven't submitted it yet.

- erik

---
005319 ms  ether(s=a0369f1c3af7 d=0cc47a328da4 pr=0800 ln=62) ip(s=10.1.1.8 d=10.1.1.9 id=ee54 frag= ttl=255 pr=6 ln=48) tcp(s=38903 d=17766 seq=3552109414 ack=0 fl=S win=65535 ck=d68e ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)
005320 ms  ether(s=0cc47a328da4 d=a0369f1c3af7 pr=0800 ln=62) ip(s=10.1.1.9 d=10.1.1.8 id=54d3 frag= ttl=255 pr=6 ln=48) tcp(s=17766 d=38903 seq=441373010 ack=3552109415 fl=AS win=65535 ck=eadc ln=0 opt4=(mss 1460) opt3=(wscale 4) opt=NOOP)
---

/n/dump/2015/0509/sys/src/nix/ip/tcp.c:491,501 - /sys/src/nix/ip/tcp.c:491,502
  	s = (Tcpctl*)(c->ptcl);

  	return snprint(state, n,
- 		"%s qin %d qout %d rq %d.%d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
+ 		"%s qin %d qout %d rq %d.%d mss %d srtt %d mdev %d sst %lud cwin %lud swin %lud>>%d rwin %lud>>%d qscale %d timer.start %d timer.count %d rerecv %d katimer.start %d katimer.count %d\n",
  		tcpstates[s->state],
  		c->rq ? qlen(c->rq) : 0,
  		c->wq ? qlen(c->wq) : 0,
  		s->nreseq, s->reseqlen,
+ 		s->mss,
  		s->srtt, s->mdev, s->ssthresh,
  		s->cwind, s->snd.wnd, s->rcv.scale, s->rcv.wnd, s->snd.scale, s->qscale,
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:843,854 - /sys/src/nix/ip/tcp.c:844,857
  /* mtu (- TCP + IP hdr len) of 1st hop */
  static int
- tcpmtu(Proto *tcp, uchar *addr, int version, uint *scale)
+ tcpmtu(Proto *tcp, uchar *addr, int version, uint reqmss, uint *scale)
  {
+ 	Tcppriv *tpriv;
  	Ipifc *ifc;
  	int mtu;

  	ifc = findipifc(tcp->f, addr, 0);
+ 	tpriv = tcp->priv;
  	switch(version){
  	default:
  	case V4:
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:855,865 - /sys/src/nix/ip/tcp.c:858,870
  		mtu = DEF_MSS;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP4_PKT + TCP4_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu;
  		break;
  	case V6:
  		mtu = DEF_MSS6;
  		if(ifc != nil)
  			mtu = ifc->maxtu - ifc->m->hsize - (TCP6_PKT + TCP6_HDRSIZE);
+ 		tpriv->stats[Mss] = mtu + (TCP6_PKT + TCP6_HDRSIZE) - (TCP4_PKT + TCP4_HDRSIZE);
  		break;
  	}

  	/*
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:868,873 - /sys/src/nix/ip/tcp.c:873,882
  	 */
  	*scale = Defadvscale;

+ 	/* our sending max segment size cannot be bigger than what he asked for */
+ 	if(reqmss != 0 && reqmss < mtu)
+ 		mtu = reqmss;
+
  	return mtu;
  }
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1300,1307 - /sys/src/nix/ip/tcp.c:1309,1314
  static void
  tcpsndsyn(Conv *s, Tcpctl *tcb)
  {
- 	Tcppriv *tpriv;
-
  	tcb->iss = (nrand(1<<16)<<16)|nrand(1<<16);
  	tcb->rttseq = tcb->iss;
  	tcb->snd.wl2 = tcb->iss;
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1314,1322 - /sys/src/nix/ip/tcp.c:1321,1327
  	tcb->sndsyntime = NOW;

  	/* set desired mss and scale */
- 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, &tcb->scale);
- 	tpriv = s->p->priv;
- 	tpriv->stats[Mss] = tcb->mss;
+ 	tcb->mss = tcpmtu(s->p, s->laddr, s->ipversion, 0, &tcb->scale);
  }

  void
/n/dump/2015/0509/sys/src/nix/ip/tcp.c:1492,1498 - /sys/src/nix/ip/tcp.c:1497,1503
  	seg.ack = lp->irs+1;
  	seg.flags = SYN|ACK;
  	seg.urg = 0;
- 	seg.mss = tcpmtu(tcp, lp->laddr, lp->version, &scale);
+ 	seg.mss = tcpmtu(tcp,
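a tiny standalone sketch of the rule the reqmss argument adds to tcpmtu() above: the sending mss is the first-hop-derived value, never larger than what the peer asked for in its SYN.  erik's patch is the authoritative version; the names and numbers here (sndmss, the mtu/header values in main) are made up for illustration.

/*
 * sending mss = interface-derived mss, clamped to the peer's request.
 */
#include <stdio.h>

enum {
	DEF_MSS = 1460,		/* fallback when no interface is known */
};

static int
sndmss(int ifcmtu, int hdrsize, int reqmss)
{
	int mss;

	mss = DEF_MSS;
	if(ifcmtu > 0)
		mss = ifcmtu - hdrsize;
	/* our sending max segment size cannot be bigger than what he asked for */
	if(reqmss != 0 && reqmss < mss)
		mss = reqmss;
	return mss;
}

int
main(void)
{
	/* 9000-byte jumbo interface, peer advertised mss 1460 in its SYN */
	printf("mss=%d\n", sndmss(9000, 40, 1460));	/* -> 1460 */
	/* 1500-byte interface, peer advertised 8960 */
	printf("mss=%d\n", sndmss(1500, 40, 8960));	/* -> 1460 */
	return 0;
}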
Re: [9fans] fossil+venti performance question
On Fri May 8 20:12:57 PDT 2015, cinap_len...@felloff.net wrote:
> do we really need to initialize tcb->mss to tcpmtu() in procsyn()?
> as i see it, procsyn() is called only when tcb->state is Syn_sent,
> which only should happen for client connections doing a connect, in
> which case tcpsndsyn() would have initialized tcb->mss already no?

yes, we should.  the bug is that we confuse send mss and receive mss.
the sender's mss is the one we need to respect here.  tcpsndsyn()
should not set the mss; the mss it calculates is for rx.

- erik
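a minimal sketch of the tx/rx split erik describes, which ip/tcp.c currently doesn't keep separate: rcvmss is what we derive from our first-hop mtu and advertise in our SYN; sndmss is what the peer advertised and bounds what we may send.  the struct and field names are hypothetical, not nix code.

#include <stdio.h>

typedef struct Mss Mss;
struct Mss {
	int	rcvmss;	/* what we advertise: our receive limit */
	int	sndmss;	/* what the peer advertised: our transmit limit */
};

int
main(void)
{
	Mss m;

	m.rcvmss = 8960;	/* e.g. jumbo-frame local interface */
	m.sndmss = 1460;	/* from the peer's SYN mss option */

	/* asymmetry is fine: we may receive big segments but must send small ones */
	printf("advertise %d, send at most %d\n", m.rcvmss, m.sndmss);
	return 0;
}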
Re: [9fans] fossil+venti performance question
On May 9, 2015, at 7:43 AM, erik quanstrom wrote:
> easy enough until one encounters devices that don't send icmp
> responses because it's not implemented, or somehow considered
> "secure" that way.

Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is
far from being alone in the always-set-DF bit.

The only place this bites is when you run into tiny shops with homegrown
firewalls configured by people who don't understand networking or security.
Me, I consider it a feature that these sites self-select themselves off the
network.  I'm certainly no worse off for not being able to talk to them.
Re: [9fans] fossil+venti performance question
2015-05-09 10:25 GMT-07:00 Lyndon Nerenberg :
>
> > On May 9, 2015, at 7:43 AM, erik quanstrom wrote:
> >
> > easy enough until one encounters devices that don't send icmp
> > responses because it's not implemented, or somehow considered
> > "secure" that way.
>
> Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is
> far from being alone in the always-set-DF bit.
>
> The only place this bites is when you run into tiny shops with homegrown
> firewalls configured by people who don't understand networking or security.
> Me, I consider it a feature that these sites self-select themselves off the
> network.  I'm certainly no worse off for not being able to talk to them.

Or when your client is on a cell phone.  Cell networks are the worst.
Re: [9fans] fossil+venti performance question
On May 9, 2015, at 10:30 AM, Devon H. O'Dell wrote:
> Or when your client is on a cell phone.  Cell networks are the worst.

Really?  Quite often I slave my laptop to my phone's LTE connection, and
I never have problems with PMTU.  Both here (across western Canada) and
in the UK.
Re: [9fans] fossil+venti performance question
> On May 9, 2015, at 10:25 AM, Lyndon Nerenberg wrote:
>
>> On May 9, 2015, at 7:43 AM, erik quanstrom wrote:
>>
>> easy enough until one encounters devices that don't send icmp
>> responses because it's not implemented, or somehow considered
>> "secure" that way.
>
> Oddly enough, I don't see this 'problem' in the real world.  And FreeBSD is
> far from being alone in the always-set-DF bit.
>
> The only place this bites is when you run into tiny shops with homegrown
> firewalls configured by people who don't understand networking or security.
> Me, I consider it a feature that these sites self-select themselves off the
> network.  I'm certainly no worse off for not being able to talk to them.

Network admins not understanding ICMP was far more common 20 years ago.
Now the game has changed.  At any rate, there's no harm in trying PMTU
discovery as an option (other than a SMOP).
Re: [9fans] fossil+venti performance question
2015-05-09 10:35 GMT-07:00 Lyndon Nerenberg :
>
> On May 9, 2015, at 10:30 AM, Devon H. O'Dell wrote:
>
>> Or when your client is on a cell phone.  Cell networks are the worst.
>
> Really?  Quite often I slave my laptop to my phone's LTE connection, and I
> never have problems with PMTU.  Both here (across western Canada) and in
> the UK.

There are lots of hacks all over the Internet to deal with various
brokenness on the carrier<->carrier side of things where one end is a
cell network.  Haven't seen anything come up super recently, but had to
help debug some brokenness as recently as a year and a half ago that
turned out to be some cell network with really old hardware that didn't
do PMTU correctly, causing TLS connections to drop or die.  IIRC this
particular case was in France, but I also seem to recall the same issue
in northern England and perhaps Ireland.
Re: [9fans] fossil+venti performance question
for what it's worth, the original newreno work tcp does not have the
mtu bug.  on an 8 processor system i have around here i get

	bwc; while() nettest -a 127.1
	tcp!127.0.0.1!40357 count 10; 81920 bytes in 1.505948 s @ 519 MB/s (0ms)
	tcp!127.0.0.1!47983 count 10; 81920 bytes in 1.377984 s @ 567 MB/s (0ms)
	tcp!127.0.0.1!53197 count 10; 81920 bytes in 1.299967 s @ 601 MB/s (0ms)
	tcp!127.0.0.1!61569 count 10; 81920 bytes in 1.418073 s @ 551 MB/s (0ms)

however, after fixing things so the initial cwind isn't hosed, i get a
little better story:

	bwc; while() nettest -a 127.1
	tcp!127.0.0.1!54261 count 10; 81920 bytes in .5947659 s @ 1.31e+03 MB/s (0ms)

boo yah!  not bad for trying to clean up some constants.

- erik
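for context on what an initial cwind fix usually looks like: the conventional rule is RFC 3390's IW = min(4*MSS, max(2*MSS, 4380 bytes)).  the sketch below only illustrates that generic formula; it is not necessarily the exact change erik made, and the function name initcwind is made up.

#include <stdio.h>

static int
minint(int a, int b) { return a < b ? a : b; }

static int
maxint(int a, int b) { return a > b ? a : b; }

/* RFC 3390 initial congestion window, in bytes */
static int
initcwind(int mss)
{
	return minint(4*mss, maxint(2*mss, 4380));
}

int
main(void)
{
	/* an ethernet mss vs. a large loopback-sized mss */
	printf("mss 1460:  iw=%d\n", initcwind(1460));	/* 4380 */
	printf("mss 16340: iw=%d\n", initcwind(16340));	/* 32680 */
	return 0;
}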
Re: [9fans] fossil+venti performance question
> however, after fixing things so the initial cwind isn't hosed, i get a
> little better story:

so, actually, i think this is the root cause.  the initial cwind is
misset for loopback.  i bet that the symptom folks will see is that
/net/tcp/stats shows fragmentation when performance sucks.

evidently there is a backoff bug in sources' tcp, too.  i'd love
confirmation of this.

- erik