On Wed, Oct 23, 2019 at 11:30 AM Cong Wang <xiyou.wangc...@gmail.com> wrote:
>
> On Wed, Oct 23, 2019 at 11:14 AM Eric Dumazet <eduma...@google.com> wrote:
> > > In case you misunderstand, the CPU profiling I used is captured
> > > during 256 parallel TCP_STREAM.
> >
> > When I asked you the workload, you gave me TCP_RR output, not TCP_STREAM :/
> >
> > "A single netperf TCP_RR could _also_ confirm the improvement:"
>
> I guess you didn't understand what "also" mean? The improvement
> can be measured with both TCP_STREAM and TCP_RR, only the
> CPU profiling is done with TCP_STREAM.
>
Except that I could not measure any gain with TCP_RR, which is expected,
since TCP_RR will not use RTO and TLP at the same time.

If you found that we were setting both RTO and TLP when sending a 1-byte
message, we need to fix the stack instead of working around it.

> BTW, I just tested an unpatched kernel on a machine with 64 CPU's,
> turning on/off TLP makes no difference there, so this is likely related
> to the number of CPU's or hardware configurations. This explains
> why you can't reproduce it on your side, so far I can only reproduce
> it on one particular hardware platform too, but it is real.
>

I have hosts with 112 CPUs. I can try on them, but it will take some time.