On 11/12/2017 01:34 PM, Wei Xu wrote:
> On Sat, Nov 11, 2017 at 03:59:54PM -0500, Matthew Rosato wrote:
>>>> This case should be quite similar to pktgen, if you got improvement with
>>>> pktgen, usually it was also the same for UDP, could you please try to disable
>>>> tso, gso, gro, ufo on all host tap devices and guest virtio-net devices? Currently
>>>> the most significant tests would be like this AFAICT:
>>>>
>>>> Host->VM     4.12    4.13
>>>>  TCP:
>>>>  UDP:
>>>> pktgen:
>>>>
>>>> Don't want to bother you too much, so maybe 4.12 & 4.13 without Jason's patch should
>>>> work since we have seen positive number for that, you can also temporarily skip
>>>> net-next as well.
>>>
>>> Here are the requested numbers, averaged over numerous runs -- guest is
>>> 4GB+1vcpu, host uperf/pktgen bound to 1 host CPU + qemu and vhost thread
>>> pinned to other unique host CPUs. tso, gso, gro, ufo disabled on host
>>> taps / guest virtio-net devs as requested:
>>>
>>> Host->VM    4.12          4.13
>>> TCP:        9.92Gb/s      6.44Gb/s
>>> UDP:        5.77Gb/s      6.63Gb/s
>>> pktgen:     1572403pps    1904265pps
>>>
>>> UDP/pktgen both show improvement from 4.12->4.13. More interesting,
>>> however, is that I am seeing the TCP regression for the first time from
>>> host->VM. I wonder if the combination of CPU binding + disabling of one
>>> or more of tso/gso/gro/ufo is related.
>>>
>>>>
>>>> If you see UDP and pktgen are aligned, then it might be helpful to continue
>>>> the other two cases, otherwise we fail in the first place.
>>>
>>
>> I continued running many iterations of these tests between 4.12 and
>> 4.13. My throughput findings can be summarized as:
>
> Really nice to have these numbers.
>
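For reference, the relative changes implied by the Host->VM averages quoted above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the dictionary layout and the `percent_change` helper are made up for illustration, and the values are copied from that one table, so they need not match ranges summarized from other iterations.

```python
# Host->VM averages from the quoted table: (4.12, 4.13).
# Names and layout are illustrative, not from any tool's output.
host_to_vm = {
    "TCP (Gb/s)": (9.92, 6.44),
    "UDP (Gb/s)": (5.77, 6.63),
    "pktgen (pps)": (1572403, 1904265),
}


def percent_change(old, new):
    """Relative change from old to new in percent (negative means regression)."""
    return (new - old) / old * 100.0


for name, (v412, v413) in host_to_vm.items():
    print(f"{name:>13}: {percent_change(v412, v413):+.1f}%")
```

A negative value marks a drop from 4.12 to 4.13, a positive value an improvement.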
Wasn't sure if you were asking for the individual #s -- Just in case,
here are the other averages I used to draw my conclusions:

VM->VM      4.12        4.13
UDP         9.06Gb/s    8.99Gb/s
TCP         9.16Gb/s    8.67Gb/s

VM->Host    4.12        4.13
UDP         9.70Gb/s    9.53Gb/s
TCP         6.12Gb/s    6.00Gb/s

>>
>> VM->VM case:
>> UDP: roughly equivalent
>> TCP: Consistent regression (5-10%)
>>
>> VM->Host
>> Both UDP and TCP traffic are roughly equivalent.
>
> The patch improves performance for Rx from guest point of view, so the Tx
> would be no big difference since the Rx packets are far less than Tx in
> this case.
>
>>
>> Host->VM
>> UDP+pktgen: improvement (5-10%), but inconsistent
>> TCP: Consistent regression (25-30%)
>
> Maybe we can try to figure out this case first since it is the shortest path,
> can you have a look at TCP statistics and paste a few outputs between tests?
> I am suspecting there are some retransmitting, zero window probing, etc.
>

Grabbed some netstat -s results after a few minutes of running (snipped
uninteresting icmp and udp sections).
The test was TCP Host->VM scenario, binding and tso/gso/gro/ufo disabled
as before:

Host 4.12

Ip:
    Forwarding: 1
    3724964 total packets received
    0 forwarded
    0 incoming packets discarded
    3724964 incoming packets delivered
    5000026 requests sent out
Tcp:
    4 active connection openings
    1 passive connection openings
    0 failed connection attempts
    0 connection resets received
    1 connections established
    3724954 segments received
    133112205 segments sent out
    93106 segments retransmitted
    0 bad segments received
    2 resets sent
TcpExt:
    5 delayed acks sent
    8 packets directly queued to recvmsg prequeue
    TCPDirectCopyFromPrequeue: 1736
    146 packet headers predicted
    4 packet headers predicted and directly queued to user
    3218205 acknowledgments not containing data payload received
    506561 predicted acknowledgments
    TCPSackRecovery: 2096
    TCPLostRetransmit: 860
    93106 fast retransmits
    TCPLossProbes: 5
    TCPSackShifted: 1959097
    TCPSackMerged: 458343
    TCPSackShiftFallback: 7969
    TCPRcvCoalesce: 2
    TCPOrigDataSent: 133112178
    TCPHystartTrainDetect: 2
    TCPHystartTrainCwnd: 96
    TCPWinProbe: 2
IpExt:
    InBcastPkts: 4
    InOctets: 226014831
    OutOctets: 193103919403
    InBcastOctets: 1312
    InNoECTPkts: 3724964

Host 4.13

Ip:
    Forwarding: 1
    5930785 total packets received
    0 forwarded
    0 incoming packets discarded
    5930785 incoming packets delivered
    4495113 requests sent out
Tcp:
    4 active connection openings
    1 passive connection openings
    0 failed connection attempts
    0 connection resets received
    1 connections established
    5930775 segments received
    73226521 segments sent out
    13975 segments retransmitted
    0 bad segments received
    4 resets sent
TcpExt:
    5 delayed acks sent
    8 packets directly queued to recvmsg prequeue
    TCPDirectCopyFromPrequeue: 1736
    18 packet headers predicted
    4 packet headers predicted and directly queued to user
    4091720 acknowledgments not containing data payload received
    1838984 predicted acknowledgments
    TCPSackRecovery: 9920
    TCPLostRetransmit: 31
    13975 fast retransmits
    TCPLossProbes: 6
    TCPSackShifted: 1700187
    TCPSackMerged: 1143698
    TCPSackShiftFallback: 23839
    TCPRcvCoalesce: 2
    TCPOrigDataSent: 73226494
    TCPHystartTrainDetect: 2
    TCPHystartTrainCwnd: 530
IpExt:
    InBcastPkts: 4
    InOctets: 344809215
    OutOctets: 106285682663
    InBcastOctets: 1312
    InNoECTPkts: 5930785

Guest 4.12

Ip:
    133112471 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    133112470 incoming packets delivered
    3724897 requests sent out
    40 outgoing packets dropped
Tcp:
    0 active connections openings
    6 passive connection openings
    0 failed connection attempts
    2 connection resets received
    2 connections established
    133112301 segments received
    3724731 segments send out
    0 segments retransmited
    0 bad segments received.
    5 resets sent
TcpExt:
    1 TCP sockets finished time wait in fast timer
    13 delayed acks sent
    138408 packets directly queued to recvmsg prequeue.
    33119208 bytes directly in process context from backlog
    1907783720 bytes directly received in process context from prequeue
    127259218 packet headers predicted
    1313774 packets header predicted and directly queued to user
    24 acknowledgments not containing data payload received
    196 predicted acknowledgments
    2 connections reset due to early user close
    TCPRcvCoalesce: 117069950
    TCPOFOQueue: 2425393
    TCPFromZeroWindowAdv: 109
    TCPToZeroWindowAdv: 109
    TCPWantZeroWindowAdv: 4487
    TCPOrigDataSent: 223
    TCPACKSkippedSeq: 1
IpExt:
    InBcastPkts: 2
    InOctets: 199630961414
    OutOctets: 226019278
    InBcastOctets: 656
    InNoECTPkts: 133112471

Guest 4.13

Ip:
    73226690 total packets received
    1 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    73226689 incoming packets delivered
    5930853 requests sent out
    40 outgoing packets dropped
Tcp:
    0 active connections openings
    6 passive connection openings
    0 failed connection attempts
    2 connection resets received
    2 connections established
    73226522 segments received
    5930688 segments send out
    0 segments retransmited
    0 bad segments received.
    2 resets sent
TcpExt:
    1 TCP sockets finished time wait in fast timer
    13 delayed acks sent
    490503 packets directly queued to recvmsg prequeue.
    306976 bytes directly in process context from backlog
    6875924176 bytes directly received in process context from prequeue
    65617512 packet headers predicted
    4735750 packets header predicted and directly queued to user
    20 acknowledgments not containing data payload received
    61 predicted acknowledgments
    2 connections reset due to early user close
    TCPRcvCoalesce: 60654609
    TCPOFOQueue: 2857814
    TCPOrigDataSent: 85
IpExt:
    InBcastPkts: 1
    InOctets: 109839485374
    OutOctets: 344816614
    InBcastOctets: 328
    InNoECTPkts: 73226690

>>
>> Host->VM UDP and pktgen seemed to show improvement in some runs, and in
>> others seemed to mirror 4.12-level performance.
>>
>> The TCP regression for VM->VM is no surprise, we started with that.
>> It's still consistent, but smaller in this specific environment.
>
> Right, there are too many factors that might influence the performance.
>
>>
>> The TCP regression in Host->VM is interesting because I wasn't seeing it
>> consistently before binding CPUs + disabling tso/gso/gro/ufo. Also
>> interesting because of how large it is -- By any chance can you see this
>> regression on x86 with the same configuration?
>
> Had a quick test and it seems I also got a drop on x86 without tso,gro,...
> Data with/without tso,gso,... is below; I will check the TCP statistics and
> let you know soon.
>
> 4.12
> --------------------------------------------------------------------------
> master 32.34s 112.63GB 29.91Gb/s 4031090 0.00
> master 32.33s 32.58GB 8.66Gb/s 1166014 0.00
> -------------------------------------------------------------------------
>
> 4.13
> -------------------------------------------------------------------------
> master 32.35s 119.17GB 31.64Gb/s 4265190 0.00
> master 32.33s 27.02GB 7.18Gb/s 967007 0.00
> -------------------------------------------------------------------------
>
> Wei
>
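Since retransmissions were one of the suspects, here is a rough Python sketch that turns the host-side Tcp counters above into a retransmit percentage per kernel. The counter values are copied from the netstat -s output in this mail; the dictionary keys and the `retrans_pct` helper are illustrative names only. The captures were taken "after a few minutes" and not over a fixed interval, so only the ratios are meaningful here, not the absolute counts.

```python
# Host-side Tcp counters copied from the netstat -s output above.
# Key names are made up for this sketch.
host = {
    "4.12": {"segs_out": 133112205, "retrans": 93106},
    "4.13": {"segs_out": 73226521, "retrans": 13975},
}


def retrans_pct(counters):
    """Retransmitted segments as a percentage of all segments sent."""
    return 100.0 * counters["retrans"] / counters["segs_out"]


for kernel, counters in host.items():
    print(f"Host {kernel}: {retrans_pct(counters):.4f}% of segments retransmitted")
```

The same approach works for any pair of counters taken from the same capture, for example the guest-side zero-window counters against segments received.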