I ran ethtool -k on the backend tap netdevice and found that its TSO is off:

Features for tap0:
rx-checksumming: off [fixed]
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
    tx-tcp-segmentation: off [requested on]
    tx-tcp-ecn-segmentation: off [requested on]
    tx-tcp6-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]

But I failed to enable its TSO: "Could not change any device features" was
reported. Why?
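(As far as I can tell from the tun/tap driver, this failure is expected: the
offload bits of a tap device are set by its user -- qemu/vhost issues the
TUNSETOFFLOAD ioctl according to the VIRTIO_NET_F_GUEST_* features the guest
negotiates -- and the driver masks out anything the peer has not enabled,
which is exactly what "off [requested on]" means. So the place to look is the
guest's virtio-net, not tap0. A minimal check, assuming eth0 / virtio0 is the
virtio NIC inside the guest:

  # guest RX checksumming must be available before TSO can be offered
  ethtool -k eth0 | grep rx-checksumming
  # negotiated feature bits, one digit per bit: bit 1 is
  # VIRTIO_NET_F_GUEST_CSUM, bit 7 is VIRTIO_NET_F_GUEST_TSO4
  cat /sys/bus/virtio/devices/virtio0/features

If bit 1 is 0, the host cannot send checksum-offloaded, and therefore TSO,
packets to that guest, and tap0's tx-tcp-segmentation will stay off.)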
>> I see that RX checksumming is still off for you on virtio, this is
>> likely what's contributing to the problem.
>>
>> Here's how it looks for me:
>> ethtool -k eth1
>> Offload parameters for eth1:
>> rx-checksumming: on
>> tx-checksumming: on
>> scatter-gather: on
>> tcp-segmentation-offload: on
>> udp-fragmentation-offload: on
>> generic-segmentation-offload: on
>> generic-receive-offload: on
>> large-receive-offload: off
>>
> When I select centos-6.3 as the guest OS, rx-checksumming is on, too.
> After updating qemu from 1.4.0 to 2.0.0, the inter-vm throughput can reach
> ~5Gbps via netperf -t TCP_STREAM -m 1400.
> Here is ethtool -k eth1 on the centos-6.3 guest:
> ethtool -k eth1
> Offload parameters for eth1:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: on
> generic-segmentation-offload: on
> generic-receive-offload: off
> large-receive-offload: off
>
> The only difference is GRO: on for you, off for me.
> I ran 'ethtool -K eth1 gro on' on my guest, and the error below was reported:
> "Cannot set device GRO settings: Invalid argument"
>
>> You don't supply kernel versions for host or guest kernels,
>> so it's hard to judge what's going on exactly.
>>
> host: linux-3.10.27 (downloaded directly from kernel.org)
> qemu: qemu-2.0.0 (downloaded directly from wiki.qemu.org/Download)
> guest: centos-6.3 (2.6.32-279.el6.x86_64), 2 vcpus
>
>> Bridge configuration also plays a huge role.
>> Things like ebtables might affect performance as well,
>> sometimes even if they are only loaded, not even enabled.
>>
> I will check it.
>
>> Also, some old scheduler versions didn't put VMs on different
>> CPUs aggressively enough; this resulted in conflicts
>> when VMs compete for the same CPU.
> I will check it.

There is no aggressive contention for the same CPU, but when I pin each vcpu
to a different pcpu, a ~1Gbps gain is obtained.

>> On NUMA systems, some older host kernels would split VM memory
>> across NUMA nodes; this might lead to bad performance.
>>
> local first.
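(For reproducing the pinning and NUMA checks above with stock tools -- the
thread IDs and node numbers below are placeholders, not values from this
setup:

  # list the threads of the qemu process; the vcpu threads are the busy ones
  ps -eLo pid,tid,pcpu,comm | grep kvm
  # pin each vcpu thread to its own physical cpu
  taskset -pc 2 <vcpu-tid-0>
  taskset -pc 3 <vcpu-tid-1>
  # or keep the whole VM's cpus and memory on one NUMA node at launch
  numactl --cpunodebind=0 --membind=0 /usr/bin/kvm ...

)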
>> On Sat, Jun 07, 2014 at 11:07:10AM +0800, Zhang Haoyu wrote:
>>> After updating qemu from 1.4 to 2.0, the inter-vm throughput can
>>> reach ~5Gbps via netperf -t TCP_STREAM -m 1400;
>>> the performance gap (~2Gbps) between kvm and xen still exists.
>>>
>>> Thanks,
>>> Zhang Haoyu
>>>
>>> ------------------
>>> Zhang Haoyu
>>> 2014-06-07
>>>
>>> -----Original Message-----
>>> From: Zhang Haoyu
>>> Sent: 2014-06-07 09:27:16
>>> To: Venkateswara Rao Nandigam; kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: Re: [network performance question] only ~2Gbps throughput between
>>> two linux guests which are running on the same host via netperf -t
>>> TCP_STREAM -m 1400, but xen can ac
>>>
>>> > Doesn't that answer your original question about the performance gap!
>>> >
>>> Sorry, do you mean it's the offloads that cause the performance gap?
>>> But even with checksum offload, TSO, GRO, etc. turned OFF, the performance
>>> gap still exists.
>>> If I understand correctly, kvm should have better performance than xen
>>> from the implementation point of view, because of the shorter path and
>>> fewer context switches, especially for inter-vm communication.
>>>
>>> And why is the performance gap so big (~2G vs ~7G) when checksum offload,
>>> TSO, GRO, etc. are on for both hypervisors?
>>> Why can the packets' size be so big (65160) and stable on xen, while most
>>> packets are 1448 bytes and only a few are ~65000 on kvm, with netperf -t
>>> TCP_STREAM -m 1400?
>>> Do some TCP configurations have a bearing on this? Or some virtio-net
>>> configurations?
>>>
>>> Thanks,
>>> Zhang Haoyu
>>>
>>> -----Original Message-----
>>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
>>> Behalf Of Zhang Haoyu
>>> Sent: Friday, June 06, 2014 3:44 PM
>>> To: Venkateswara Rao Nandigam; kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: Re: RE: [network performance question] only ~2Gbps throughput
>>> between two linux guests which are running on the same host via netperf
>>> -t TCP_STREAM -m 1400, but xen can ac
>>>
>>> > >> Try Rx/Tx checksum offload on all the concerned guests of both
>>> > >> Hypervisors.
>>> > >>
>>> > > Already ON on both hypervisors, so some other offloads (e.g. TSO, GSO)
>>> > > can be supported.
>>> >
>>> > Try Rx/Tx checksum offload "OFF" on all the concerned guests of
>>> > both Hypervisors.
>>> >
>>> With Rx/Tx checksum offload OFF on the XEN guest, 1.6Gbps was achieved;
>>> the tcpdump result on the backend vif netdevice showed that the packets'
>>> size is 1448, stable.
>>> With Rx/Tx checksum offload OFF on the KVM guest, only ~1Gbps was
>>> achieved; the tcpdump result on the backend tap netdevice showed that the
>>> packets' size is 1448, stable.
>>>
>>> > And while launching the VM in KVM, on the command line of the virtio
>>> > interface, you can specify TSO, LRO, RxMergebuf. Try this instead of
>>> > the ethtool interface.
>>> The current qemu command is shown below, and I will change the virtio-net
>>> configuration later as you advise:
>>> /usr/bin/kvm -id 8572667846472 -chardev
>>> socket,id=qmp,path=/var/run/qemu-server/8572667846472.qmp,server,nowait
>>> -mon chardev=qmp,mode=control -vnc :0,websocket,to=200,x509,password
>>> -pidfile /var/run/qemu-server/8572667846472.pid -daemonize -name
>>> centos6-196.5.5.72 -smp sockets=1,cores=2 -cpu core2duo -nodefaults -vga
>>> cirrus -no-hpet -k en-us -boot menu=on,splash-time=8000 -m 4096 -usb -drive
>>> file=/sf/data/local/iso/vmtools/virtio_auto_install.iso,if=none,id=drive-ide0,media=cdrom,aio=threads,forecast=disable
>>> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200
>>> -drive
>>> file=/sf/data/local/images/host-f8bc123b3e74/32f49b646d1e/centos6-196.5.5.72.vm/vm-disk-1.qcow2,if=none,id=drive-ide2,cache=directsync,aio=threads,forecast=disable
>>> -device ide-hd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100
>>> -netdev type=tap,id=net0,ifname=857266784647200,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on
>>> -device virtio-net-pci,mac=FE:FC:FE:95:EC:A7,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
>>> -rtc driftfix=slew,clock=rt -global kvm-pit.lost_tick_policy=discard
>>> -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1
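(Following up on the "specify TSO, LRO, RxMergebuf on the command line"
suggestion: these map to virtio-net-pci device properties in qemu, so the
-device stanza above can be extended as sketched below. These properties
default to on in qemu 2.0, so spelling them out mainly helps rule out
something in the stack turning them off; treat this as a sketch, not a
tested line:

  -device virtio-net-pci,mac=FE:FC:FE:95:EC:A7,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300,csum=on,guest_csum=on,gso=on,guest_tso4=on,guest_tso6=on,guest_ecn=on,mrg_rxbuf=on

)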
>>> -----Original Message-----
>>> From: Zhang Haoyu [mailto:zhan...@sangfor.com]
>>> Sent: Friday, June 06, 2014 1:26 PM
>>> To: Venkateswara Rao Nandigam; kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: RE: [network performance question] only ~2Gbps throughput
>>> between two linux guests which are running on the same host via netperf
>>> -t TCP_STREAM -m 1400, but xen can ac
>>>
>>> Thanks for the reply.
>>>
>>> And vhost is enabled, tx zero-copy is enabled, and virtio TSO is enabled
>>> on kvm.
>>>
>>> > Try lro "ON" on the client side. This would require mergeable Rx
>>> > buffers to be ON.
>>> >
>>> Current settings for gro and lro:
>>> generic-receive-offload: on
>>> large-receive-offload: off [fixed]
>>>
>>> > And Xen netfront to KVM virtio are not apples to apples because of
>>> > their implementation details.
>>> >
>>> You are right; I just want to compare the network performance of the two
>>> virtualization platforms from the user's point of view.
>>>
>>> > Try Rx/Tx checksum offload on all the concerned guests of both
>>> > Hypervisors.
>>> >
>>> Already ON on both hypervisors, so some other offloads (e.g. TSO, GSO)
>>> can be supported.
>>>
>>> kvm virtio-net nic:
>>> ethtool -k eth0
>>> Features for eth0:
>>> rx-checksumming: off [fixed]
>>> tx-checksumming: on
>>>     tx-checksum-ipv4: off [fixed]
>>>     tx-checksum-ip-generic: on
>>>     tx-checksum-ipv6: off [fixed]
>>>     tx-checksum-fcoe-crc: off [fixed]
>>>     tx-checksum-sctp: off [fixed]
>>> scatter-gather: on
>>>     tx-scatter-gather: on
>>>     scatter-gather-fraglist: on
>>> tcp-segmentation-offload: on
>>>     tx-tcp-segmentation: on
>>>     tx-tcp-ecn-segmentation: on
>>>     tx-tcp6-segmentation: on
>>> udp-fragmentation-offload: on
>>> generic-segmentation-offload: on
>>> generic-receive-offload: on
>>> large-receive-offload: off [fixed]
>>> rx-vlan-offload: off [fixed]
>>> tx-vlan-offload: off [fixed]
>>> ntuple-filters: off [fixed]
>>> receive-hashing: off [fixed]
>>> highdma: on [fixed]
>>> rx-vlan-filter: on [fixed]
>>> vlan-challenged: off [fixed]
>>> tx-lockless: off [fixed]
>>> netns-local: off [fixed]
>>> tx-gso-robust: off [fixed]
>>> tx-fcoe-segmentation: off [fixed]
>>> tx-gre-segmentation: off [fixed]
>>> tx-udp_tnl-segmentation: off [fixed]
>>> fcoe-mtu: off [fixed]
>>> tx-nocache-copy: on
>>> loopback: off [fixed]
>>> rx-fcs: off [fixed]
>>> rx-all: off [fixed]
>>> tx-vlan-stag-hw-insert: off [fixed]
>>> rx-vlan-stag-hw-parse: off [fixed]
>>> rx-vlan-stag-filter: off [fixed]
>>>
>>> xen netfront nic:
>>> ethtool -k eth0
>>> Offload features for eth0:
>>> rx-checksumming: on
>>> tx-checksumming: on
>>> scatter-gather: on
>>> tcp-segmentation-offload: on
>>> udp-fragmentation-offload: off
>>> generic-segmentation-offload: on
>>> generic-receive-offload: off
>>> large-receive-offload: off
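(For reference, a capture like the one below can be taken with something
along these lines; "vif1.0" is just an example name for the xen backend
device, not the one from this setup:

  tcpdump -i vif1.0 -nn tcp and port 53507

Sizes seen this way are as delivered to the backend device, i.e. before
segmentation / after GSO aggregation, which is why a single record can show
~64KB.)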
>>> <piece of tcpdump result on the xen backend vif netdevice>
>>> 15:46:41.279954 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193138968:1193204128, ack 1, win 115, options [nop,nop,TS val 102307210
>>> ecr 102291188], length 65160
>>> 15:46:41.279971 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193204128:1193269288, ack 1, win 115, options [nop,nop,TS val 102307210
>>> ecr 102291188], length 65160
>>> 15:46:41.279987 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193269288:1193334448, ack 1, win 115, options [nop,nop,TS val 102307210
>>> ecr 102291188], length 65160
>>> 15:46:41.280003 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193334448:1193399608, ack 1, win 115, options [nop,nop,TS val 102307210
>>> ecr 102291188], length 65160
>>> 15:46:41.280020 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193399608:1193464768, ack 1, win 115, options [nop,nop,TS val 102307210
>>> ecr 102291188], length 65160
>>> 15:46:41.280213 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193464768:1193529928, ack 1, win 115, options [nop,nop,TS val 102307211
>>> ecr 102291189], length 65160
>>> 15:46:41.280233 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193529928:1193595088, ack 1, win 115, options [nop,nop,TS val 102307211
>>> ecr 102291189], length 65160
>>> 15:46:41.280250 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193595088:1193660248, ack 1, win 115, options [nop,nop,TS val 102307211
>>> ecr 102291189], length 65160
>>> 15:46:41.280239 IP 196.6.6.71.53622 > 196.6.6.72.53507: Flags [.], ack
>>> 1193138968, win 22399, options [nop,nop,TS val 102291190 ecr 102307210],
>>> length 0
>>> 15:46:41.280267 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193660248:1193725408, ack 1, win 115, options [nop,nop,TS val 102307211
>>> ecr 102291189], length 65160
>>> 15:46:41.280284 IP 196.6.6.72.53507 > 196.6.6.71.53622: Flags [.], seq
>>> 1193725408:1193790568, ack 1, win 115, options [nop,nop,TS val 102307211
>>> ecr 102291189], length 65160
>>>
>>> The packets' size is very stable: 65160 bytes.
>>>
>>> Thanks,
>>> Zhang Haoyu
>>>
>>> -----Original Message-----
>>> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
>>> Behalf Of Zhang Haoyu
>>> Sent: Friday, June 06, 2014 9:01 AM
>>> To: kvm; qemu-devel
>>> Cc: Gleb Natapov; Paolo Bonzini; Michael S.Tsirkin; yewudi
>>> Subject: [network performance question] only ~2Gbps throughput between two
>>> linux guests which are running on the same host via netperf -t TCP_STREAM
>>> -m 1400, but xen can achieve ~7Gbps
>>>
>>> Hi, all
>>>
>>> I ran two linux guests on the same kvm host, started netserver on one vm
>>> and netperf on the other; the netperf command and test result are shown
>>> below:
>>> netperf -H 196.5.5.71 -t TCP_STREAM -l 60 -- -m 1400 -M 1400
>>> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>>> 196.5.5.71 () port 0 AF_INET : nodelay
>>> Recv   Send    Send
>>> Socket Socket  Message  Elapsed
>>> Size   Size    Size     Time     Throughput
>>> bytes  bytes   bytes    secs.    10^6bits/sec
>>>
>>> 87380  16384   1400     60.01    2355.45
>>>
>>> But when I ran two linux guests on the same xen hypervisor, ~7Gbps
>>> throughput was achieved; the netperf command and test result are shown
>>> below:
>>> netperf -H 196.5.5.71 -t TCP_STREAM -l 60 -- -m 1400 -M 1400
>>> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
>>> 196.5.5.71 () port 0 AF_INET
>>> Recv   Send    Send
>>> Socket Socket  Message  Elapsed
>>> Size   Size    Size     Time     Throughput
>>> bytes  bytes   bytes    secs.    10^6bits/sec
>>>
>>> 87380  16384   1400     60.01    2349.82
>>>
>>> The test was performed many times; the results were similar to the above.
>>>
>>> When I ran tcpdump on the backend tap netdevice, I found that most packets
>>> are 1448 bytes on kvm and only a few are ~60000 bytes; but when I ran
>>> tcpdump on the backend vif netdevice, I found that most packets are larger
>>> than 60000 bytes on xen.
>>> The result of netperf -t TCP_STREAM -m 64 is similar: far more large
>>> packets on xen than on kvm.
>>>
>>> And vhost is enabled, tx zero-copy is enabled, and virtio TSO is enabled
>>> on kvm.
>>>
>>> Any ideas?
>>>
>>> Thanks,
>>> Zhang Haoyu