I am currently playing with those settings and trying to generate traffic with the hping3 tool. Do you have any tool you would recommend for testing traffic performance, especially for small UDP-style packets?
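In case it helps as a baseline, this is roughly how I plan to generate the small-packet UDP load; the target address, port, payload size and rate below are only placeholders for my lab:

    # iperf3 in UDP mode: ~200-byte payloads at 100 Mbit/s for 60 seconds
    # (run "iperf3 -s" on the receiving instance first); the final report
    # shows jitter and lost/total datagrams
    iperf3 -c 10.0.0.10 -u -b 100M -l 200 -t 60

    # hping3 flood of 120-byte UDP packets towards port 5060
    hping3 --udp -p 5060 --flood -d 120 10.0.0.10

iperf3 gives me a loss percentage per run, while hping3 is just a raw flood, so I will probably use both.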
I am going to share all my results and see what you think, because I have noticed you went through this pain :) I will try every single option you suggested to make sure we are good before I move forward to production.

On Sun, Sep 16, 2018 at 11:25 AM Liping Mao (limao) <li...@cisco.com> wrote:
>
> I think the multi queue feature should help. (Be careful to make sure the ethtool queue number update is also done again after rebooting the VM.)
>
> NUMA CPU pinning and queue length will be a plus in my experience. You may need to run a performance test in your situation; in my case NUMA CPU pinning helped the app get very stable 720p/1080p transcoding performance. Not sure if your app will benefit.
>
> You are not using L3, which lets you avoid a lot of performance issues. And since there are only two instances with 80 kpps of traffic, the HW interface should not be a bottleneck in your case either. And your Nexus 5k/7k will not be a bottleneck for sure ;-)
>
> Thanks,
> Liping Mao
>
> On Sep 16, 2018, at 23:09, Satish Patel <satish....@gmail.com> wrote:
> >
> > Thanks Liping,
> >
> > I am using libvirtd version 3.9.0, so it looks like I am eligible to take advantage of that feature. Phew!
> >
> > [root@compute-47 ~]# libvirtd -V
> > libvirtd (libvirt) 3.9.0
> >
> > Let me tell you how I am running instances on my OpenStack. My compute node has 32 cores / 32G memory and I have created two instances on that compute node with 15 vCPUs and 14G memory each (the two instances use 30 vCPU cores; I have kept 2 cores for the compute node). On the compute node I disabled overcommit by using a ratio of 1.0.
> >
> > I didn't configure NUMA yet because I wasn't aware of this feature. As per your last post, do you think NUMA will help fix this issue? The following is my NUMA view:
> >
> > [root@compute-47 ~]# numactl --hardware
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> > node 0 size: 16349 MB
> > node 0 free: 133 MB
> > node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> > node 1 size: 16383 MB
> > node 1 free: 317 MB
> > node distances:
> > node   0   1
> >   0:  10  20
> >   1:  20  10
> >
> > I am not using any L3 router; I am using a provider VLAN network and a Cisco Nexus switch for my L3 function, so I am not seeing any bottleneck there.
> >
> > This is the 10G NIC I have on all my compute nodes, dual 10G ports with bonding (20G):
> >
> > 03:00.0 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
> > 03:00.1 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
> >
> >> On Sun, Sep 16, 2018 at 10:50 AM Liping Mao (limao) <li...@cisco.com> wrote:
> >>
> >> It is still possible to update the rx and tx queue length if your qemu and libvirt versions are higher than the versions recorded in [3]. (You should be able to update it directly in the libvirt configuration, if my memory is correct.)
> >>
> >> We also have some similar use cases which run audio/video services. They are CPU consuming and have small UDP packets. Another possible tuning is using CPU pinning for the VM. You can use the NUMA-aware CPU feature to get stable CPU performance; VM networking sometimes dropped packets because the VM CPU was too busy, and with NUMA CPU pinning it performs better. Our approach is similar to [a]. You need to create a flavor with special metadata and a dedicated host aggregate for NUMA-aware VMs. Dedicated CPUs are very good for media services; it makes the CPU performance stable.
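Noting down here what I think the multi-queue and CPU pinning knobs look like on my side, so you can correct me if I misread the docs; the image name, flavor name, guest interface and queue count are placeholders for my environment:

    # enable virtio multi-queue for guests booted from this image
    openstack image set --property hw_vif_multiqueue_enabled=true centos7-voip

    # inside the guest, after every reboot, raise the number of combined queues
    ethtool -L eth0 combined 8

    # dedicated (pinned) vCPUs for the VoIP flavor, along the lines of [a]
    openstack flavor set voip.large --property hw:cpu_policy=dedicated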
> >>
> >> Another packet loss case we hit is because of the VM kernel: some of our apps are using a 32-bit OS, and that causes memory issues. When traffic goes above 50 kpps it drops a lot, and sometimes it even crashes. A 32-bit OS can actually use only very limited memory, so we had to add swap for the VM. Hopefully your app is using a 64-bit OS, because 32-bit can cause tons of trouble.
> >>
> >> BTW, if you are using a vrouter on L3, you'd better move to a provider network (no vrouter). I did not try DVR, but if you are running without DVR, the L3 node will become a bottleneck very quickly. In particular, the default iptables conntrack table is 65535 entries; you will reach it and drop packets on L3, and even after you tune that value it is still hard to get more than 1 Mpps out of your network node.
> >>
> >> If your app does more than 200 kpps per compute node, you may also want to have a look at your physical network driver tx/rx configuration. Most of the HW default values for tx/rx queue number and length are very poor; you may start to drop packets on the eth interface of the physical host when the rx queue is full.
> >>
> >> [a] https://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/
> >>
> >> Regards,
> >> Liping Mao
> >>
> >> On Sep 16, 2018, at 21:18, Satish Patel <satish....@gmail.com> wrote:
> >>
> >> Hi Liping,
> >>
> >> Thank you for your reply.
> >>
> >> We notice packet drops during high load. I did try txqueue and it didn't help, so I believe I am going to try multiqueue.
> >>
> >> For SR-IOV I have to check whether my NIC supports it.
> >>
> >> We are using Queens, so I think the queue size option is not possible :(
> >>
> >> We are using a VoIP application and the traffic is UDP, so our pps rate is 60k to 80k per VM instance.
> >>
> >> I will share my results as soon as I try multiqueue.
> >>
> >> Sent from my iPhone
> >>
> >> On Sep 16, 2018, at 2:27 AM, Liping Mao (limao) <li...@cisco.com> wrote:
> >>
> >> Hi Satish,
> >>
> >> Did your packet loss happen always, or only under heavy load?
> >>
> >> AFAIK, if you do not tune anything, the VM tap can process about 50 kpps before the tap device starts to drop packets.
> >>
> >> If it happens under heavy load, a couple of things you can try:
> >>
> >> 1) Increase the tap queue length; the default value is usually 500, and you can try larger. (Seems like you already tried this.)
> >>
> >> 2) Try the virtio multi-queue feature, see [1]. Virtio uses one queue for rx/tx in the VM; with this feature you can get more queues. You can check
> >>
> >> 3) In the Rocky version, you can use [2] to increase the virtio queue size; the default queue size is 256/512, and you may increase it to 1024. This would help increase the pps of the tap device.
> >>
> >> If all these things cannot meet your network performance requirement, you may need to move to DPDK / SR-IOV to get more VM performance.
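If I understand suggestions 1) and 3) correctly, on a new enough Nova/libvirt they map to something like the following; the 1024 and 10000 values are just the numbers from this thread, the tap name is the one from my earlier mail, and the nova.conf option names are my reading of [2], so please correct me if that is wrong:

    # /etc/nova/nova.conf on the compute node (per [2]; needs a recent
    # libvirt/qemu and the instances restarted to take effect)
    [libvirt]
    rx_queue_size = 1024
    tx_queue_size = 1024

    # larger software queue on the tap device (default is usually 500 or 1000)
    ip link set dev tap5af7f525-5f txqueuelen 10000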
> >>
> >> I have not actually used DPDK / SR-IOV in our env; you may refer to [3].
> >>
> >> [1] https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/libvirt-virtiomq.html
> >> [2] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/libvirt-virtio-set-queue-sizes.html
> >> [3] https://docs.openstack.org/ocata/networking-guide/config-sriov.html
> >>
> >> Regards,
> >> Liping Mao
> >>
> >> On 2018/9/16 13:07, "Satish Patel" <satish....@gmail.com> wrote:
> >>
> >> [root@compute-33 ~]# ifconfig tap5af7f525-5f | grep -i drop
> >> RX errors 0 dropped 0 overruns 0 frame 0
> >> TX errors 0 dropped 2528788837 overruns 0 carrier 0 collisions 0
> >>
> >> Noticed the tap interface dropping TX packets, and even after increasing txqueuelen from 1000 to 10000 nothing changed; still getting packet drops.
> >>
> >> On Sat, Sep 15, 2018 at 4:22 PM Satish Patel <satish....@gmail.com> wrote:
> >>
> >> Folks,
> >>
> >> I need some advice or suggestions to find out what is going on with my network. We have noticed high packet loss on an OpenStack instance and are not sure what is going on; at the same time, if I check on the host machine it shows zero packet loss. This is what I did as a test:
> >>
> >> ping 8.8.8.8
> >>
> >> from instance: 50% packet loss
> >> from compute host: 0% packet loss
> >>
> >> I have disabled the TSO/GSO/SG settings on the physical compute node but am still getting packet loss.
> >>
> >> We have 10G NICs on our network; it looks like something related to a tap interface setting.
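One last note on the host-side checks mentioned earlier in the thread: before and after each test run I plan to snapshot the drop counters, ring sizes and conntrack limit roughly like this; the NIC name em1 and the ring values are placeholders for my hardware, and the tap name is the one from my earlier mail:

    # drop counters on the tap device and on the physical NIC
    ip -s link show dev tap5af7f525-5f
    ethtool -S em1 | grep -i drop

    # current vs. maximum RX/TX ring sizes, then raise them toward the maximum
    ethtool -g em1
    ethtool -G em1 rx 4096 tx 4096

    # conntrack table size on any node still doing iptables/L3 in the path
    sysctl net.netfilter.nf_conntrack_max

I will include these numbers when I share the results.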