I think the multi-queue feature should help. (Be careful to make sure the
ethtool queue-count update is also applied again after the VM reboots.)
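
A rough sketch of what I mean (the image name, NIC name and queue count are
just examples; the image property comes from [1] in my earlier mail):

    # enable virtio multi-queue for instances booted from this image
    openstack image set --property hw_vif_multiqueue_enabled=true centos7-voip

    # inside the guest, after every reboot, raise the queue count again
    # (the maximum equals the number of vCPUs of the instance)
    ethtool -L eth0 combined 8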

NUMA CPU pinning and queue length tuning will be a plus, in my experience. You
may need to run performance tests in your situation; in my case NUMA CPU
pinning helped the app get very stable 720p/1080p transcoding performance. Not
sure if your app will see the same benefit.

You are not using L3, which lets you avoid a lot of performance issues. And
since you only have two instances at ~80kpps each, the HW interface should not
be a bottleneck in your case either. And your Nexus 5k/7k will not be the
bottleneck for sure ;-)


Thanks,
Liping Mao

> On Sep 16, 2018, at 23:09, Satish Patel <satish....@gmail.com> wrote:
> 
> Thanks Liping,
> 
> I am using libvirtd version 3.9.0, so it looks like I am eligible to take
> advantage of that feature. Phew!
> 
> [root@compute-47 ~]# libvirtd -V
> libvirtd (libvirt) 3.9.0
> 
> Let me tell you how I am running instances on my OpenStack. My compute node
> has 32 cores / 32G memory, and I have created two instances on the compute
> node with 15 vCPUs and 14G memory each (the two instances use 30 vCPU cores;
> I have kept 2 cores for the compute node). On the compute node I disabled
> overcommit using a ratio of 1.0.
> 
> I didn't configure NUMA yet because I wasn't aware of this feature. As per
> your last post, do you think NUMA will help fix this issue? The following is
> my NUMA view:
> 
> [root@compute-47 ~]# numactl --hardware
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 0 size: 16349 MB
> node 0 free: 133 MB
> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> node 1 size: 16383 MB
> node 1 free: 317 MB
> node distances:
> node   0   1
>  0:  10  20
>  1:  20  10
> 
> 
> I am not using any L3 router; I am using a provider VLAN network and a Cisco
> Nexus switch for my L3 function, so I am not seeing any bottleneck there.
> 
> This is the 10G NIC I have on all my compute nodes, dual 10G ports with
> bonding (20G):
> 
> 03:00.0 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10
> Gigabit Ethernet (rev 10)
> 03:00.1 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10
> Gigabit Ethernet (rev 10)
> 
> 
>> On Sun, Sep 16, 2018 at 10:50 AM Liping Mao (limao) <li...@cisco.com> wrote:
>> 
>> It is still possible to update the rx and tx queue lengths if your qemu and 
>> libvirt versions are higher than the versions recorded in [3]. (You should be 
>> able to update them directly in the libvirt configuration, if my memory is 
>> correct.)
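>>
>> (If you do edit it by hand, the interface section of the guest XML would look
>> roughly like this; rx_queue_size/tx_queue_size support depends on your
>> libvirt/qemu versions, so treat this only as a sketch:)
>>
>>     <interface type='bridge'>
>>       <model type='virtio'/>
>>       <!-- larger virtio rings; the defaults are 256/512 -->
>>       <!-- tx_queue_size generally only applies to vhost-user backends -->
>>       <driver name='vhost' rx_queue_size='1024' tx_queue_size='1024'/>
>>     </interface>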
>> 
>> We also have some similar use cases running audio/video services. They are 
>> CPU consuming and have small UDP packets. Another possible tuning is CPU 
>> pinning for the VM. You can use the NUMA-aware CPU feature to get stable CPU 
>> performance; the VM network sometimes drops packets because the VM CPU is too 
>> busy, and with NUMA CPU pinning it performs much better. Our approach is 
>> similar to [a]. You need to create a flavor with special metadata and a 
>> dedicated host aggregate for the NUMA-aware VMs. Dedicated CPUs are very good 
>> for media services; they make the CPU performance stable.
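>>
>> (Roughly, with placeholder aggregate/flavor names and your 15 vCPU / 14G
>> sizing, something like the following; it also assumes the scheduler has
>> NUMATopologyFilter and AggregateInstanceExtraSpecsFilter enabled:)
>>
>>     openstack aggregate create --property pinned=true numa-pinned-agg
>>     openstack aggregate add host numa-pinned-agg compute-47
>>     openstack flavor create --vcpus 15 --ram 14336 voip.pinned
>>     openstack flavor set voip.pinned \
>>         --property hw:cpu_policy=dedicated \
>>         --property hw:numa_nodes=1 \
>>         --property aggregate_instance_extra_specs:pinned=true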
>> 
>> Another packet loss case we hit is because of the VM kernel: some of our apps 
>> are using a 32-bit OS, which causes memory issues; when traffic goes above 
>> 50kpps they drop a lot, and sometimes even crash. In this case a 32-bit OS can 
>> actually use only very limited memory, and we had to add swap to the VM. I 
>> hope your app is using a 64-bit OS, because 32-bit can cause tons of trouble.
>> 
>> BTW, if you are using a vrouter on L3, you had better move to a provider 
>> network (no vrouter). I have not tried DVR, but if you are running without 
>> DVR the L3 node will become a bottleneck very quickly. In particular, the 
>> default iptables conntrack table size is 65535; you will hit it and drop 
>> packets on L3, and even after you tune that value it is still hard to get much 
>> more than 1Mpps out of your network node.
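>>
>> (If you do need to raise it on the network node, it is roughly something like
>> the following; the values here are just examples, size them for your traffic:)
>>
>>     sysctl -w net.netfilter.nf_conntrack_max=1048576
>>     echo 262144 > /sys/module/nf_conntrack/parameters/hashsize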
>> 
>> If your app pushes more than 200kpps per compute node, you had better also 
>> have a look at your physical network driver tx/rx configuration. Most of the 
>> HW default values for tx/rx queue count and length are very poor; you may 
>> start to drop packets on the eth interface of the physical host when the rx 
>> queue is full.
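>>
>> (You can inspect and raise them roughly like this; the interface name and
>> sizes are just examples, check your NIC's maximums first:)
>>
>>     ethtool -g em1                    # show current/maximum ring sizes
>>     ethtool -G em1 rx 4096 tx 4096    # enlarge the rx/tx rings
>>     ethtool -l em1                    # show current/maximum queue counts
>>     ethtool -L em1 combined 8         # use more hardware queues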
>> 
>> [a] https://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/
>> 
>> Regards,
>> Liping Mao
>> 
>> On Sep 16, 2018, at 21:18, Satish Patel <satish....@gmail.com> wrote:
>> 
>> Hi Liping,
>> 
>> Thank you for your reply,
>> 
>> We notice packet drops during high load. I did try txqueuelen and it didn't 
>> help, so I believe I am going to try multiqueue.
>> 
>> For SR-IOV I have to check whether my NIC supports it.
>> 
>> We are using Queens, so I think the queue size option is not possible :(
>> 
>> We are using a VoIP application and the traffic is UDP, so our pps rate is 
>> 60k to 80k per VM instance.
>> 
>> I will share my results as soon as I try multiqueue.
>> 
>> 
>> 
>> Sent from my iPhone
>> 
>> On Sep 16, 2018, at 2:27 AM, Liping Mao (limao) <li...@cisco.com> wrote:
>> 
>> Hi Satish,
>> 
>> Did your packet loss happen all the time, or only under heavy load?
>> 
>> AFAIK, if you do not tune anything, the VM tap can process about 50kpps before 
>> the tap device starts to drop packets.
>> 
>> If it happens under heavy load, here are a couple of things you can try:
>> 
>> 1) Increase the tap queue length. The default value is usually 500; you can 
>> try a larger value (it seems you already tried this).
>> 
>> 2) Try the virtio multi-queue feature, see [1]. Virtio uses a single queue 
>> for rx/tx in the VM; with this feature you can get more queues. You can check 
>> the details in [1].
>> 
>> 3) In the Rocky release, you can use [2] to increase the virtio queue size. 
>> The default queue sizes are 256/512; you may increase them to 1024, which 
>> would help increase the pps of the tap device (see the sketch after this 
>> list).
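>>
>> (For item 1, that is roughly "ip link set dev tap5af7f525-5f txqueuelen 10000"
>> on the compute node; for item 3, once your versions are new enough, nova.conf
>> on the compute nodes can carry something like the lines below. The option
>> names and values are only a sketch, check [2] for what your release supports:)
>>
>>     [libvirt]
>>     rx_queue_size = 1024
>>     tx_queue_size = 1024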
>> 
>> If all these things cannot meet your network performance requirements, you may 
>> need to move to DPDK / SR-IOV to get more VM network performance. I have not 
>> actually used them in our environment; you may refer to [3].
>> 
>> [1] https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/libvirt-virtiomq.html
>> [2] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/libvirt-virtio-set-queue-sizes.html
>> [3] https://docs.openstack.org/ocata/networking-guide/config-sriov.html
>> 
>> Regards,
>> Liping Mao
>> 
>> On 2018/9/16, 13:07, "Satish Patel" <satish....@gmail.com> wrote:
>> 
>>  [root@compute-33 ~]# ifconfig tap5af7f525-5f | grep -i drop
>>          RX errors 0 dropped 0 overruns 0 frame 0
>>          TX errors 0 dropped 2528788837 overruns 0 carrier 0 collisions 0
>> 
>>  Noticed the tap interface dropping TX packets, and even after increasing 
>>  txqueuelen from 1000 to 10000 nothing changed; still getting packet drops.
>> 
>>  On Sat, Sep 15, 2018 at 4:22 PM Satish Patel <satish....@gmail.com> wrote:
>> 
>> Folks,
>> 
>> I need some advice or suggestions to figure out what is going on with my 
>> network. We have noticed high packet loss on an OpenStack instance and are 
>> not sure what is going on; at the same time, if I check on the host machine, 
>> it has zero packet loss. This is what I did as a test:
>> 
>> ping 8.8.8.8
>> 
>> from instance: 50% packet loss
>> from compute host: 0% packet loss
>> 
>> I have disabled the TSO/GSO/SG settings on the physical compute node but am 
>> still getting packet loss.
>> 
>> We have 10G NICs on our network; it looks like something related to the tap 
>> interface settings.
>> 
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
