Liping,

Last 2 days I have been running tests with hping3 and found the following
behavior. If you look at my results, UDP does very badly when I increase the
number of queues. Do you know why?

UDP:

If I set "ethtool -L eth0 combined 1" then the UDP pps rate is 100kpps
If I set "ethtool -L eth0 combined 8" then the UDP pps rate is 40kpps

TCP:

If I set "ethtool -L eth0 combined 1" then the TCP pps rate is ~150kpps
If I set "ethtool -L eth0 combined 8" then the TCP pps rate is ~150kpps
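
For reference, here is roughly how I am measuring pps on the guest interface
during each hping3 run (quick sketch; the interface name and the 1-second
window are just examples, adjust for your setup):

#!/bin/bash
# Count packets on an interface over one second and print the pps rate.
IFACE=${1:-eth0}
RX1=$(cat /sys/class/net/"$IFACE"/statistics/rx_packets)
TX1=$(cat /sys/class/net/"$IFACE"/statistics/tx_packets)
sleep 1
RX2=$(cat /sys/class/net/"$IFACE"/statistics/rx_packets)
TX2=$(cat /sys/class/net/"$IFACE"/statistics/tx_packets)
echo "$IFACE rx: $((RX2 - RX1)) pps  tx: $((TX2 - TX1)) pps"
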
On Mon, Sep 17, 2018 at 8:33 AM Satish Patel <satish....@gmail.com> wrote:
>
> Thanks Liping,
>
> I will try to reach out or open new thread to get sriov info.
>
> By the way what version of openstack you guys using and what hardware 
> specially NIC. Just trying to see if it's hardware related.
>
> I'm running kernel 3.10.x do you think it's not something related kernel.
>
> Sent from my iPhone
>
> On Sep 17, 2018, at 1:27 AM, Liping Mao (limao) <li...@cisco.com> wrote:
>
> >> Question: I have br-vlan interface mapp with bond0 to run my VM (VLAN
> >> traffic), so do i need to do anything in bond0 to enable VF/PF
> >> function? Just confused because currently my VM nic map with compute
> >> node br-vlan bridge.
> >
> > I had not actually used SRIOV in my env~ maybe others could help.
> >
> > Thanks,
> > Liping Mao
> >
> > On 2018/9/17 11:48, "Satish Patel" <satish....@gmail.com> wrote:
> >
> >    Thanks Liping,
> >
> >    I will check bug for tx/rx queue size and see if i can make it work
> >    but look like my 10G NIC support SR-IOV so i am trying that path
> >    because it will be better for long run.
> >
> >    I have deploy my cloud using openstack-ansible so now i need to figure
> >    out how do i wire that up with openstack-ansible deployment, here is
> >    the article [1]
> >
> >    Question: I have br-vlan interface mapp with bond0 to run my VM (VLAN
> >    traffic), so do i need to do anything in bond0 to enable VF/PF
> >    function? Just confused because currently my VM nic map with compute
> >    node br-vlan bridge.
> >
> >    [root@compute-65 ~]# lspci -nn | grep -i ethernet
> >    03:00.0 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
> >    03:00.1 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
> >    03:01.0 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
> >    03:01.1 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
> >    03:01.2 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
> >    03:01.3 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
> >    03:01.4 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
> >    03:01.5 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
> >    03:01.6 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
> >
> >    [1] https://docs.openstack.org/openstack-ansible-os_neutron/latest/configure-network-services.html
> >
> >>    On Sun, Sep 16, 2018 at 7:06 PM Liping Mao (limao) <li...@cisco.com> wrote:
> >
> >> Hi Satish,
> >>
> >> There are hard limitations in nova's code, I did not actually used more thant 8 queues:
> >>
> >>    def _get_max_tap_queues(self):
> >>        # NOTE(kengo.sakai): In kernels prior to 3.0,
> >>        # multiple queues on a tap interface is not supported.
> >>        # In kernels 3.x, the number of queues on a tap interface
> >>        # is limited to 8. From 4.0, the number is 256.
> >>        # See: https://bugs.launchpad.net/nova/+bug/1570631
> >>        kernel_version = int(os.uname()[2].split(".")[0])
> >>        if kernel_version <= 2:
> >>            return 1
> >>        elif kernel_version == 3:
> >>            return 8
> >>        elif kernel_version == 4:
> >>            return 256
> >>        else:
> >>            return None
> >>
> >>> I am currently playing with those setting and trying to generate
> >>> traffic with hping3 tools, do you have any tool to test traffic
> >>> performance for specially udp style small packets.
> >>
> >> Hping3 is good enough to reproduce it, we have app level test tool, but that is not your case.
> >>
> >>>    Here i am trying to increase rx_queue_size & tx_queue_size but its not
> >>>    working somehow. I have tired following.
> >>
> >> Since you are not rocky code, it should only works in qemu.conf, maybe check if this bug[1] affect you.
> >>
> >>> Is there a way i can automate this last task to update queue number
> >>> action after reboot vm :) otherwise i can use cloud-init to make sure
> >>> all VM build with same config.
> >>
> >> Cloud-init or rc.local could be the place to do that.
> >>
> >> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1541960
> >>
> >> Regards,
> >> Liping Mao
> >>
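
For the cloud-init / rc.local suggestion above, this is roughly the boot
script I have in mind (untested sketch; eth0 and the 8-queue tap limit are
assumptions from this thread):

#!/bin/bash
# Set virtio multiqueue on every boot, e.g. from /etc/rc.local or a
# cloud-init runcmd/bootcmd entry inside the guest.
IFACE=eth0
# One combined queue per vCPU, capped at the 8-queue tap limit on 3.x kernels.
CPUS=$(nproc)
QUEUES=$(( CPUS < 8 ? CPUS : 8 ))
ethtool -L "$IFACE" combined "$QUEUES" || echo "failed to set $QUEUES queues on $IFACE" >&2

cloud-init would only need a runcmd/bootcmd entry (or rc.local) calling this
on every boot, so customers do not have to run ethtool by hand.
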
> >> On 2018/9/17 04:09, "Satish Patel" <satish....@gmail.com> wrote:
> >>
> >>    Update on my last email.
> >>
> >>    I am able to achieve 150kpps with queue=8 and my goal is to do 300kpps
> >>    because some of voice application using 300kps.
> >>
> >>    Here i am trying to increase rx_queue_size & tx_queue_size but its not
> >>    working somehow. I have tired following.
> >>
> >>    1. add rx/tx size in /etc/nova/nova.conf in libvirt section - (didn't work)
> >>    2. add /etc/libvirtd/qemu.conf - (didn't work)
> >>
> >>    I have try to edit virsh edit <XML> file but somehow my changes not
> >>    getting reflected, i did virsh define <XML> after change and hard
> >>    reboot guest but no luck.. how do i edit that option in xml if i want
> >>    to do that?
> >>
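
A quick way to check whether those rx/tx queue size edits actually made it
into the running domain, instead of editing the XML blindly (sketch; the
instance name is an example, use "virsh list" to find the real one):

#!/bin/bash
# Dump the live libvirt XML and look at the virtio <driver> element of the
# interface; queues / rx_queue_size / tx_queue_size show up there if applied.
DOMAIN=instance-00000001
virsh dumpxml "$DOMAIN" | grep -i -A2 "<driver name='vhost'"
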
> >>>    On Sun, Sep 16, 2018 at 1:41 PM Satish Patel <satish....@gmail.com> wrote:
> >>>
> >>> I successful reproduce this error with hping3 tool and look like
> >>> multiqueue is our solution :) but i have few question you may have
> >>> answer of that.
> >>>
> >>> 1. I have created two instance  (vm1.example.com & vm2.example.com)
> >>>
> >>> 2. I have flood traffic from vm1 using "hping3 vm2.example.com
> >>> --flood"  and i have noticed drops on tap interface. ( This is without
> >>> multiqueue)
> >>>
> >>> 3. Enable multiqueue in image and run same test and again got packet
> >>> drops on tap interface ( I didn't update queue on vm2 guest, so
> >>> definitely i was expecting packet drops)
> >>>
> >>> 4. Now i have try to update vm2 queue using ethtool and i got
> >>> following error, I have 15vCPU and i was trying to add 15 queue
> >>>
> >>> [root@bar-mq ~]# ethtool -L eth0 combined 15
> >>> Cannot set device channel parameters: Invalid argument
> >>>
> >>> Then i have tried 8 queue which works.
> >>>
> >>> [root@bar-mq ~]# ethtool -L eth0 combined 8
> >>> combined unmodified, ignoring
> >>> no channel parameters changed, aborting
> >>> current values: tx 0 rx 0 other 0 combined 8
> >>>
> >>> Now i am not seeing any packet drops on tap interface, I have measure
> >>> PPS and i was able to get 160kpps without packet drops.
> >>>
> >>> Question:
> >>>
> >>> 1. why i am not able to add 15 queue?  ( is this NIC or driver limitation?)
> >>> 2. how do i automate "ethtool -L eth0 combined 8" command in instance
> >>> so i don't need to tell my customer to do this manually?
> >>>
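
On question 2 above, one part that can at least be baked in once is the image
property that makes nova create the extra queues in the first place (sketch;
the image name is an example):

# Mark the image so instances get a multiqueue virtio NIC; the guest still
# has to run "ethtool -L" itself after boot (see the boot script idea earlier).
openstack image set --property hw_vif_multiqueue_enabled=true centos7-voip-image
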
> >>>>    On Sun, Sep 16, 2018 at 11:53 AM Satish Patel <satish....@gmail.com> wrote:
> >>>>
> >>>> Hi Liping,
> >>>>
> >>>>>> I think multi queue feature should help.(be careful to make sure the ethtool update queue number action also did after reboot the vm).
> >>>>
> >>>> Is there a way i can automate this last task to update queue number
> >>>> action after reboot vm :) otherwise i can use cloud-init to make sure
> >>>> all VM build with same config.
> >>>>
> >>>>>    On Sun, Sep 16, 2018 at 11:51 AM Satish Patel <satish....@gmail.com> wrote:
> >>>>>
> >>>>> I am currently playing with those setting and trying to generate
> >>>>> traffic with hping3 tools, do you have any tool to test traffic
> >>>>> performance for specially udp style small packets.
> >>>>>
> >>>>> I am going to share all my result and see what do you feel because i
> >>>>> have noticed you went through this pain :)  I will try every single
> >>>>> option which you suggested to make sure we are good before i move
> >>>>> forward to production.
> >>>>>
> >>>>>>    On Sun, Sep 16, 2018 at 11:25 AM Liping Mao (limao) <li...@cisco.com> wrote:
> >>>>>>
> >>>>>> I think multi queue feature should help.(be careful to make sure the ethtool update queue number action also did after reboot the vm).
> >>>>>>
> >>>>>> Numa cpu pin and queue length will be a plus in my exp. You may need yo have performance test in your situatuon,in my case cpu numa helpped the app get very stable 720p/1080p transcoding performance. Not sure if your app get benifit.
> >>>>>>
> >>>>>> You are not using L3,this will let you avoid a lot of performance issue. And since only two instance with 80kpps packets,so in your case,HW interface should not be bottleneck too. And your Nexus 5k/7k will not be bottleneck for sure ;-)
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Liping Mao
> >>>>>>
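
Capturing the CPU pinning idea above as a concrete sketch (the flavor name
and the single-NUMA-node choice are just examples, not something I have
tested yet):

# Dedicated (pinned) vCPUs and a single guest NUMA node for the voice flavor.
openstack flavor set voip.large \
    --property hw:cpu_policy=dedicated \
    --property hw:numa_nodes=1
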
> >>>>>>>> On Sep 16, 2018, at 23:09, Satish Patel <satish....@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Thanks Liping,
> >>>>>>>
> >>>>>>> I am using libvertd 3.9.0 version so look like i am eligible take
> >>>>>>> advantage of that feature. phew!
> >>>>>>>
> >>>>>>> [root@compute-47 ~]# libvirtd -V
> >>>>>>> libvirtd (libvirt) 3.9.0
> >>>>>>>
> >>>>>>> Let me tell you how i am running instance on my openstack, my compute
> >>>>>>> has 32 core / 32G memory  and i have created two instance on compute
> >>>>>>> node 15vcpu and 14G memory ( two instance using 30 vcpu core, i have
> >>>>>>> kept 2 core for compute node). on compute node i disabled overcommit
> >>>>>>> using ratio (1.0)
> >>>>>>>
> >>>>>>> I didn't configure NUMA yet because i wasn't aware of this feature, as
> >>>>>>> per your last post do you think numa will help to fix this issue?
> >>>>>>> following is my numa view
> >>>>>>>
> >>>>>>> [root@compute-47 ~]# numactl --hardware
> >>>>>>> available: 2 nodes (0-1)
> >>>>>>> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> >>>>>>> node 0 size: 16349 MB
> >>>>>>> node 0 free: 133 MB
> >>>>>>> node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
> >>>>>>> node 1 size: 16383 MB
> >>>>>>> node 1 free: 317 MB
> >>>>>>> node distances:
> >>>>>>> node   0   1
> >>>>>>>   0:  10  20
> >>>>>>>   1:  20  10
> >>>>>>>
> >>>>>>> I am not using any L3 router, i am using provide VLAN network and
> >>>>>>> using Cisco Nexus switch for my L3 function so i am not seeing any
> >>>>>>> bottleneck there.
> >>>>>>>
> >>>>>>> This is the 10G NIC i have on all my compute node, dual 10G port with
> >>>>>>> bonding (20G)
> >>>>>>>
> >>>>>>> 03:00.0 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
> >>>>>>> 03:00.1 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
> >>>>>>>
> >>>>>>>>>    On Sun, Sep 16, 2018 at 10:50 AM Liping Mao (limao) <li...@cisco.com> wrote:
> >>>>>>>>
> >>>>>>>> It is still possible to update rx and tx queues length if your qemu and libvirt version is higher than the version recorded in [3]. (You should possible to update directly in libvirt configuration if my memory is correct)
> >>>>>>>>
> >>>>>>>> We also have some similar use case which run audio/vedio serivcs. They are CPU consuming and have UDP small packets. Another possible tunning is using CPU pin for the vm.  you can use numa awared cpu feature to get stable cpu performance ,vm network dropped packets sometimes because of the vm cpu is too busy,with numa cpu it works better performance,our way is similar with [a]. You need to create flavor with special metadata and dedicated Host Agg for numa awared VMs. Dedicated CPU is very good for media service. It makes the CPU performance stable.
> >>>>>>>>
> >>>>>>>> Another packet loss case we get is because of vm kernel, some of our app are using 32bit OS, that cause memory issue, when traffic larger then 50kpps, it dropped a lot,sometimes,it even crash. In this case, 32bit os can actually use very limited memory, we have to add swap for the vm. Hope your app is using 64 bit OS. Because 32 bit could cause tons of trouble.
> >>>>>>>>
> >>>>>>>> BTW,if you are using vrouter on L3, you’d better to move provider network(no vrouter). I did not tried DVR, but if you are running without DVR, the L3 node will be bottleneck very quick. Especially default iptables conntrack is 65535, you will reach to it and drop packet on L3, even after you tun that value, it still hard to more that 1Mpps for your network node.
> >>>>>>>>
> >>>>>>>> If your App more than 200kpps per compute node, you may be better also have a look your physical network driver tx/rx configuration. Most of the HW default value for tx/rx queues number and length are very poor,you may start to get packet on eth interface on physical host when rx queue is full.
> >>>>>>>>
> >>>>>>>> [a] https://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Liping Mao
> >>>>>>>>
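
On the physical NIC tx/rx point above, this is the kind of check and bump I
plan to run on the compute nodes (sketch; the bond member names and the 4096
ring size are examples, the real maximum comes from the "ethtool -g" output):

#!/bin/bash
# Show current vs. pre-set maximum rx/tx ring sizes, then raise them.
for NIC in eth2 eth3; do
    ethtool -g "$NIC"
    ethtool -G "$NIC" rx 4096 tx 4096 || echo "ring resize failed on $NIC" >&2
done
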
> >>>>>>>>> On Sep 16, 2018, at 21:18, Satish Patel <satish....@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Liping,
> >>>>>>>>
> >>>>>>>> Thank you for your reply,
> >>>>>>>>
> >>>>>>>> We notice packet drops during high load, I did try txqueue and didn't help so I believe I am going to try miltiqueue.
> >>>>>>>>
> >>>>>>>> For SRIOV I have to look if I have support in my nic.
> >>>>>>>>
> >>>>>>>> We are using queens so I think queue size option  not possible :(
> >>>>>>>>
> >>>>>>>> We are using voip application and traffic is udp so our pps rate is 60k to 80k per vm instance.
> >>>>>>>>
> >>>>>>>> I will share my result as soon as I try multiqueue.
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>>> On Sep 16, 2018, at 2:27 AM, Liping Mao (limao) <li...@cisco.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi Satish,
> >>>>>>>>
> >>>>>>>> Did your packet loss happen always or it only happened when heavy load?
> >>>>>>>>
> >>>>>>>> AFAIK, if you do not tun anything, the vm tap can process about 50kpps before the tap device start to drop packets.
> >>>>>>>>
> >>>>>>>> If it happened in heavy load, couple of things you can try:
> >>>>>>>>
> >>>>>>>> 1) increase tap queue length, usually the default value is 500, you can try larger. (seems like you already tried)
> >>>>>>>>
> >>>>>>>> 2) Try to use virtio multi queues feature , see [1]. Virtio use one queue for rx/tx in vm, with this feature you can get more queues. You can check
> >>>>>>>>
> >>>>>>>> 3) In rock version, you can use [2] to increase virtio queue size, the default queues size is 256/512, you may increase it to 1024, this would help to increase pps of the tap device.
> >>>>>>>>
> >>>>>>>> If all these things can not get your network performance requirement, you may need to move to use dpdk / sriov stuff to get more vm performance.
> >>>>>>>>
> >>>>>>>> I did not actually used them in our env, you may refer to [3]
> >>>>>>>>
> >>>>>>>> [1] https://specs.openstack.org/openstack/nova-specs/specs/liberty/implemented/libvirt-virtiomq.html
> >>>>>>>> [2] https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/libvirt-virtio-set-queue-sizes.html
> >>>>>>>> [3] https://docs.openstack.org/ocata/networking-guide/config-sriov.html
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Liping Mao
> >>>>>>>>
> >>>>>>>> On 2018/9/16 13:07, "Satish Patel" <satish....@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> [root@compute-33 ~]# ifconfig tap5af7f525-5f | grep -i drop
> >>>>>>>>         RX errors 0 dropped 0 overruns 0 frame 0
> >>>>>>>>         TX errors 0 dropped 2528788837 overruns 0 carrier 0 collisions 0
> >>>>>>>>
> >>>>>>>> Noticed tap interface dropping TX packets and even after increasing
> >>>>>>>> txqueue from 1000 to 10000 nothing changed, still getting packet
> >>>>>>>> drops.
> >>>>>>>>
> >>>>>>>>> On Sat, Sep 15, 2018 at 4:22 PM Satish Patel <satish....@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Folks,
> >>>>>>>>
> >>>>>>>> I need some advice or suggestion to find out what is going on with my
> >>>>>>>> network, we have notice high packet loss on openstack instance and not
> >>>>>>>> sure what is going on, same time if i check on host machine and it has
> >>>>>>>> zero packet loss.. this is what i did for test...
> >>>>>>>>
> >>>>>>>> ping 8.8.8.8
> >>>>>>>>
> >>>>>>>> from instance: 50% packet loss
> >>>>>>>> from compute host: 0% packet loss
> >>>>>>>>
> >>>>>>>> I have disabled TSO/GSO/SG setting on physical compute node but still
> >>>>>>>> getting packet loss.
> >>>>>>>>
> >>>>>>>> We have 10G NIC on our network, look like something related to tap
> >>>>>>>> interface setting..

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack@lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
