Another update:
100% confirmed to be traffic shaping set by CloudStack. I don't know
where/how/why, and I'd love some help with this. Should I create a new
thread? As previously mentioned, I don't believe I've set a cap below
100 Mb/s ANYWHERE in CloudStack: not in compute offerings, not in network
offerings, and not in the default throttle (which is set at 200). What am
I missing?

I removed the tc rules on the host for two test instances and bandwidth
shot up.

Before:

ubuntu@testserver01:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59276
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.4 sec  6.62 MBytes  5.35 Mbits/sec
[  5] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59277
[  5]  0.0-10.5 sec  6.62 MBytes  5.28 Mbits/sec
[  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59278
[  4]  0.0-10.4 sec  6.62 MBytes  5.37 Mbits/sec
[  5] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59291
[  5]  0.0-10.3 sec  6.62 MBytes  5.37 Mbits/sec
[  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59306
[  4]  0.0-10.5 sec  6.62 MBytes  5.30 Mbits/sec

Removed the rules for two instances on the same host:

ubuntu@dom02:~$ sudo tc qdisc del dev vnet1 root
ubuntu@dom02:~$ sudo tc qdisc del dev vnet3 root
ubuntu@dom02:~$ sudo tc qdisc del dev vnet3 ingress
ubuntu@dom02:~$ sudo tc qdisc del dev vnet1 ingress
ubuntu@dom02:~$ tc -s qdisc ls dev vnet1
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 7136572 bytes 1048 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

And all of a sudden, those two instances are at blazing speeds:

ubuntu@testserver01:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59322
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  14.8 GBytes  12.7 Gbits/sec
[  5] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59329
[  5]  0.0-10.0 sec  19.1 GBytes  16.4 Gbits/sec
[  4] local 10.1.1.101 port 5001 connected with 10.1.1.102 port 59330
[  4]  0.0-10.0 sec  19.0 GBytes  16.3 Gbits/sec
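For whoever picks this up: before deleting anything, it may help to see
exactly what rate the installed rules enforce, and to double-check every
throttle value the management server knows about. A rough sketch of what I
mean (the vnet* interface names are libvirt's defaults and the mysql
credentials/database name are assumptions, adjust for your install; on KVM
the throttle usually shows up as an htb root qdisc plus an ingress policing
filter):

# On the KVM host: dump the shaping setup on every guest interface.
for path in /sys/class/net/vnet*; do
    dev=$(basename "$path")
    echo "== $dev =="
    tc -s qdisc show dev "$dev"             # root qdisc (htb if throttled)
    tc class show dev "$dev"                # rate/ceil live on the classes
    tc filter show dev "$dev" parent ffff:  # ingress policer, if any
done

# On the management server: list every throttle-related global setting
# (network.throttling.rate should be the "200" mentioned above).
mysql -u cloud -p cloud -e \
  "SELECT name, value FROM configuration WHERE name LIKE '%throttl%';"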
On Sun, Aug 17, 2014 at 12:46 PM, Nick Burke <[email protected]> wrote:

> First,
>
> THANK YOU FOR REPLYING!
>
> Second, yes, it's currently set at 200.
>
> The compute offering for network is either blank (or, when I tested it, 1000).
> The network offering for network limit is either 100, 1000, or blank.
>
> Those are the only network throttling parameters that I'm aware of; are
> there any others that I missed? Is it possible disk I/O is for some reason
> coming into play here?
>
> This happens regardless of whether the instance network uses a virtual
> router or is directly connected to a VLAN (i.e., no virtual router), when
> two instances are directly connected to each other.
>
> On Sun, Aug 17, 2014 at 12:09 PM, ilya musayev <[email protected]> wrote:
>
>> Nick,
>>
>> Have you checked the network throttle settings in "global settings" and
>> wherever else it may be defined?
>>
>> regards
>> ilya
>>
>> On 8/17/14, 11:27 AM, Nick Burke wrote:
>>
>>> Update:
>>>
>>> After running iperf on instances on the same virtual network, it looks
>>> like no instance can get more than 2 Mbit/s. Additionally, it's sporadic,
>>> ranging from under 1 Mbit/s but never exceeding 2 Mbit/s:
>>>
>>> user@localhost:~$ iperf -c 10.1.0.1 -d
>>> ------------------------------------------------------------
>>> Server listening on TCP port 5001
>>> TCP window size: 85.3 KByte (default)
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>> Client connecting to 10.1.0.1, TCP port 5001
>>> TCP window size: 86.8 KByte (default)
>>> ------------------------------------------------------------
>>> [  5] local 10.1.0.10 port 50432 connected with 10.1.0.1 port 5001
>>> [ ID] Interval       Transfer     Bandwidth
>>> [  5]  0.0-11.0 sec  1.25 MBytes   950 Kbits/sec
>>> [  4] local 10.1.0.10 port 5001 connected with 10.1.0.1 port 53839
>>> [  4]  0.0-11.1 sec  2.50 MBytes  1.89 Mbits/sec
>>> user@localhost:~$ iperf -c 10.1.0.1 -d
>>> ------------------------------------------------------------
>>> Server listening on TCP port 5001
>>> TCP window size: 85.3 KByte (default)
>>> ------------------------------------------------------------
>>> ------------------------------------------------------------
>>> Client connecting to 10.1.0.1, TCP port 5001
>>> TCP window size: 50.3 KByte (default)
>>> ------------------------------------------------------------
>>> [  5] local 10.1.0.10 port 52248 connected with 10.1.0.1 port 5001
>>> [ ID] Interval       Transfer     Bandwidth
>>> [  5]  0.0-12.6 sec  1.25 MBytes   834 Kbits/sec
>>> [  4] local 10.1.0.10 port 5001 connected with 10.1.0.1 port 53840
>>> [  4]  0.0-11.9 sec  2.13 MBytes  1.49 Mbits/sec
>>>
>>> On Fri, Aug 15, 2014 at 11:40 AM, Nick Burke <[email protected]> wrote:
>>>
>>>> I upgraded from 4.0 to 4.3.0 some time ago. I didn't restart anything
>>>> and it was all working great. However, I had to perform some maintenance
>>>> and had to restart everything. Now I'm seeing packet loss on all
>>>> instances, even ones on the same host.
>>>>
>>>> sudo ping -c 500 -f 172.20.1.1
>>>> PING 172.20.1.1 (172.20.1.1) 56(84) bytes of data.
>>>> ........................................
>>>> --- 172.20.1.1 ping statistics ---
>>>> 500 packets transmitted, 460 received, 8% packet loss, time 864ms
>>>> rtt min/avg/max/mdev = 0.069/0.218/1.290/0.139 ms, ipg/ewma 1.731/0.328 ms
>>>>
>>>> No interface errors reported anywhere. The host itself isn't under load
>>>> at all. It doesn't matter whether the instance uses e1000 or virtio
>>>> drivers. The only thing I'm aware of that changed is that I had to
>>>> reboot all the physical servers.
>>>>
>>>> It could be related, but I was also hit by the
>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-6464 bug. I did follow
>>>> Marcus' suggestion:
>>>>
>>>> *"This is a shot in the dark, but there have been some issues around
>>>> upgrades that involve the cloud.vlan table expected contents changing.
>>>> New 4.3 installs using vlan isolation don't seem to reproduce the issue.
>>>> I'll see if I can reproduce anything like this with basic and/or non-vlan
>>>> isolated upgrades/installs. Can anyone experiencing an issue look at
>>>> their database via something like "select * from cloud.vlan" and look at
>>>> the vlan_id. If you see something like "untagged" instead of
>>>> "vlan://untagged", please try changing it and see if that helps."*
>>>>
>>>> --
>>>> Nick
>>>>
>>>> *'What is a human being, then?' 'A seed' 'A... seed?' 'An acorn that is
>>>> unafraid to destroy itself in growing into a tree.' -David Zindell, A
>>>> Requiem for Homo Sapiens*
>
> --
> Nick
>
> *'What is a human being, then?' 'A seed' 'A... seed?' 'An acorn that is
> unafraid to destroy itself in growing into a tree.' -David Zindell, A
> Requiem for Homo Sapiens*

--
Nick

*'What is a human being, then?' 'A seed' 'A... seed?' 'An acorn that is
unafraid to destroy itself in growing into a tree.' -David Zindell, A
Requiem for Homo Sapiens*
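P.S. For anyone else landing here via CLOUDSTACK-6464: Marcus' check from the
quoted thread above translates to roughly the following against the
management-server database. This is only a sketch; the 'cloud' user and
database are the usual defaults but may differ on your install, and you
should back up the database before changing anything.

# Look for isolation URIs that lost their scheme during the upgrade.
mysql -u cloud -p cloud -e \
  "SELECT id, vlan_id FROM vlan WHERE vlan_id NOT LIKE 'vlan://%';"

# If rows show plain 'untagged' instead of 'vlan://untagged', rewrite them.
mysql -u cloud -p cloud -e \
  "UPDATE vlan SET vlan_id = 'vlan://untagged' WHERE vlan_id = 'untagged';"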
