Hi Alex,
Comparing version b6a3dd9cca (Nov 22) with 64bb477f05 (Oct 6), memory still
grows, but much more slowly. On the production environment it was 400 MB/hour,
and it is now (64bb477f05) 100 MB/hour.
The Python flooding script is not a reliable way to reproduce the problem;
it behaves differently on the production and testing environments.
When run on the production environment, memory grows an order of magnitude
faster. However, we still see growth even without flooding, as shown below.
Example of growth in exact 264 KB or 2x264 KB increments every few seconds,
from our production environment, which was handling about 1k pps at the time
(normal production traffic, without flooding):
# while true; do echo "`date '+%T'`: `ps -Ao 'rsz,cmd' --sort rsz | tail -n 1 | cut -c -20`"; sleep 1; done
10:50:51: 216788 ovs-vswitchd
10:50:52: 216788 ovs-vswitchd
10:50:53: 216788 ovs-vswitchd
10:50:55: 216788 ovs-vswitchd
10:50:56: 216788 ovs-vswitchd
10:50:57: 217052 ovs-vswitchd
10:50:58: 217052 ovs-vswitchd
10:50:59: 217052 ovs-vswitchd
10:51:00: 217052 ovs-vswitchd
10:51:01: 217052 ovs-vswitchd
10:51:02: 217052 ovs-vswitchd
10:51:03: 217052 ovs-vswitchd
10:51:04: 217052 ovs-vswitchd
10:51:05: 217052 ovs-vswitchd
10:51:06: 217580 ovs-vswitchd
10:51:07: 217580 ovs-vswitchd
10:51:09: 217580 ovs-vswitchd
10:51:10: 217580 ovs-vswitchd
10:51:11: 217580 ovs-vswitchd
10:51:12: 217844 ovs-vswitchd
10:51:13: 217844 ovs-vswitchd
10:51:14: 217844 ovs-vswitchd
10:51:15: 217844 ovs-vswitchd
10:51:16: 217844 ovs-vswitchd
10:51:17: 217844 ovs-vswitchd
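As a side note, here is a minimal sketch (a hypothetical helper, not part of
our tooling) that takes samples like the ones above on stdin and prints the
RSS jumps; ps reports rsz in kilobytes, so the increments above come out as
+264 and +528 (2x264) KB:

# Hypothetical helper: read "HH:MM:SS: RSS CMD" samples on stdin and
# print the RSS deltas between consecutive samples (rsz is in KB).
import sys

prev = None
for line in sys.stdin:
    parts = line.split()
    if len(parts) < 3:
        continue
    ts, rss = parts[0].rstrip(':'), int(parts[1])
    if prev is not None and rss != prev:
        print("%s: +%d KB" % (ts, rss - prev))
    prev = rss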
Also specific to our setup:
We use only an OpenFlow 1.0 controller.
Running `ovs-vsctl list Flow_Table` gives empty output.
Best,
Adam
On 03.12.2014 at 12:14, Adam Mazur wrote:
I will try the current head version.
Meanwhile, answers are below.
On 02.12.2014 at 23:24, Alex Wang wrote:
Hey Adam,
Besides the questions just asked,
On Tue, Dec 2, 2014 at 1:11 PM, Alex Wang <al...@nicira.com> wrote:
Hey Adam,
Did you use any trick to avoid ARP resolution?
Running your script on my setup causes only ARP packets to be sent.
Also, there is no change in the memory utilization of OVS.
There is no trick with ARP.
The gateway for the VMs acts as a "normal" router, running old OVS 1.7.
The router IS a bottleneck, since it consumes 100% of the CPU. But at the
same time, OVS 2.3 on the hypervisor consumes 400% of the CPU and grows in RSS.
One more thing: did you see the issue without the tunnel?
This very recent commit fixes an issue related to tunneling.
Could you try again with it?
I will try. These problems were seen on b6a3dd9cca (Nov 22); I will try
the head version.
commit b772066ffd066d59d9ebce092f6665150723d2ad
Author: Pravin B Shelar <pshe...@nicira.com>
Date: Wed Nov 26 11:27:05 2014 -0800
route-table: Remove Unregister.
Since dpif registering for routing table at initialization
there is no need to unregister it. Following patch removes
support for turning routing table notifications on and off.
Due to this change OVS always listens for these
notifications.
Reported-by: YAMAMOTO Takashi <yamam...@valinux.co.jp>
Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
Acked-by: YAMAMOTO Takashi <yamam...@valinux.co.jp>
I want to ask a few more questions to help with debugging:
1. Could you post the 'ovs-vsctl show' output on the xenserver?
http://pastebin.com/pe8YpRwr
2. Could you post the 'ovs-dpctl dump-flows' output during a run of the
script?
Partial output - head: http://pastebin.com/fUkbfeUN and tail:
http://pastebin.com/P1QgyH02
The full output was more than 100 MB of text when flooding at 400K pps.
Would you like it gzipped off-list? (less than 1 MB)
3. If the OOM killer is activated, you should see the OOM log in the syslog
or dmesg output. Could you provide it?
I don't have one - the production logs have been rotated, remote logging
during the OOM was unavailable (the network was dead while the vswitch was
starting), and the testing environment is too slow to generate an OOM
quickly... First (and much faster), I will try the head version, since you
said there are fixes for such a case.
4. Could you provide the route output on the hypervisor?
# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.2.7.1        0.0.0.0         UG    0      0        0 xenbr0
10.2.7.0        0.0.0.0         255.255.255.0   U     0      0        0 xenbr0
10.30.7.0       0.0.0.0         255.255.255.0   U     0      0        0 ib0
37.233.99.0     0.0.0.0         255.255.255.0   U     0      0        0 xapi4
Thanks,
Alex Wang,
On Mon, Dec 1, 2014 at 2:43 AM, Adam Mazur <adam.ma...@tiktalik.com> wrote:
Hi,
We are testing on kernel 3.18, OVS current master, GRE tunnels / Xen
server. The following Python script leads to fast ovs-vswitchd memory
growth (1 GB / minute) and finally an OOM kill:
import random, socket, struct, time

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
while True:
    ip_raw = struct.pack('>I', random.randint(1, 0xffffffff))
    ip = socket.inet_ntoa(ip_raw)
    try:
        sock.sendto("123", (ip, 12345))
    except:
        pass
    #time.sleep(0.001)
During this test OVS did not show a growing flow count, but memory still
grows. If packets are sent too slowly, memory never grows - uncomment the
time.sleep line above. A small sketch for watching the flow count and RSS
side by side follows below.
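To correlate the two, here is a minimal sketch (a hypothetical helper, not
something we run in production), assuming `ovs-dpctl` and `pidof` are on the
PATH and a Linux-style /proc, that polls the datapath flow count together
with the ovs-vswitchd RSS once per second while the flooder runs:

# Hypothetical helper (assumes 'ovs-dpctl' and 'pidof' on PATH, Linux /proc):
# print the datapath flow count and ovs-vswitchd RSS once per second.
import subprocess, time

def flow_count():
    # one flow per line in 'ovs-dpctl dump-flows' output
    out = subprocess.check_output(["ovs-dpctl", "dump-flows"])
    return len(out.splitlines())

def vswitchd_rss_kb():
    pid = subprocess.check_output(["pidof", "ovs-vswitchd"]).split()[0]
    with open("/proc/%s/status" % pid.decode()) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is in kB
    return 0

while True:
    print("%s  flows=%d  rss=%d kB"
          % (time.strftime("%T"), flow_count(), vswitchd_rss_kb()))
    time.sleep(1)

With the flooder running at full speed we would expect the RSS column to
climb while the flow count stays flat, matching the observation above.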
Best,
Adam
_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss