Hey Adam,

Sorry for the delayed reply, I got sidetracked by another issue.
Could I know the xen server version? I do not see any memory growth on kvm,
so I'd like to test using the same xen version.

Thanks,
Alex Wang,

On Fri, Dec 5, 2014 at 2:03 AM, Adam Mazur <adam.ma...@tiktalik.com> wrote:

> Hi Alex,
>
> Comparing version b6a3dd9cca (Nov 22) to 64bb477f05 (6 Oct), memory still
> grows, but much more slowly. On the production env it was 400MB/hour, and
> it is now (64bb477f05) 100MB/hour.
>
> The Python flooding script is not a reliable way to reproduce the problem;
> it behaves differently on the production and testing environments.
> When run on the production env, the memory grows an order of magnitude
> faster. However, we still see growth even without flooding, which you can
> find below.
>
> Example growth in exactly 264KB or 2x264KB increments every few seconds
> from our production environment, which had about 1k pps at the moment
> (normal production traffic, without flooding):
>
> # while true; do echo "`date '+%T'`: `ps -Ao 'rsz,cmd' --sort rsz | tail -n 1 | cut -c -20;`"; sleep 1; done
> 10:50:51: 216788 ovs-vswitchd
> 10:50:52: 216788 ovs-vswitchd
> 10:50:53: 216788 ovs-vswitchd
> 10:50:55: 216788 ovs-vswitchd
> 10:50:56: 216788 ovs-vswitchd
> 10:50:57: 217052 ovs-vswitchd
> 10:50:58: 217052 ovs-vswitchd
> 10:50:59: 217052 ovs-vswitchd
> 10:51:00: 217052 ovs-vswitchd
> 10:51:01: 217052 ovs-vswitchd
> 10:51:02: 217052 ovs-vswitchd
> 10:51:03: 217052 ovs-vswitchd
> 10:51:04: 217052 ovs-vswitchd
> 10:51:05: 217052 ovs-vswitchd
> 10:51:06: 217580 ovs-vswitchd
> 10:51:07: 217580 ovs-vswitchd
> 10:51:09: 217580 ovs-vswitchd
> 10:51:10: 217580 ovs-vswitchd
> 10:51:11: 217580 ovs-vswitchd
> 10:51:12: 217844 ovs-vswitchd
> 10:51:13: 217844 ovs-vswitchd
> 10:51:14: 217844 ovs-vswitchd
> 10:51:15: 217844 ovs-vswitchd
> 10:51:16: 217844 ovs-vswitchd
> 10:51:17: 217844 ovs-vswitchd
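>
> A small helper along these lines (an untested sketch, not part of OVS; it
> assumes Linux /proc and a pidof binary) samples ovs-vswitchd RSS once per
> second and prints only the deltas, so step increments like the 264KB above
> stand out better than in the raw ps loop:
>
> import subprocess, time
>
> def vswitchd_rss_kb():
>     # pidof and /proc/<pid>/status are assumed available (Linux only).
>     pid = subprocess.check_output(["pidof", "ovs-vswitchd"]).decode().split()[0]
>     with open("/proc/%s/status" % pid) as f:
>         for line in f:
>             if line.startswith("VmRSS:"):
>                 return int(line.split()[1])  # VmRSS is reported in kB
>     return 0
>
> last = vswitchd_rss_kb()
> while True:
>     time.sleep(1)
>     cur = vswitchd_rss_kb()
>     if cur != last:
>         print("%s: RSS %d kB (%+d kB)" % (time.strftime("%H:%M:%S"), cur, cur - last))
>         last = cur
>
> (Reading /proc avoids forking ps every second, and printing only the
> deltas makes the step pattern obvious in a long capture.)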
>
> What is also specific:
> We use only an OpenFlow 1.0 controller.
> Running `ovs-vsctl list Flow_Table` gives empty output.
>
> Best,
> Adam
>
>
> On 03.12.2014 at 12:14, Adam Mazur wrote:
>
> I will try the current head version.
> Meanwhile, answers are below.
>
>
> On 02.12.2014 at 23:24, Alex Wang wrote:
>
> Hey Adam,
>
> Besides the questions just asked,
>
> On Tue, Dec 2, 2014 at 1:11 PM, Alex Wang <al...@nicira.com> wrote:
>
>> Hey Adam,
>>
>> Did you use any trick to avoid the arp resolution?
>>
>> Running your script on my setup causes only arp pkts to be sent.
>>
>> Also, there is no change in the mem util of ovs.
>>
>
> There is no trick with arp.
> The gateway for the VMs acts as a "normal" router, with old ovs 1.7.
> The router IS a bottleneck, since it consumes 100% of CPU. But at the
> same time, ovs 2.3 on the hypervisor consumes 400% of CPU and grows in RSS.
>
>
>> One more thing, did you see the issue without a tunnel?
>> This very recent commit fixes some issues with tunneling;
>> could you try again with it?
>>
>
> I will try. These problems were seen on b6a3dd9cca (Nov 22); I will try
> the head version.
>
>> commit b772066ffd066d59d9ebce092f6665150723d2ad
>> Author: Pravin B Shelar <pshe...@nicira.com>
>> Date: Wed Nov 26 11:27:05 2014 -0800
>>
>> route-table: Remove Unregister.
>>
>> Since dpif registering for routing table at initialization
>> there is no need to unregister it. Following patch removes
>> support for turning routing table notifications on and off.
>> Due to this change OVS always listens for these
>> notifications.
>>
>> Reported-by: YAMAMOTO Takashi <yamam...@valinux.co.jp>
>> Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
>> Acked-by: YAMAMOTO Takashi <yamam...@valinux.co.jp>
>>
>
>
> Want to ask more questions to help debug:
>
> 1. Could you post the 'ovs-vsctl show' output on the xenserver?
>
>
> http://pastebin.com/pe8YpRwr
>
> 2. Could you post the 'ovs-dpctl dump-flows' output during the run of the
> script?
>
>
> Partial output - head: http://pastebin.com/fUkbfeUN and tail:
> http://pastebin.com/P1QgyH02
> The full output was more than 100MB of text when flooding at 400K pps.
> Would you like it gzipped off-list? (less than 1MB)
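>
> A lighter-weight alternative to dumping every flow (an untested sketch; it
> assumes 'ovs-dpctl show' is on PATH and prints a "flows:" line, as in
> stock OVS) is to sample just the datapath flow count once per second:
>
> import subprocess, time
>
> while True:
>     # 'ovs-dpctl show' summarizes each datapath, including its flow count,
>     # so we never have to materialize a 100MB flow dump.
>     out = subprocess.check_output(["ovs-dpctl", "show"]).decode()
>     for line in out.splitlines():
>         line = line.strip()
>         if line.startswith("flows:"):
>             print("%s: %s" % (time.strftime("%H:%M:%S"), line))
>     time.sleep(1)
>
> (If the flow count stays flat while RSS keeps climbing, that points away
> from the flow table itself as the source of the growth.)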
>
> 3. If oom is activated, you should see the oom log in the syslog or dmesg
> output; could you provide it?
>
>
> I don't have one - the production logs have been rotated, the remote logs
> during the oom were unavailable (the network was dead while the vswitch
> was starting), and the testing environment is too slow to generate an oom
> quickly... First (and much faster), I will try the head version, as you
> said there were fixes for such a case.
>
> 4. Could you provide the route output on the hypervisor?
>
>
> # route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 0.0.0.0         10.2.7.1        0.0.0.0         UG    0      0        0 xenbr0
> 10.2.7.0        0.0.0.0         255.255.255.0   U     0      0        0 xenbr0
> 10.30.7.0       0.0.0.0         255.255.255.0   U     0      0        0 ib0
> 37.233.99.0     0.0.0.0         255.255.255.0   U     0      0        0 xapi4
>
>
> Thanks,
> Alex Wang,
>
>
>> Thanks,
>> Alex Wang,
>>
>> On Mon, Dec 1, 2014 at 2:43 AM, Adam Mazur <adam.ma...@tiktalik.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We are testing on kernel 3.18, ovs current master, gre tunnels / xen
>>> server. The following python script leads to fast ovs-vswitchd memory
>>> growth (1GB / minute) and finally an OOM kill:
>>>
>>> import random, socket, struct, time
>>>
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> while True:
>>>     # Random IPv4 destination: (nearly) every packet hits a new flow.
>>>     ip_raw = struct.pack('>I', random.randint(1, 0xffffffff))
>>>     ip = socket.inet_ntoa(ip_raw)
>>>     try:
>>>         sock.sendto(b"123", (ip, 12345))  # b"..." works on Python 2 and 3
>>>     except socket.error:
>>>         pass
>>>     #time.sleep(0.001)
>>>
>>> During this test ovs did not show a growing flow count, but the memory
>>> still grows.
>>>
>>> If packets are sent too slowly, the memory never grows - uncomment the
>>> time.sleep line above.
>>>
>>> Best,
>>> Adam