Hi Alex,

I'm using Xen 4.3, directly as-is from the Debian Wheezy distribution.
Kernel 3.17.3.

Thanks,
Adam


On 05.12.2014 at 17:51, Alex Wang wrote:
Hey Adam,

Sorry for the delayed reply; I got sidetracked by another issue.

Could you tell me the Xen server version? I do not see any memory growth
on KVM, so I'd like to test with the same Xen version.

Thanks,
Alex Wang,

On Fri, Dec 5, 2014 at 2:03 AM, Adam Mazur <adam.ma...@tiktalik.com> wrote:

    Hi Alex,

    Comparing version b6a3dd9cca (Nov 22) with 64bb477f05 (Oct 6), the
    memory still grows, but much more slowly. On the production environment
    it was 400 MB/hour, and it is now (with 64bb477f05) 100 MB/hour.

    The Python flooding script is not a reliable way to reproduce the
    problem; it behaves differently on the production and testing
    environments. When run on the production environment, the memory grows
    an order of magnitude faster. However, we still see growth even without
    flooding, as shown below.

    Example of growth in exactly 264 KB or 2x264 KB increments every few
    seconds from our production environment, which had about 1k pps at
    the time (normal production traffic, without flooding):

    # while true; do echo "`date '+%T'`: `ps -Ao 'rsz,cmd' --sort rsz | tail -n 1 | cut -c -20`"; sleep 1; done
    10:50:51: 216788 ovs-vswitchd
    10:50:52: 216788 ovs-vswitchd
    10:50:53: 216788 ovs-vswitchd
    10:50:55: 216788 ovs-vswitchd
    10:50:56: 216788 ovs-vswitchd
    10:50:57: 217052 ovs-vswitchd
    10:50:58: 217052 ovs-vswitchd
    10:50:59: 217052 ovs-vswitchd
    10:51:00: 217052 ovs-vswitchd
    10:51:01: 217052 ovs-vswitchd
    10:51:02: 217052 ovs-vswitchd
    10:51:03: 217052 ovs-vswitchd
    10:51:04: 217052 ovs-vswitchd
    10:51:05: 217052 ovs-vswitchd
    10:51:06: 217580 ovs-vswitchd
    10:51:07: 217580 ovs-vswitchd
    10:51:09: 217580 ovs-vswitchd
    10:51:10: 217580 ovs-vswitchd
    10:51:11: 217580 ovs-vswitchd
    10:51:12: 217844 ovs-vswitchd
    10:51:13: 217844 ovs-vswitchd
    10:51:14: 217844 ovs-vswitchd
    10:51:15: 217844 ovs-vswitchd
    10:51:16: 217844 ovs-vswitchd
    10:51:17: 217844 ovs-vswitchd
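
    For reference, a minimal Python sketch of the same idea, assuming a
    Linux /proc layout and a single ovs-vswitchd process, that prints only
    the RSS deltas (which makes the 264 KB steps easier to spot):

    import os
    import time

    def vswitchd_rss_kb():
        # Scan /proc for the ovs-vswitchd process and return its RSS in kB.
        for pid in os.listdir('/proc'):
            if not pid.isdigit():
                continue
            try:
                with open('/proc/%s/comm' % pid) as f:
                    if f.read().strip() != 'ovs-vswitchd':
                        continue
                with open('/proc/%s/status' % pid) as f:
                    for line in f:
                        if line.startswith('VmRSS:'):
                            return int(line.split()[1])
            except IOError:
                # Process went away or file was unreadable; keep scanning.
                continue
        return None

    last = vswitchd_rss_kb()
    while True:
        time.sleep(1)
        rss = vswitchd_rss_kb()
        if rss is not None and last is not None and rss != last:
            print('%s: RSS %d kB (%+d kB)' % (time.strftime('%T'), rss, rss - last))
        last = rss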


    Also specific to our setup:
    We use only an OpenFlow 1.0 controller.
    Running `ovs-vsctl list Flow_Table` gives empty output.

    Best,
    Adam


    On 03.12.2014 at 12:14, Adam Mazur wrote:
    I will try the current head version.
    Meanwhile, the answers are below.


    On 02.12.2014 at 23:24, Alex Wang wrote:
    Hey Adam,

    Besides the questions just asked,

    On Tue, Dec 2, 2014 at 1:11 PM, Alex Wang <al...@nicira.com> wrote:

        Hey Adam,

        Did you use any trick to avoid ARP resolution?

        Running your script on my setup causes only ARP packets to be sent.

        Also, there is no change in the memory usage of OVS.


    There is no trick with ARP.
    The gateway for the VM acts as a "normal" router, running old OVS 1.7.
    The router IS a bottleneck, since it consumes 100% of CPU. But at the
    same time OVS 2.3 on the hypervisor consumes 400% of CPU and grows
    in RSS.


        One more thing: did you see the issue without a tunnel?
        This very recent commit fixes an issue with tunneling.
        Could you try again with it?


    I will try. These problems were seen on b6a3dd9cca (Nov 22); I will
    try the head version.

        commit b772066ffd066d59d9ebce092f6665150723d2ad
        Author: Pravin B Shelar <pshe...@nicira.com>
        Date:   Wed Nov 26 11:27:05 2014 -0800

            route-table: Remove Unregister.
            Since dpif registering for routing table at initialization
            there is no need to unregister it. Following patch removes
            support for turning routing table notifications on and off.
            Due to this change OVS always listens for these
            notifications.
            Reported-by: YAMAMOTO Takashi <yamam...@valinux.co.jp>
            Signed-off-by: Pravin B Shelar <pshe...@nicira.com>
            Acked-by: YAMAMOTO Takashi <yamam...@valinux.co.jp>




    I want to ask some more questions to help debug:

    1. Could you post the 'ovs-vsctl show' output on the xenserver?

    http://pastebin.com/pe8YpRwr

    2. Could you post the 'ovs-dpctl dump-flows' output during a run of
    the script?

    Partial output - head: http://pastebin.com/fUkbfeUN and tail:
    http://pastebin.com/P1QgyH02
    The full output is more than 100 MB of text when flooding at 400K pps.
    Would you like it gzipped, sent off-list? (less than 1 MB)

    3. If the OOM killer is activated, you should see the OOM log in syslog
    or dmesg output. Could you provide it?

    I don't have one: the production logs have been rotated, remote logging
    during the OOM was unavailable (the network was down while the vswitch
    was restarting), and the testing environment is too slow to generate an
    OOM quickly... First (and much faster) I will try the head version, as
    you said there were fixes for such a case.

    4. Could you provide the route output on the hypervisor?

    # route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         10.2.7.1        0.0.0.0         UG    0      0        0 xenbr0
    10.2.7.0        0.0.0.0         255.255.255.0   U     0      0        0 xenbr0
    10.30.7.0       0.0.0.0         255.255.255.0   U     0      0        0 ib0
    37.233.99.0     0.0.0.0         255.255.255.0   U     0      0        0 xapi4



    Thanks,
    Alex Wang,



        Thanks,
        Alex Wang,

        On Mon, Dec 1, 2014 at 2:43 AM, Adam Mazur
        <adam.ma...@tiktalik.com> wrote:

            Hi,

            We are testing on kernel 3.18, OVS current master, GRE
            tunnels on a Xen server. The following Python script leads to
            fast ovs-vswitchd memory growth (1 GB/minute) and finally an
            OOM kill:


            import random, socket, struct, time

            # Flood UDP packets to random destination IPs as fast as possible.
            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            while True:
                # Pick a random 32-bit address and format it as a dotted quad.
                ip_raw = struct.pack('>I', random.randint(1, 0xffffffff))
                ip = socket.inet_ntoa(ip_raw)
                try:
                    sock.sendto(b"123", (ip, 12345))
                except socket.error:
                    # Ignore unroutable/invalid destinations.
                    pass
                #time.sleep(0.001)


            During this test OVS did not show a growing flow count,
            but memory still grows.

            If packets are sent slowly enough, memory never grows;
            uncomment the time.sleep line above to see this.
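
            A variant of the script above with a configurable send rate
            (RATE_PPS is just an illustrative knob) is sketched here; it could
            help probe the packet rate at which the growth starts:

            import random, socket, struct, time

            RATE_PPS = 1000  # target send rate, packets per second (assumed value)
            INTERVAL = 1.0 / RATE_PPS

            sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            while True:
                start = time.time()
                # Same random-destination UDP send as in the script above.
                ip = socket.inet_ntoa(struct.pack('>I', random.randint(1, 0xffffffff)))
                try:
                    sock.sendto(b"123", (ip, 12345))
                except socket.error:
                    pass
                # Sleep off whatever is left of this packet's time slot.
                delay = INTERVAL - (time.time() - start)
                if delay > 0:
                    time.sleep(delay)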

            Best,
            Adam







_______________________________________________
discuss mailing list
discuss@openvswitch.org
http://openvswitch.org/mailman/listinfo/discuss
