Hi!

On 17.02.2025 08:33, Ales Musil wrote:


On Fri, Feb 14, 2025 at 3:57 PM Piotr Misiak via discuss <ovs-discuss@openvswitch.org> wrote:

    Hi,


Hi Piotr,

thank you for contacting us.


    We are running several OpenStack/OVN regions with different sizes.
    All of them have external networks connected to the Internet.
    We are receiving a lot of packets to unused (unprovisioned)
    destination IP addresses; I guess some bots are scanning the Internet.
    This creates a lot of ARP requests which cannot be answered, because
    those IP addresses are not configured anywhere yet.

    A few days ago we upgraded one of our regions from OVN 22.09 to OVN
    24.03, and we suddenly started having critical issues with DNS
    resolution on VMs running in OpenStack.
    Generally none of the DNS requests were successful; some responses
    came back after 5 minutes, sometimes even after 30 minutes. Yes,
    minutes, not seconds.

Slightly related: there was recently an improvement to DNS handling
where the cache is no longer processed only by pinctrl [0], and
later on there was another change to avoid mutex contention as
much as possible [1]. I believe both of those would help in
your case to some extent.


I think those changes would help us in the future, but for now we decided to disable UDP interception.



    After some debugging we identified the problematic OpenFlow flows
    which send ARP request packets to ovn-controllers.
    Those flows are created because we have around 400 ports in the
    external network, so the packet-flooding flow has to be split.
    Those flows are installed at the beginning of OF table 39 with
    priority 110 and include 170 resubmits each:


Those flows are related to multicast groups, in this case the "_MC_flood" group.


    cookie=0x28ef9c32, duration=829.596s, table=39, n_packets=117482,
    n_bytes=4947460, idle_age=0, hard_age=58,
    priority=110,reg6=0x9001,reg15=0x8000,metadata=0xba
    
actions=load:0->NXM_NX_REG6[],load:0x5a3->NXM_NX_REG15[],resubmit(,41),load:0x21af->NXM_NX_REG15[],resubmit(,41),load:0x8f->NXM_NX_REG15[],resubmit(,41),load:0x1374->NXM_NX_REG15[],resubmit(,41),load:0x5f->NXM_NX_REG15[],resubmit(,41),load:0x10b->NXM_NX_REG15[],resubmit(,41),load:0x106->NXM_NX_REG15[],resubmit(,41),load:0x13d9->NXM_NX_REG15[],resubmit(,41),load:0x4d->NXM_NX_REG15[],resubmit(,41),load:0x2202->NXM_NX_REG15[],resubmit(,41),load:0xb4->NXM_NX_REG15[],resubmit(,41),load:0x25ed->NXM_NX_REG15[],resubmit(,41),load:0x1b59->NXM_NX_REG15[],resubmit(,41),load:0x26b2->NXM_NX_REG15[],resubmit(,41),load:0x6a->NXM_NX_REG15[],resubmit(,41)
    <<< CUT >>>
    
load:0x169a->NXM_NX_REG15[],resubmit(,41),controller(userdata=00.00.00.1b.00.00.00.00.00.00.90.01.00.00.80.00.27)

    there is also a second flow with 170 resubmits and a controller()
    action at the end:
    controller(userdata=00.00.00.1b.00.00.00.00.00.00.90.02.00.00.80.00.27)

    and also a third flow with a smaller number of resubmits and no
    controller action. In total we have around 400 resubmits.

    This was introduced in version 24.03 by this commit:
    
https://github.com/ovn-org/ovn/commit/325c7b203d8bfd12bc1285ad11390c1a55cd6717

    What we see in the ovn-controller logs:

    2025-02-12T20:35:41.490Z|10791|pinctrl(ovn_pinctrl0)|DBG|pinctrl
    received  packet-in | opcode=unrecognized(27)| OF_Table_ID=39|
    OF_Cookie_ID=0x28ef9c32| in-port=60| src-mac=4e:15:bc:ac:36:45,
    dst-mac=ff:ff:ff:ff:ff:ff| src-ip=A.A.A.A, dst-ip=B.B.B.B
    2025-02-12T20:35:41.500Z|10792|pinctrl(ovn_pinctrl0)|DBG|pinctrl
    received  packet-in | opcode=unrecognized(27)| OF_Table_ID=39|
    OF_Cookie_ID=0x28ef9c32| in-port=65533| src-mac=4e:15:bc:ac:36:45,
    dst-mac=ff:ff:ff:ff:ff:ff| src-ip=A.A.A.A, dst-ip=B.B.B.B

    as you can see, the same packet is looped through ovn-controller
    twice. That's because we have 400 ports and the flooding is covered
    by three OpenFlow flows.

    The funny thing is that those packets are dropped at the end of the
    OpenFlow table chain in the datapath. So they kill our
    ovn-controllers' performance only to be dropped in the end.
    I'm including a small part of a packet trace result here:

    39. reg15=0x8000,metadata=0xba, priority 100, cookie 0x28ef9c32
        set_field:0->reg6
        set_field:0xe8->reg15
        resubmit(,41)
        41. priority 0
                set_field:0->reg0
                set_field:0->reg1
                set_field:0->reg2
                set_field:0->reg3
                set_field:0->reg4
                set_field:0->reg5
                set_field:0->reg6
                set_field:0->reg7
                set_field:0->reg8
                set_field:0->reg9
                resubmit(,42)
            42. metadata=0xba, priority 0, cookie 0x3372823b
                resubmit(,43)
            43. metadata=0xba,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,
    priority 110, cookie 0xaabcf4fa
                resubmit(,44)
            44. metadata=0xba, priority 0, cookie 0x9b7d541f
                resubmit(,45)
            45. metadata=0xba, priority 65535, cookie 0xedb6d3de
                resubmit(,46)
            46. metadata=0xba, priority 65535, cookie 0x1dbceae
                resubmit(,47)
            47. metadata=0xba, priority 0, cookie 0xc1c2a264
                resubmit(,48)
            48. metadata=0xba, priority 0, cookie 0x640d65ba
                resubmit(,49)
            49. metadata=0xba, priority 0, cookie 0x78f2abc0
                resubmit(,50)
            50. metadata=0xba, priority 0, cookie 0x7b63c11c
                resubmit(,51)
            51. metadata=0xba,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,
    priority 100, cookie 0xb055fd1c
    set_field:0/0x8000000000000000000000000000->xxreg0
                resubmit(,52)
            52. metadata=0xba, priority 0, cookie 0x4dd5d603
                resubmit(,64)
            64. priority 0
                resubmit(,65)
            65. reg15=0xe8,metadata=0xba, priority 100, cookie 0xfab6eb
    
clone(ct_clear,set_field:0->reg11,set_field:0->reg12,set_field:0/0xffff->reg13,set_field:0x25b->reg11,set_field:0x30a->reg12,set_field:0x252->metadata,set_field:0x1->reg14,set_field:0->reg10,set_field:0->reg15,set_field:0->reg0,set_field:0->reg1,set_field:0->reg2,set_field:0->reg3,set_field:0->reg4,set_field:0->reg5,set_field:0->reg6,set_field:0->reg7,set_field:0->reg8,set_field:0->reg9,resubmit(,8))
                ct_clear
                set_field:0->reg11
                set_field:0->reg12
                set_field:0/0xffff->reg13
                set_field:0x25b->reg11
                set_field:0x30a->reg12
                set_field:0x252->metadata
                set_field:0x1->reg14
                set_field:0->reg10
                set_field:0->reg15
                set_field:0->reg0
                set_field:0->reg1
                set_field:0->reg2
                set_field:0->reg3
                set_field:0->reg4
                set_field:0->reg5
                set_field:0->reg6
                set_field:0->reg7
                set_field:0->reg8
                set_field:0->reg9
                resubmit(,8)
             8.
    reg14=0x1,metadata=0x252,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00,
    priority 50, cookie 0x33587607
    
set_field:0xfa163e9f2f460000000000000000/0xffffffffffff0000000000000000->xxreg0
                resubmit(,9)
             9. metadata=0x252, priority 0, cookie 0x671d3d97
                set_field:0x4/0x4->xreg4
                resubmit(,10)
            10. reg9=0x4/0x4,metadata=0x252, priority 100, cookie
    0xd21e0659
                resubmit(,79)
                79. reg0=0x2, priority 0
                        drop
                resubmit(,11)
            11. arp,metadata=0x252, priority 85, cookie 0xb5758416
                drop


    What can we do to improve the handling of those ARP packets so they
    are not sent to ovn-controllers?


I'm not sure there is a way to avoid sending them to ovn-controller
when the multicast group is large.

    Maybe they could be dropped somewhere earlier in the table chain?
    They are requesting a MAC address which OVN doesn't know. Why does
    it try to flood them to all router ports in the external network?



At this point in the pipeline OVN doesn't know that this IP/MAC is
unknown. And because the packet is a multicast one, OVN basically does
what a normal network would do: flood it to all ports on the switch.

    Maybe we could implement this "too big" OpenFlow rule in a different
    way and loop it inside the fast datapath, if possible?


Unfortunately not: OvS would still try to fit it into a single buffer,
no matter whether it's one long action list or multiple resubmits.
Unless there is an action that needs to be executed before
continuation, e.g. the controller action, we would still have the
issue that the commit tried to fix.

    I also noticed that IPv6 NS packets are processed via ovn-controller.
    Why can't OVS create responses inside the fast datapath in a similar
    way to how it creates responses to ARP requests for known MACs?


This is a known limitation of OvS; there was an attempt to make it
work, however it didn't lead anywhere [2]. We should probably try
to revisit this. Once there is OvS support, we could easily change
OVN to do it directly, as we do for ARP.


    This issue had a big influence on our cloud, because the same
    ovn-controller thread is responsible for DHCP, DNS interception, and
    IPv6 NS packets, and when it was overloaded all those services
    stopped working.

    Another thing, quite misleading, is those "opcode=unrecognized(27)"
    entries in the ovn-controller log, which I guess are unrecognized
    only because the mentioned commit didn't add a new action name
    mapping somewhere here:
    
https://github.com/ovn-org/ovn/blob/ed2790153c07a376890f28b0a16bc321e3af016b/lib/actions.c#L5977


Good catch; looking at it, we might actually be missing more of those.
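For anyone sanity-checking those log lines: the first four bytes of the controller() userdata encode the action opcode big-endian, so the userdata above starts with 00.00.00.1b, i.e. opcode 27. A minimal Python sketch (the helper name is mine, not from OVN):

```python
# Decode the opcode from the dotted-hex userdata of an OVN controller()
# action. The first four bytes are the big-endian action opcode; here
# 0x1b == 27, which ovn-controller logs as "unrecognized(27)".

def userdata_opcode(userdata: str) -> int:
    raw = bytes(int(b, 16) for b in userdata.split("."))
    return int.from_bytes(raw[:4], "big")

opcode = userdata_opcode("00.00.00.1b.00.00.00.00.00.00.90.01.00.00.80.00.27")
print(opcode)  # 27
```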



    To recover our region we disabled DNS interception and lowered the
    number of ARP requests by increasing
    "net.ipv4.neigh.default.retrans_time_ms" on our upstream gateways.
    Those changes lowered the number of packets sent to ovn-controllers
    from around 500 p/s to 200 p/s and stabilized our region.
    Nevertheless, this OVN performance issue is still there.


If I may suggest another potential mitigation: adding a stateless ACL
that ensures the ARP packets are dropped before reaching the flood
flows. Would that be an option? This would really just be a mitigation
until we have a proper solution. Speaking of a proper solution, given
the need for this, it would probably be CoPP for this controller
action, so we don't end up with an overloaded pinctrl thread. There is
a downside to CoPP: we might drop legitimate packets that need
flooding.
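A rough sketch of that ACL idea, with placeholder names (the switch name "ext-net" and the 203.0.113.0/24 target range are hypothetical; the real match would have to cover the unprovisioned address range):

```shell
# Hedged sketch: drop ARP requests for a range of unprovisioned
# addresses on the external logical switch, before they reach the
# flood flows. "ext-net" and 203.0.113.0/24 are placeholders, not
# values from this thread.
ovn-nbctl acl-add ext-net from-lport 32767 \
    'arp && arp.op == 1 && arp.tpa == 203.0.113.0/24' drop
```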


It looks like for now setting the
other_config:broadcast-arps-to-all-routers=false option will help us a
lot, not only with the described ARP packets looping through
ovn-controllers, but also with the 4096 resubmit limit we hit from
time to time.
I have to test it more, but the first tests look promising.
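For reference, this is roughly how that knob is set cluster-wide (a sketch only; the column name here mirrors the option as written above, so verify it against ovn-nb(5) for your OVN version):

```shell
# Sketch: disable broadcasting ARP requests to all router ports
# cluster-wide via the northbound database. Check ovn-nb(5) for the
# exact NB_Global column name in your OVN version before applying.
ovn-nbctl set NB_Global . other_config:broadcast-arps-to-all-routers=false
```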


Thank you,

Piotr


    Thanks for your attention,
    Piotr Misiak
    _______________________________________________
    discuss mailing list
    disc...@openvswitch.org
    https://mail.openvswitch.org/mailman/listinfo/ovs-discuss



Thanks,
Ales

[0] https://github.com/ovn-org/ovn/commit/817d4e53
[1] https://github.com/ovn-org/ovn/commit/eba60b27
[2] http://patchwork.ozlabs.org/project/openvswitch/patch/20200928134947.48269-1-fankaixi...@bytedance.com
