Hi Dumitru,
Sure, let's give it a try. Is it good idea to apply this patch on top of the patch you previously sent to try? What topology for testing is more preferrable to try it out? L2 or L3 (bgp-based networking)? In case of L3 there is actually small amount of ARP pps compared to L2 where uplink interface (like eno1) is member of br-ex bridge.

Thank you for your involvement, really appreciate it!

01.04.2025 15:06, Dumitru Ceara wrote:
On 4/1/25 3:15 AM, Ilia Baikov wrote:
Hello,
Hi Ilia,

So the things go way deeper and it becomes way strange as i initially
thought.
I've migrated to L3 networking using ovn-bgp-agent, in order to reduce
ARP packets flooded over all ports attached to br-int. However this
didn't help at all and some VMs loses external connectivity (but VMs
ports didn't get flooded by hundreds pps of ARP).

2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN|
Dropped 131 log messages in last 55 seconds (most recently, 10 seconds
ago) due to excessive rate
2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN|
Dropped 133 log messages in last 57 seconds (most recently, 13 seconds
ago) due to excessive rate
2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN|Flow:
udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137

bridge("br-int")
----------------
  0. in_port=277, priority 100, cookie 0xa70f8aad
     set_field:0x5d/0xffff->reg13
     set_field:0x2->reg11
     set_field:0x3->reg12
     set_field:0x2->metadata
     set_field:0x2ac->reg14
     set_field:0/0xffff0000->reg13
     resubmit(,8)
  8. metadata=0x2, priority 50, cookie 0xecab0c71
     set_field:0/0x1000->reg10
     resubmit(,73)
     73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad
             set_field:0x1000/0x1000->reg10
     move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
      -> NXM_NX_XXREG0[111] is now 0x1
     resubmit(,9)
  9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
     drop

Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
Megaflow:
recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no
Datapath actions: drop
2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN|
Dropped 7 log messages in last 55 seconds (most recently, 13 seconds
ago) due to excessive rate
2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00
2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN|
Dropped 7 log messages in last 61 seconds (most recently, 29 seconds
ago) due to excessive rate
2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN|
Dropped 8 log messages in last 63 seconds (most recently, 29 seconds
ago) due to excessive rate
2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN|Flow:
udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678

bridge("br-int")
----------------
  0. in_port=278, priority 100, cookie 0xd191f4de
     set_field:0x5e/0xffff->reg13
     set_field:0x2->reg11
     set_field:0x3->reg12
     set_field:0x2->metadata
     set_field:0x33d->reg14
     set_field:0/0xffff0000->reg13
     resubmit(,8)
  8. metadata=0x2, priority 50, cookie 0xecab0c71
     set_field:0/0x1000->reg10
     resubmit(,73)
     73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de
             set_field:0x1000/0x1000->reg10
     move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
      -> NXM_NX_XXREG0[111] is now 0x1
     resubmit(,9)
  9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
     drop

Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
Megaflow:
recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no
Datapath actions: drop

When i'm seeing this issue on mentioned IPs i can see that VM tries to
resolve default gateway mac address but with no success since ARP is
dropped somewhere because of resubmits and drop action on datapath. I
guess now this is not related to OVN controller. Meanwhile DHCP works
pretty good :)
Glad to hear about DHCP.

Option broadcast-arps-to-all-routers is disabled at both of provider
network Logical Switches.

Could there be any solution for this?
It's not a solution, more of a change to confirm whether this is causing
issues, but could you please try with this commit?

https://github.com/dceara/ovn/commit/fac23e2f6ef6effe3f9a2e0310e78d085750488b

The commit disables flooding of ARP requests that target OVN owned
router IPs to non-router ports.  It's not something that can be accepted
as is because it breaks some other things, e.g., OVN generated GARP
requests will not be forwarded properly.

The patch applies on top of the ovn main branch.  If you want to try it
out on older branches the easiest way is to just ignore the test changes.

Looking forward to hear how it went.

Thanks,
Dumitru

Regards,
Ilia Baikov

26.03.2025 12:34, Ilia Baikov пишет:
Got my hands on this, back to debugging. Seems like kernel runs stable
# uname -r
6.14.0-061400-generic
Meanwhile there is no unrecognized(27) related logs.
tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i "dhcp"
2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:50:22:c4 185.255.178.170
2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4, dst-
mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1
2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:18:ac:4a 89.169.15.224
2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a, dst-
mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1

And yes, there is logs about resubmit actions which are expected as
you said.
->reg15
  continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
  continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8
priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0
 cookie:0xd1588295 
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8
priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4
 cookie:0xd1588295 
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80
priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43
priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x1->reg15,resubmit(,43

OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503-
reg14,resubmit(CONTROLLER,8) data_len=42
OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503-
reg14,resubmit(CONTROLLER,8) data_len=42
  continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
  continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
  continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
  continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)

top - 09:29:33 up 16:58,  3 users,  load average: 37.70, 37.39, 36.95
Threads: 114 total,  13 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu(s): 27.5 us, 11.2 sy,  0.0 ni, 60.3 id,  0.5 wa,  0.0 hi, 0.6
si,  0.0 st
MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used,  26351.4 buff/
cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345358.7 avail
Mem

#top -H -p $(pidof ovs-vswitchd)
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+
COMMAND
    6212 root      20   0 8388104 662016   7940 R  22.3   0.1 32:13.87
revalidator102
    6200 root      20   0 8388104 662016   7940 R  21.6   0.1 73:01.76
revalidator90
    6202 root      20   0 8388104 662016   7940 R  20.6   0.1 32:13.77
revalidator92
    6217 root      20   0 8388104 662016   7940 R  20.3   0.1 26:07.50
revalidator107
    6214 root      20   0 8388104 662016   7940 R  18.9   0.1 39:12.60
revalidator104
    6219 root      20   0 8388104 662016   7940 R  18.6   0.1 25:57.96
revalidator109
    6211 root      20   0 8388104 662016   7940 R  17.6   0.1 39:20.20
revalidator101
    6207 root      20   0 8388104 662016   7940 R  13.6   0.1 17:53.31
revalidator97
    6204 root      20   0 8388104 662016   7940 R  13.0   0.1 32:34.67
revalidator94
    6209 root      20   0 8388104 662016   7940 R  12.6   0.1 35:55.88
revalidator100
    6213 root      20   0 8388104 662016   7940 R  12.6   0.1 18:04.43
revalidator103
    6220 root      20   0 8388104 662016   7940 R  12.3   0.1 5:52.71
revalidator110
    6218 root      20   0 8388104 662016   7940 R  12.0   0.1 9:09.66
revalidator108
    6215 root      20   0 8388104 662016   7940 S   8.6   0.1 22:22.27
revalidator105
    6221 root      20   0 8388104 662016   7940 S   8.6   0.1 4:44.93
revalidator111
    6208 root      20   0 8388104 662016   7940 S   8.0   0.1 22:55.17
revalidator98
    6203 root      20   0 8388104 662016   7940 S   6.0   0.1 38:52.72
revalidator93
    6206 root      20   0 8388104 662016   7940 S   6.0   0.1 22:11.55
revalidator96
    6216 root      20   0 8388104 662016   7940 S   4.3   0.1 33:14.52
revalidator106

Sometimes revalidator processes drops to 5-10%, sometimes to 30%. I
guess this behaviour is because of resubmit actions?. So far so good
controller feeling fine, but there could be some sort of freezes on
sending DHCPREPLY.

top -H -p $(pidof ovn-controller)
top - 09:30:55 up 16:59,  3 users,  load average: 35.73, 36.70, 36.73
Threads:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s): 27.7 us, 10.6 sy,  0.0 ni, 60.8 id,  0.3 wa,  0.0 hi, 0.6
si,  0.0 st
MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used,  26357.8 buff/
cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345438.0 avail
Mem

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+
COMMAND
    5045 root      20   0  396500  83836   4432 S   0.7   0.0 11:46.59
ovn-controller
    5412 root      20   0  396500  83836   4432 S   0.0   0.0 0:06.23
ovn_pinctrl0
    5413 root      20   0  396500  83836   4432 S   0.0   0.0 0:00.00
urcu1
    5414 root      20   0  396500  83836   4432 S   0.0   0.0 0:00.20
ovn_statctrl2
    6103 root      20   0  396500  83836   4432 S   0.0   0.0 0:04.10
stopwatch3


Trying to reproduce on a real-world environment. There is 300
instances running with about 300Mbps network traffic in total.
Is there more logs or debug i can provide?

--
Regards,

Ilia Baikov
ilia.baikov@ib.systems

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to