Hello,
I've compiled OVN using provided branch with both patches included and migrated some VMs to host with L3 networking to see what going to happen.

Resubmit logs are now back. Interesting thing that there was about 270 VMs before so logs appears back on about 300 instances (as well as ports). There was no resubmit logs from 1 of April. So count of ports related to this case.

2025-04-06T18:09:36.797Z|00128|ofproto_dpif_xlate(handler20)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900 2025-04-06T18:09:46.800Z|00123|ofproto_dpif_xlate(handler56)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900 2025-04-06T18:09:46.808Z|00139|ofproto_dpif_xlate(handler64)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=55936,tp_dst=1900 2025-04-06T18:09:51.225Z|00020|ofproto_dpif_xlate(handler49)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:09:51.229Z|00021|ofproto_dpif_xlate(handler49)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00 2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-06T18:33:54.406Z|00102|ofproto_dpif_xlate(handler26)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-06T18:34:00.917Z|00088|ofproto_dpif_xlate(handler48)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:34:00.918Z|00089|ofproto_dpif_upcall(handler48)|WARN|Dropped 2 log messages in last 37 seconds (most recently, 7 seconds ago) due to excessive rate 2025-04-06T18:34:00.918Z|00090|ofproto_dpif_upcall(handler48)|WARN|Flow: udp,in_port=46,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:34:00.920Z|00091|ofproto_dpif_xlate(handler48)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:34:24.407Z|00127|ofproto_dpif_xlate(handler7)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-06T18:35:11.952Z|00252|ofproto_dpif_xlate(handler38)|WARN|Dropped 1 log messages in last 18 seconds (most recently, 18 seconds ago) due to excessive rate 2025-04-06T18:35:11.952Z|00253|ofproto_dpif_xlate(handler38)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900 2025-04-06T18:35:11.952Z|00254|ofproto_dpif_upcall(handler38)|WARN|Dropped 3 log messages in last 71 seconds (most recently, 18 seconds ago) due to excessive rate 2025-04-06T18:35:11.953Z|00255|ofproto_dpif_upcall(handler38)|WARN|Flow: udp,in_port=346,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900

# ovn-controller --version (not using
ovn-controller 24.09.3
Open vSwitch Library 3.4.2
OpenFlow versions 0x6:0x6
SB DB Schema 20.37.0

# ovn-northd --version
ovn-northd 24.09.3
Open vSwitch Library 3.4.2

On 02.04.2025 10:52, Dumitru Ceara wrote:
On 4/1/25 7:30 PM, Ilia Baikov wrote:
Hi Dumitru,
Hi Ilia,

Sure, let's give it a try. Is it good idea to apply this patch on top of
the patch you previously sent to try?
Yes, you're right, we should apply it on top of that.  I pushed both
patches here:

https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/

Please ignore the CI failures, like mentioned yesterday the last commit
I did is known to break stuff, it's just to confirm that our problem in
this case is due to the MC_FLOOD_L2 output we do for ARP requests
targetting OVN router IPs.

What topology for testing is more preferrable to try it out? L2 or L3
(bgp-based networking)? In case of L3 there is actually small amount of
ARP pps compared to L2 where uplink interface (like eno1) is member of
br-ex bridge.

Would it be possible to try both?

Thank you for your involvement, really appreciate it!

No problem, thanks for taking time to help us try things out!

Regards,
Dumitru

01.04.2025 15:06, Dumitru Ceara wrote:
On 4/1/25 3:15 AM, Ilia Baikov wrote:
Hello,
Hi Ilia,

So the things go way deeper and it becomes way strange as i initially
thought.
I've migrated to L3 networking using ovn-bgp-agent, in order to reduce
ARP packets flooded over all ports attached to br-int. However this
didn't help at all and some VMs loses external connectivity (but VMs
ports didn't get flooded by hundreds pps of ARP).

2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN|
Dropped 131 log messages in last 55 seconds (most recently, 10 seconds
ago) due to excessive rate
2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN|
Dropped 133 log messages in last 57 seconds (most recently, 13 seconds
ago) due to excessive rate
2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN|Flow:
udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137

bridge("br-int")
----------------
   0. in_port=277, priority 100, cookie 0xa70f8aad
      set_field:0x5d/0xffff->reg13
      set_field:0x2->reg11
      set_field:0x3->reg12
      set_field:0x2->metadata
      set_field:0x2ac->reg14
      set_field:0/0xffff0000->reg13
      resubmit(,8)
   8. metadata=0x2, priority 50, cookie 0xecab0c71
      set_field:0/0x1000->reg10
      resubmit(,73)
      73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad
              set_field:0x1000/0x1000->reg10
      move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
       -> NXM_NX_XXREG0[111] is now 0x1
      resubmit(,9)
   9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
      drop

Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
Megaflow:
recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no
Datapath actions: drop
2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN|
Dropped 7 log messages in last 55 seconds (most recently, 13 seconds
ago) due to excessive rate
2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00
2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN|
Dropped 7 log messages in last 61 seconds (most recently, 29 seconds
ago) due to excessive rate
2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN|
Dropped 8 log messages in last 63 seconds (most recently, 29 seconds
ago) due to excessive rate
2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN|Flow:
udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678

bridge("br-int")
----------------
   0. in_port=278, priority 100, cookie 0xd191f4de
      set_field:0x5e/0xffff->reg13
      set_field:0x2->reg11
      set_field:0x3->reg12
      set_field:0x2->metadata
      set_field:0x33d->reg14
      set_field:0/0xffff0000->reg13
      resubmit(,8)
   8. metadata=0x2, priority 50, cookie 0xecab0c71
      set_field:0/0x1000->reg10
      resubmit(,73)
      73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de
              set_field:0x1000/0x1000->reg10
      move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
       -> NXM_NX_XXREG0[111] is now 0x1
      resubmit(,9)
   9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
      drop

Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
Megaflow:
recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no
Datapath actions: drop

When i'm seeing this issue on mentioned IPs i can see that VM tries to
resolve default gateway mac address but with no success since ARP is
dropped somewhere because of resubmits and drop action on datapath. I
guess now this is not related to OVN controller. Meanwhile DHCP works
pretty good :)
Glad to hear about DHCP.

Option broadcast-arps-to-all-routers is disabled at both of provider
network Logical Switches.

Could there be any solution for this?
It's not a solution, more of a change to confirm whether this is causing
issues, but could you please try with this commit?

https://github.com/dceara/ovn/commit/
fac23e2f6ef6effe3f9a2e0310e78d085750488b

The commit disables flooding of ARP requests that target OVN owned
router IPs to non-router ports.  It's not something that can be accepted
as is because it breaks some other things, e.g., OVN generated GARP
requests will not be forwarded properly.

The patch applies on top of the ovn main branch.  If you want to try it
out on older branches the easiest way is to just ignore the test changes.

Looking forward to hear how it went.

Thanks,
Dumitru

Regards,
Ilia Baikov

26.03.2025 12:34, Ilia Baikov пишет:
Got my hands on this, back to debugging. Seems like kernel runs stable
# uname -r
6.14.0-061400-generic
Meanwhile there is no unrecognized(27) related logs.
tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i "dhcp"
2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:50:22:c4 185.255.178.170
2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4, dst-
mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1
2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:18:ac:4a 89.169.15.224
2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a, dst-
mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1

And yes, there is logs about resubmit actions which are expected as
you said.
->reg15
   continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
   continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8
priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0
 cookie:0xd1588295 
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8
priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4
 cookie:0xd1588295 
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80
priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43
priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x1->reg15,resubmit(,43

OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503-
reg14,resubmit(CONTROLLER,8) data_len=42
OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503-
reg14,resubmit(CONTROLLER,8) data_len=42
   continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
   continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
   continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
   continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)

top - 09:29:33 up 16:58,  3 users,  load average: 37.70, 37.39, 36.95
Threads: 114 total,  13 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu(s): 27.5 us, 11.2 sy,  0.0 ni, 60.3 id,  0.5 wa,  0.0 hi, 0.6
si,  0.0 st
MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used,  26351.4 buff/
cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345358.7 avail
Mem

#top -H -p $(pidof ovs-vswitchd)
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+
COMMAND
     6212 root      20   0 8388104 662016   7940 R  22.3   0.1 32:13.87
revalidator102
     6200 root      20   0 8388104 662016   7940 R  21.6   0.1 73:01.76
revalidator90
     6202 root      20   0 8388104 662016   7940 R  20.6   0.1 32:13.77
revalidator92
     6217 root      20   0 8388104 662016   7940 R  20.3   0.1 26:07.50
revalidator107
     6214 root      20   0 8388104 662016   7940 R  18.9   0.1 39:12.60
revalidator104
     6219 root      20   0 8388104 662016   7940 R  18.6   0.1 25:57.96
revalidator109
     6211 root      20   0 8388104 662016   7940 R  17.6   0.1 39:20.20
revalidator101
     6207 root      20   0 8388104 662016   7940 R  13.6   0.1 17:53.31
revalidator97
     6204 root      20   0 8388104 662016   7940 R  13.0   0.1 32:34.67
revalidator94
     6209 root      20   0 8388104 662016   7940 R  12.6   0.1 35:55.88
revalidator100
     6213 root      20   0 8388104 662016   7940 R  12.6   0.1 18:04.43
revalidator103
     6220 root      20   0 8388104 662016   7940 R  12.3   0.1 5:52.71
revalidator110
     6218 root      20   0 8388104 662016   7940 R  12.0   0.1 9:09.66
revalidator108
     6215 root      20   0 8388104 662016   7940 S   8.6   0.1 22:22.27
revalidator105
     6221 root      20   0 8388104 662016   7940 S   8.6   0.1 4:44.93
revalidator111
     6208 root      20   0 8388104 662016   7940 S   8.0   0.1 22:55.17
revalidator98
     6203 root      20   0 8388104 662016   7940 S   6.0   0.1 38:52.72
revalidator93
     6206 root      20   0 8388104 662016   7940 S   6.0   0.1 22:11.55
revalidator96
     6216 root      20   0 8388104 662016   7940 S   4.3   0.1 33:14.52
revalidator106

Sometimes revalidator processes drops to 5-10%, sometimes to 30%. I
guess this behaviour is because of resubmit actions?. So far so good
controller feeling fine, but there could be some sort of freezes on
sending DHCPREPLY.

top -H -p $(pidof ovn-controller)
top - 09:30:55 up 16:59,  3 users,  load average: 35.73, 36.70, 36.73
Threads:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
%Cpu(s): 27.7 us, 10.6 sy,  0.0 ni, 60.8 id,  0.3 wa,  0.0 hi, 0.6
si,  0.0 st
MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used,  26357.8 buff/
cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345438.0 avail
Mem

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+
COMMAND
     5045 root      20   0  396500  83836   4432 S   0.7   0.0 11:46.59
ovn-controller
     5412 root      20   0  396500  83836   4432 S   0.0   0.0 0:06.23
ovn_pinctrl0
     5413 root      20   0  396500  83836   4432 S   0.0   0.0 0:00.00
urcu1
     5414 root      20   0  396500  83836   4432 S   0.0   0.0 0:00.20
ovn_statctrl2
     6103 root      20   0  396500  83836   4432 S   0.0   0.0 0:04.10
stopwatch3


Trying to reproduce on a real-world environment. There is 300
instances running with about 300Mbps network traffic in total.
Is there more logs or debug i can provide?

--
Regards,

Ilia Baikov
ilia.baikov@ib.systems

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to