Hi Dumitru,

Out of these logs, the UDP ones are for IP multicast/broadcast traffic;
those will still get flooded and potentially hit the resubmit limit -
none of the changes we tried tackle that.  I also don't think that
should cause many issues (aside from the vswitchd logs) though.

The only other packet I see logged there is:

2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096
resubmit actions on bridge br-int while processing
arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00
2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)

This is a GARP (originated by ovn-controller) and in theory should not
be flooded with the change from
https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/.

Just to double check, have you also upgraded ovn-northd to the patched
version?  That's required because the changes in the last commit on my
branch are for ovn-northd.

Yes, I've uploaded the ovn-controller and ovn-northd binaries compiled from your branch into the containers and then restarted both containers. I will get my hands on this again today. Should I clear the OVS database after updating the binaries in the containers, before restarting them, so that everything gets recomputed? While I'm seeing the resubmit messages and datapath drop actions, ovn-controller still replies with DHCPOFFER/DHCPACK.
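For reference, clearing the OVS database should not be needed; a full recompute can also be forced on the running ovn-controller. A rough sketch (the inc-engine/recompute unixctl command and the Kolla container names are assumptions, adjust to your deployment):

# Force ovn-controller to recompute and reinstall all flows without
# touching the OVS database (hypothetical container name for a Kolla setup).
docker exec ovn_controller ovn-appctl -t ovn-controller inc-engine/recompute

# Sanity check: dump a few of the flows ovn-controller installed on br-int.
docker exec openvswitch_vswitchd ovs-ofctl -O OpenFlow15 dump-flows br-int | head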
Could logs like the following mean something important in this case?

Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
Megaflow:
recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no
Datapath actions: drop
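
For reference, the full translation for such a packet can be reproduced on demand with ofproto/trace instead of waiting for the (rate-limited) upcall log; a sketch with the fields copied from the flow above, run wherever ovs-vswitchd runs:

# Replay the dropped broadcast through the br-int pipeline to see every
# resubmit and the final drop action (flow fields taken from the log above).
ovs-appctl ofproto/trace br-int 'udp,in_port=277,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_ttl=64,tp_src=137,tp_dst=137'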

On 08.04.2025 13:24, Dumitru Ceara wrote:
On 4/6/25 8:37 PM, Ilia Baikov wrote:
Hello,
Hi Ilia,

I've compiled OVN using the provided branch with both patches included and
migrated some VMs to a host with L3 networking to see what is going to happen.

The resubmit logs are now back. Interestingly, there were about 270
VMs before, and the logs reappeared at around 300 instances (and as many
ports). There were no resubmit logs since the 1st of April, so the port
count seems related to this case.

2025-04-06T18:09:36.797Z|00128|ofproto_dpif_xlate(handler20)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900
2025-04-06T18:09:46.800Z|00123|ofproto_dpif_xlate(handler56)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900
2025-04-06T18:09:46.808Z|00139|ofproto_dpif_xlate(handler64)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=55936,tp_dst=1900
2025-04-06T18:09:51.225Z|00020|ofproto_dpif_xlate(handler49)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
2025-04-06T18:09:51.229Z|00021|ofproto_dpif_xlate(handler49)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096
resubmit actions on bridge br-int while processing
arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00
2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-06T18:33:54.406Z|00102|ofproto_dpif_xlate(handler26)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-06T18:34:00.917Z|00088|ofproto_dpif_xlate(handler48)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
2025-04-06T18:34:00.918Z|00089|ofproto_dpif_upcall(handler48)|WARN|
Dropped 2 log messages in last 37 seconds (most recently, 7 seconds ago)
due to excessive rate
2025-04-06T18:34:00.918Z|00090|ofproto_dpif_upcall(handler48)|WARN|Flow:
udp,in_port=46,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
2025-04-06T18:34:00.920Z|00091|ofproto_dpif_xlate(handler48)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
2025-04-06T18:34:24.407Z|00127|ofproto_dpif_xlate(handler7)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-06T18:35:11.952Z|00252|ofproto_dpif_xlate(handler38)|WARN|
Dropped 1 log messages in last 18 seconds (most recently, 18 seconds
ago) due to excessive rate
2025-04-06T18:35:11.952Z|00253|ofproto_dpif_xlate(handler38)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900
2025-04-06T18:35:11.952Z|00254|ofproto_dpif_upcall(handler38)|WARN|
Dropped 3 log messages in last 71 seconds (most recently, 18 seconds
ago) due to excessive rate
2025-04-06T18:35:11.953Z|00255|ofproto_dpif_upcall(handler38)|WARN|Flow:
udp,in_port=346,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900

Out of these logs, the UDP ones are for IP multicast/broadcast traffic;
those will still get flooded and potentially hit the resubmit limit -
none of the changes we tried tackle that.  I also don't think that
should cause many issues (aside from the vswitchd logs) though.

The only other packet I see logged there is:

2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096
resubmit actions on bridge br-int while processing
arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00
2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)

This is a GARP (originated by ovn-controller) and in theory should not
be flooded with the change from
https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/.

Just to double check, have you also upgraded ovn-northd to the patched
version?  That's required because the changes in the last commit on my
branch are for ovn-northd.

Thanks,
Dumitru

# ovn-controller --version (not using
ovn-controller 24.09.3
Open vSwitch Library 3.4.2
OpenFlow versions 0x6:0x6
SB DB Schema 20.37.0

# ovn-northd --version
ovn-northd 24.09.3
Open vSwitch Library 3.4.2

On 02.04.2025 10:52, Dumitru Ceara wrote:
On 4/1/25 7:30 PM, Ilia Baikov wrote:
Hi Dumitru,
Hi Ilia,

Sure, let's give it a try. Is it a good idea to apply this patch on top of
the patch you previously sent?
Yes, you're right, we should apply it on top of that.  I pushed both
patches here:

https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/

Please ignore the CI failures; as mentioned yesterday, the last commit
I did is known to break things. It's just to confirm that our problem in
this case is due to the MC_FLOOD_L2 output we do for ARP requests
targeting OVN router IPs.
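
If it helps to gauge how far such an ARP request gets flooded, the number of ports in the switch's L2 flood group can be read from the Southbound DB; a sketch, assuming the conventional _MC_flood_l2 group name:

# List the L2 flood multicast groups; every port in the group is one more
# output (and resubmit chain) when an ARP request is flooded on that switch.
ovn-sbctl --columns=datapath,ports find Multicast_Group name=_MC_flood_l2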

Which topology is preferable for testing this: L2 or L3 (BGP-based
networking)? In the L3 case there is actually a small amount of ARP pps
compared to L2, where the uplink interface (like eno1) is a member of
the br-ex bridge.

Would it be possible to try both?

Thank you for your involvement, really appreciate it!

No problem, thanks for taking time to help us try things out!

Regards,
Dumitru

01.04.2025 15:06, Dumitru Ceara wrote:
On 4/1/25 3:15 AM, Ilia Baikov wrote:
Hello,
Hi Ilia,

So things go way deeper and turn out stranger than I initially thought.
I've migrated to L3 networking using ovn-bgp-agent in order to reduce
the ARP packets flooded over all ports attached to br-int. However, this
didn't help at all and some VMs lose external connectivity (even though
the VM ports no longer get flooded with hundreds of pps of ARP).

2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN|
Dropped 131 log messages in last 55 seconds (most recently, 10 seconds
ago) due to excessive rate
2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN|
Dropped 133 log messages in last 57 seconds (most recently, 13 seconds
ago) due to excessive rate
2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN|
Flow:
udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137

bridge("br-int")
----------------
    0. in_port=277, priority 100, cookie 0xa70f8aad
       set_field:0x5d/0xffff->reg13
       set_field:0x2->reg11
       set_field:0x3->reg12
       set_field:0x2->metadata
       set_field:0x2ac->reg14
       set_field:0/0xffff0000->reg13
       resubmit(,8)
    8. metadata=0x2, priority 50, cookie 0xecab0c71
       set_field:0/0x1000->reg10
       resubmit(,73)
       73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad
               set_field:0x1000/0x1000->reg10
       move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
        -> NXM_NX_XXREG0[111] is now 0x1
       resubmit(,9)
    9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
       drop

Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
Megaflow:
recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no
Datapath actions: drop
2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN|
Dropped 7 log messages in last 55 seconds (most recently, 13 seconds
ago) due to excessive rate
2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00
2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN|
Dropped 7 log messages in last 61 seconds (most recently, 29 seconds
ago) due to excessive rate
2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN|
Dropped 8 log messages in last 63 seconds (most recently, 29 seconds
ago) due to excessive rate
2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN|
Flow:
udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678

bridge("br-int")
----------------
    0. in_port=278, priority 100, cookie 0xd191f4de
       set_field:0x5e/0xffff->reg13
       set_field:0x2->reg11
       set_field:0x3->reg12
       set_field:0x2->metadata
       set_field:0x33d->reg14
       set_field:0/0xffff0000->reg13
       resubmit(,8)
    8. metadata=0x2, priority 50, cookie 0xecab0c71
       set_field:0/0x1000->reg10
       resubmit(,73)
       73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de
               set_field:0x1000/0x1000->reg10
       move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
        -> NXM_NX_XXREG0[111] is now 0x1
       resubmit(,9)
    9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
       drop

Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
Megaflow:
recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no
Datapath actions: drop

When I see this issue on the mentioned IPs, the VM tries to
resolve the default gateway MAC address but without success, since the ARP is
dropped somewhere because of the resubmits and the drop action on the
datapath. I guess this is not related to ovn-controller now. Meanwhile DHCP
works pretty well :)
Glad to hear about DHCP.

The broadcast-arps-to-all-routers option is disabled on both provider
network Logical Switches.
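For completeness, one way to double-check that setting, assuming it is stored in the Logical_Switch other_config column as in recent OVN releases:

# Show which logical switches have the knob set and to what value.
ovn-nbctl --columns=name,other_config list Logical_Switch | grep -B1 broadcast-arps-to-all-routers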

Could there be any solution for this?
It's not a solution, more of a change to confirm whether this is
causing
issues, but could you please try with this commit?

https://github.com/dceara/ovn/commit/fac23e2f6ef6effe3f9a2e0310e78d085750488b

The commit disables flooding of ARP requests that target OVN-owned
router IPs to non-router ports.  It's not something that can be accepted
as is because it breaks some other things, e.g., OVN-generated GARP
requests will not be forwarded properly.

The patch applies on top of the ovn main branch.  If you want to try it
out on older branches the easiest way is to just ignore the test
changes.
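
A minimal sketch of one way to do that (cherry-pick the commit onto the older branch and keep the branch's own tests/ if they conflict; commit hash taken from the link above):

# Add the fork as a remote and cherry-pick the commit.
git remote add dceara https://github.com/dceara/ovn.git
git fetch dceara
git cherry-pick -x fac23e2f6ef6effe3f9a2e0310e78d085750488b

# If the conflicts are limited to tests/, keep the existing test files
# and continue, effectively ignoring the test changes.
git checkout --ours -- tests/ && git add tests/
git cherry-pick --continue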

Looking forward to hearing how it went.

Thanks,
Dumitru

Regards,
Ilia Baikov

26.03.2025 12:34, Ilia Baikov wrote:
Got my hands on this, back to debugging. Seems like the kernel runs
stable:
# uname -r
6.14.0-061400-generic
Meanwhile there are no unrecognized(27) related logs.
tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i
"dhcp"
2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:50:22:c4 185.255.178.170
2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4, dst-
mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1
2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:18:ac:4a 89.169.15.224
2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a, dst-
mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1

And yes, there are logs about resubmit actions, which are expected as
you said.
->reg15
    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8
priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0
 cookie:0xd1588295 
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8
priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4
 cookie:0xd1588295 
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80
priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43
priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x1->reg15,resubmit(,43

OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503->reg14,resubmit(CONTROLLER,8) data_len=42
OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503->reg14,resubmit(CONTROLLER,8) data_len=42
    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)

top - 09:29:33 up 16:58,  3 users,  load average: 37.70, 37.39, 36.95
Threads: 114 total,  13 running, 101 sleeping,   0 stopped,   0
zombie
%Cpu(s): 27.5 us, 11.2 sy,  0.0 ni, 60.3 id,  0.5 wa,  0.0 hi, 0.6
si,  0.0 st
MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used,  26351.4
buff/
cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345358.7
avail
Mem

#top -H -p $(pidof ovs-vswitchd)
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+
COMMAND
      6212 root      20   0 8388104 662016   7940 R  22.3   0.1
32:13.87
revalidator102
      6200 root      20   0 8388104 662016   7940 R  21.6   0.1
73:01.76
revalidator90
      6202 root      20   0 8388104 662016   7940 R  20.6   0.1
32:13.77
revalidator92
      6217 root      20   0 8388104 662016   7940 R  20.3   0.1
26:07.50
revalidator107
      6214 root      20   0 8388104 662016   7940 R  18.9   0.1
39:12.60
revalidator104
      6219 root      20   0 8388104 662016   7940 R  18.6   0.1
25:57.96
revalidator109
      6211 root      20   0 8388104 662016   7940 R  17.6   0.1
39:20.20
revalidator101
      6207 root      20   0 8388104 662016   7940 R  13.6   0.1
17:53.31
revalidator97
      6204 root      20   0 8388104 662016   7940 R  13.0   0.1
32:34.67
revalidator94
      6209 root      20   0 8388104 662016   7940 R  12.6   0.1
35:55.88
revalidator100
      6213 root      20   0 8388104 662016   7940 R  12.6   0.1
18:04.43
revalidator103
      6220 root      20   0 8388104 662016   7940 R  12.3   0.1
5:52.71
revalidator110
      6218 root      20   0 8388104 662016   7940 R  12.0   0.1
9:09.66
revalidator108
      6215 root      20   0 8388104 662016   7940 S   8.6   0.1
22:22.27
revalidator105
      6221 root      20   0 8388104 662016   7940 S   8.6   0.1
4:44.93
revalidator111
      6208 root      20   0 8388104 662016   7940 S   8.0   0.1
22:55.17
revalidator98
      6203 root      20   0 8388104 662016   7940 S   6.0   0.1
38:52.72
revalidator93
      6206 root      20   0 8388104 662016   7940 S   6.0   0.1
22:11.55
revalidator96
      6216 root      20   0 8388104 662016   7940 S   4.3   0.1
33:14.52
revalidator106

Sometimes the revalidator processes drop to 5-10%, sometimes to 30%. I
guess this behaviour is because of the resubmit actions? So far so good,
ovn-controller is feeling fine, but there could be some sort of freezes on
sending DHCPREPLY.
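
For reference, the revalidator/upcall load and the installed datapath flows can be inspected directly; a sketch using standard ovs-appctl commands:

# Thread and flow summary: datapath flow count, dump duration,
# handler/revalidator counts.
ovs-appctl upcall/show

# The currently installed datapath flows; the broadcast/multicast megaflows
# and their drop actions show up here.
ovs-appctl dpctl/dump-flows | head -n 20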

top -H -p $(pidof ovn-controller)
top - 09:30:55 up 16:59,  3 users,  load average: 35.73, 36.70, 36.73
Threads:   5 total,   0 running,   5 sleeping,   0 stopped,   0
zombie
%Cpu(s): 27.7 us, 10.6 sy,  0.0 ni, 60.8 id,  0.3 wa,  0.0 hi, 0.6
si,  0.0 st
MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used,  26357.8
buff/
cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345438.0
avail
Mem

       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+
COMMAND
      5045 root      20   0  396500  83836   4432 S   0.7   0.0
11:46.59
ovn-controller
      5412 root      20   0  396500  83836   4432 S   0.0   0.0
0:06.23
ovn_pinctrl0
      5413 root      20   0  396500  83836   4432 S   0.0   0.0
0:00.00
urcu1
      5414 root      20   0  396500  83836   4432 S   0.0   0.0
0:00.20
ovn_statctrl2
      6103 root      20   0  396500  83836   4432 S   0.0   0.0
0:04.10
stopwatch3


Trying to reproduce this on a real-world environment. There are 300
instances running with about 300 Mbps of network traffic in total.
Are there more logs or debug info I can provide?

--
Regards,

Ilia Baikov
ilia.baikov@ib.systems
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
