Hi Dumitru,
Out of these logs, the UDP ones are for IP multicast/broadcast traffic, those will still get flooded and potentially hit the resubmit limit - none of the changes we tried tackle that. I also don't think that should cause much issues (aside from the vswitchd logs) though.The only other packet I see logged there is: 2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processingarp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:002025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62) This is a GARP (originated by ovn-controller) and in theory should not be flooded with the change fromhttps://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/.Just to double check, have you also upgraded ovn-northd to the patched version? That's required because the changes in the last commit on my branch are for ovn-northd.
Yes, i've uploaded compiled ovn-controller as well as ovn-northd using your branch to containers and then restarted both containers. I will get my hands on this today again. Should I clear ovs database after updating binaries in containers before restarting them for recomputing? While i seeing resubmit messages and datapath drop actions, ovn-controller replies with DHCPOFFER/DHCPACK.
Could logs like follow mean something important in this case? Final flow:udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
Megaflow: recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no Datapath actions: drop On 08.04.2025 13:24, Dumitru Ceara wrote:
On 4/6/25 8:37 PM, Ilia Baikov wrote:Hello,Hi Ilia,I've compiled OVN using provided branch with both patches included and migrated some VMs to host with L3 networking to see what going to happen. Resubmit logs are now back. Interesting thing that there was about 270 VMs before so logs appears back on about 300 instances (as well as ports). There was no resubmit logs from 1 of April. So count of ports related to this case. 2025-04-06T18:09:36.797Z|00128|ofproto_dpif_xlate(handler20)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900 2025-04-06T18:09:46.800Z|00123|ofproto_dpif_xlate(handler56)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900 2025-04-06T18:09:46.808Z|00139|ofproto_dpif_xlate(handler64)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=55936,tp_dst=1900 2025-04-06T18:09:51.225Z|00020|ofproto_dpif_xlate(handler49)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:09:51.229Z|00021|ofproto_dpif_xlate(handler49)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00 2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-06T18:33:54.406Z|00102|ofproto_dpif_xlate(handler26)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-06T18:34:00.917Z|00088|ofproto_dpif_xlate(handler48)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:34:00.918Z|00089|ofproto_dpif_upcall(handler48)|WARN| Dropped 2 log messages in last 37 seconds (most recently, 7 seconds ago) due to excessive rate 2025-04-06T18:34:00.918Z|00090|ofproto_dpif_upcall(handler48)|WARN|Flow: udp,in_port=46,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:34:00.920Z|00091|ofproto_dpif_xlate(handler48)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 2025-04-06T18:34:24.407Z|00127|ofproto_dpif_xlate(handler7)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-06T18:35:11.952Z|00252|ofproto_dpif_xlate(handler38)|WARN| Dropped 1 log messages in last 18 seconds (most recently, 18 seconds ago) due to excessive rate 2025-04-06T18:35:11.952Z|00253|ofproto_dpif_xlate(handler38)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900 2025-04-06T18:35:11.952Z|00254|ofproto_dpif_upcall(handler38)|WARN| Dropped 3 log messages in last 71 seconds (most recently, 18 seconds ago) due to excessive rate 2025-04-06T18:35:11.953Z|00255|ofproto_dpif_upcall(handler38)|WARN|Flow: udp,in_port=346,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900Out of these logs, the UDP ones are for IP multicast/broadcast traffic, those will still get flooded and potentially hit the resubmit limit - none of the changes we tried tackle that. I also don't think that should cause much issues (aside from the vswitchd logs) though. The only other packet I see logged there is: 2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00 2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62) This is a GARP (originated by ovn-controller) and in theory should not be flooded with the change from https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/. Just to double check, have you also upgraded ovn-northd to the patched version? That's required because the changes in the last commit on my branch are for ovn-northd. Thanks, Dumitru# ovn-controller --version (not using ovn-controller 24.09.3 Open vSwitch Library 3.4.2 OpenFlow versions 0x6:0x6 SB DB Schema 20.37.0 # ovn-northd --version ovn-northd 24.09.3 Open vSwitch Library 3.4.2 On 02.04.2025 10:52, Dumitru Ceara wrote:On 4/1/25 7:30 PM, Ilia Baikov wrote:Hi Dumitru,Hi Ilia,Sure, let's give it a try. Is it good idea to apply this patch on top of the patch you previously sent to try?Yes, you're right, we should apply it on top of that. I pushed both patches here: https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09- revert-mg-split/ Please ignore the CI failures, like mentioned yesterday the last commit I did is known to break stuff, it's just to confirm that our problem in this case is due to the MC_FLOOD_L2 output we do for ARP requests targetting OVN router IPs.What topology for testing is more preferrable to try it out? L2 or L3 (bgp-based networking)? In case of L3 there is actually small amount of ARP pps compared to L2 where uplink interface (like eno1) is member of br-ex bridge.Would it be possible to try both?Thank you for your involvement, really appreciate it!No problem, thanks for taking time to help us try things out! Regards, Dumitru01.04.2025 15:06, Dumitru Ceara wrote:On 4/1/25 3:15 AM, Ilia Baikov wrote:Hello,Hi Ilia,So the things go way deeper and it becomes way strange as i initially thought. I've migrated to L3 networking using ovn-bgp-agent, in order to reduce ARP packets flooded over all ports attached to br-int. However this didn't help at all and some VMs loses external connectivity (but VMs ports didn't get flooded by hundreds pps of ARP). 2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN| Dropped 131 log messages in last 55 seconds (most recently, 10 seconds ago) due to excessive rate 2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN| Dropped 133 log messages in last 57 seconds (most recently, 13 seconds ago) due to excessive rate 2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN| Flow: udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137 bridge("br-int") ---------------- 0. in_port=277, priority 100, cookie 0xa70f8aad set_field:0x5d/0xffff->reg13 set_field:0x2->reg11 set_field:0x3->reg12 set_field:0x2->metadata set_field:0x2ac->reg14 set_field:0/0xffff0000->reg13 resubmit(,8) 8. metadata=0x2, priority 50, cookie 0xecab0c71 set_field:0/0x1000->reg10 resubmit(,73) 73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad set_field:0x1000/0x1000->reg10 move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] -> NXM_NX_XXREG0[111] is now 0x1 resubmit(,9) 9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65 drop Final flow: udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137 Megaflow: recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no Datapath actions: drop 2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN| Dropped 7 log messages in last 55 seconds (most recently, 13 seconds ago) due to excessive rate 2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00 2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN| Dropped 7 log messages in last 61 seconds (most recently, 29 seconds ago) due to excessive rate 2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over 4096 resubmit actions on bridge br-int while processing udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN| Dropped 8 log messages in last 63 seconds (most recently, 29 seconds ago) due to excessive rate 2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN| Flow: udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 bridge("br-int") ---------------- 0. in_port=278, priority 100, cookie 0xd191f4de set_field:0x5e/0xffff->reg13 set_field:0x2->reg11 set_field:0x3->reg12 set_field:0x2->metadata set_field:0x33d->reg14 set_field:0/0xffff0000->reg13 resubmit(,8) 8. metadata=0x2, priority 50, cookie 0xecab0c71 set_field:0/0x1000->reg10 resubmit(,73) 73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de set_field:0x1000/0x1000->reg10 move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] -> NXM_NX_XXREG0[111] is now 0x1 resubmit(,9) 9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65 drop Final flow: udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 Megaflow: recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no Datapath actions: drop When i'm seeing this issue on mentioned IPs i can see that VM tries to resolve default gateway mac address but with no success since ARP is dropped somewhere because of resubmits and drop action on datapath. I guess now this is not related to OVN controller. Meanwhile DHCP works pretty good :)Glad to hear about DHCP.Option broadcast-arps-to-all-routers is disabled at both of provider network Logical Switches. Could there be any solution for this?It's not a solution, more of a change to confirm whether this is causing issues, but could you please try with this commit? https://github.com/dceara/ovn/commit/ fac23e2f6ef6effe3f9a2e0310e78d085750488b The commit disables flooding of ARP requests that target OVN owned router IPs to non-router ports. It's not something that can be accepted as is because it breaks some other things, e.g., OVN generated GARP requests will not be forwarded properly. The patch applies on top of the ovn main branch. If you want to try it out on older branches the easiest way is to just ignore the test changes. Looking forward to hear how it went. Thanks, DumitruRegards, Ilia Baikov 26.03.2025 12:34, Ilia Baikov пишет:Got my hands on this, back to debugging. Seems like kernel runs stable # uname -r 6.14.0-061400-generic Meanwhile there is no unrecognized(27) related logs. tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i "dhcp" 2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:9c:f4:45 185.255.178.131 2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:9c:f4:45 185.255.178.131 2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:9c:f4:45 185.255.178.131 2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:9c:f4:45 185.255.178.131 2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:9c:f4:45 185.255.178.131 2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:50:22:c4 185.255.178.170 2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4, dst- mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1 2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:18:ac:4a 89.169.15.224 2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a, dst- mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1 And yes, there is logs about resubmit actions which are expected as you said. ->reg15 continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8 priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0 cookie:0xd1588295 actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9) OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8 priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4 cookie:0xd1588295 actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9) OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80 priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7 actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8) OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43 priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7 actions=set_field:0x1->reg15,resubmit(,43 OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER actions=set_field:0x2->metadata,set_field:0x503-reg14,resubmit(CONTROLLER,8) data_len=42OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER actions=set_field:0x2->metadata,set_field:0x503-reg14,resubmit(CONTROLLER,8) data_len=42continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) top - 09:29:33 up 16:58, 3 users, load average: 37.70, 37.39, 36.95 Threads: 114 total, 13 running, 101 sleeping, 0 stopped, 0 zombie %Cpu(s): 27.5 us, 11.2 sy, 0.0 ni, 60.3 id, 0.5 wa, 0.0 hi, 0.6 si, 0.0 st MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used, 26351.4 buff/ cache MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 345358.7 avail Mem #top -H -p $(pidof ovs-vswitchd) PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 6212 root 20 0 8388104 662016 7940 R 22.3 0.1 32:13.87 revalidator102 6200 root 20 0 8388104 662016 7940 R 21.6 0.1 73:01.76 revalidator90 6202 root 20 0 8388104 662016 7940 R 20.6 0.1 32:13.77 revalidator92 6217 root 20 0 8388104 662016 7940 R 20.3 0.1 26:07.50 revalidator107 6214 root 20 0 8388104 662016 7940 R 18.9 0.1 39:12.60 revalidator104 6219 root 20 0 8388104 662016 7940 R 18.6 0.1 25:57.96 revalidator109 6211 root 20 0 8388104 662016 7940 R 17.6 0.1 39:20.20 revalidator101 6207 root 20 0 8388104 662016 7940 R 13.6 0.1 17:53.31 revalidator97 6204 root 20 0 8388104 662016 7940 R 13.0 0.1 32:34.67 revalidator94 6209 root 20 0 8388104 662016 7940 R 12.6 0.1 35:55.88 revalidator100 6213 root 20 0 8388104 662016 7940 R 12.6 0.1 18:04.43 revalidator103 6220 root 20 0 8388104 662016 7940 R 12.3 0.1 5:52.71 revalidator110 6218 root 20 0 8388104 662016 7940 R 12.0 0.1 9:09.66 revalidator108 6215 root 20 0 8388104 662016 7940 S 8.6 0.1 22:22.27 revalidator105 6221 root 20 0 8388104 662016 7940 S 8.6 0.1 4:44.93 revalidator111 6208 root 20 0 8388104 662016 7940 S 8.0 0.1 22:55.17 revalidator98 6203 root 20 0 8388104 662016 7940 S 6.0 0.1 38:52.72 revalidator93 6206 root 20 0 8388104 662016 7940 S 6.0 0.1 22:11.55 revalidator96 6216 root 20 0 8388104 662016 7940 S 4.3 0.1 33:14.52 revalidator106 Sometimes revalidator processes drops to 5-10%, sometimes to 30%. I guess this behaviour is because of resubmit actions?. So far so good controller feeling fine, but there could be some sort of freezes on sending DHCPREPLY. top -H -p $(pidof ovn-controller) top - 09:30:55 up 16:59, 3 users, load average: 35.73, 36.70, 36.73 Threads: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie %Cpu(s): 27.7 us, 10.6 sy, 0.0 ni, 60.8 id, 0.3 wa, 0.0 hi, 0.6 si, 0.0 st MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used, 26357.8 buff/ cache MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 345438.0 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5045 root 20 0 396500 83836 4432 S 0.7 0.0 11:46.59 ovn-controller 5412 root 20 0 396500 83836 4432 S 0.0 0.0 0:06.23 ovn_pinctrl0 5413 root 20 0 396500 83836 4432 S 0.0 0.0 0:00.00 urcu1 5414 root 20 0 396500 83836 4432 S 0.0 0.0 0:00.20 ovn_statctrl2 6103 root 20 0 396500 83836 4432 S 0.0 0.0 0:04.10 stopwatch3 Trying to reproduce on a real-world environment. There is 300 instances running with about 300Mbps network traffic in total. Is there more logs or debug i can provide?
-- Regards, Ilia Baikov ilia.baikov@ib.systems
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss