On 4/6/25 8:37 PM, Ilia Baikov wrote:
> Hello,

Hi Ilia,

> I've compiled OVN using provided branch with both patches included and > migrated some VMs to host with L3 networking to see what going to happen. > > Resubmit logs are now back. Interesting thing that there was about 270 > VMs before so logs appears back on about 300 instances (as well as > ports). There was no resubmit logs from 1 of April. So count of ports > related to this case. > > 2025-04-06T18:09:36.797Z|00128|ofproto_dpif_xlate(handler20)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900 > 2025-04-06T18:09:46.800Z|00123|ofproto_dpif_xlate(handler56)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900 > 2025-04-06T18:09:46.808Z|00139|ofproto_dpif_xlate(handler64)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=55936,tp_dst=1900 > 2025-04-06T18:09:51.225Z|00020|ofproto_dpif_xlate(handler49)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 > 2025-04-06T18:09:51.229Z|00021|ofproto_dpif_xlate(handler49)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 > 
2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096 > resubmit actions on bridge br-int while processing > arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00 > 2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 > 2025-04-06T18:33:54.406Z|00102|ofproto_dpif_xlate(handler26)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 > 2025-04-06T18:34:00.917Z|00088|ofproto_dpif_xlate(handler48)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 > 2025-04-06T18:34:00.918Z|00089|ofproto_dpif_upcall(handler48)|WARN| > Dropped 2 log messages in last 37 seconds (most recently, 7 seconds ago) > due to excessive rate > 2025-04-06T18:34:00.918Z|00090|ofproto_dpif_upcall(handler48)|WARN|Flow: > udp,in_port=46,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138 > 2025-04-06T18:34:00.920Z|00091|ofproto_dpif_xlate(handler48)|WARN|over > 4096 resubmit actions on bridge br-int while processing > 
udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
> 2025-04-06T18:34:24.407Z|00127|ofproto_dpif_xlate(handler7)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
> 2025-04-06T18:35:11.952Z|00252|ofproto_dpif_xlate(handler38)|WARN|
> Dropped 1 log messages in last 18 seconds (most recently, 18 seconds
> ago) due to excessive rate
> 2025-04-06T18:35:11.952Z|00253|ofproto_dpif_xlate(handler38)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900
> 2025-04-06T18:35:11.952Z|00254|ofproto_dpif_upcall(handler38)|WARN|
> Dropped 3 log messages in last 71 seconds (most recently, 18 seconds
> ago) due to excessive rate
> 2025-04-06T18:35:11.953Z|00255|ofproto_dpif_upcall(handler38)|WARN|Flow:
> udp,in_port=346,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900
>

Out of these logs, the UDP ones are for IP multicast/broadcast traffic.
Those will still get flooded and can potentially hit the resubmit limit;
none of the changes we tried tackle that. I also don't think that should
cause many issues (aside from the vswitchd logs), though.

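To make that split between multicast/broadcast and other traffic easier to see on a larger log, here is a rough Python sketch that counts the "over 4096 resubmit actions" warnings by protocol and by Ethernet destination class. It is only an illustration: the regular expression is inferred from the samples quoted in this thread, it assumes each warning sits on a single line (as in the vswitchd log file itself, not as wrapped by the mail client), and the function names are made up for this example.

```python
import re
from collections import Counter

# Pattern inferred from the warnings quoted in this thread (an assumption,
# not an official description of the ovs-vswitchd log format).
RESUBMIT_RE = re.compile(
    r"over (\d+) resubmit actions on bridge (\S+) while processing (\S+)"
)


def parse_flow(flow):
    """Split 'udp,in_port=683,dl_dst=...' into (protocol, {field: value})."""
    proto, _, rest = flow.partition(",")
    fields = dict(item.split("=", 1) for item in rest.split(",") if "=" in item)
    return proto, fields


def classify_dst(mac):
    """Classify an Ethernet destination as broadcast, multicast or unicast."""
    if not mac:
        return "unknown"
    if mac.lower() == "ff:ff:ff:ff:ff:ff":
        return "broadcast"
    # The least significant bit of the first octet is the group bit.
    if int(mac.split(":")[0], 16) & 1:
        return "multicast"
    return "unicast"


def summarize(lines):
    """Count resubmit-limit warnings per (protocol, destination class)."""
    counts = Counter()
    for line in lines:
        match = RESUBMIT_RE.search(line)
        if not match:
            continue
        proto, fields = parse_flow(match.group(3))
        counts[(proto, classify_dst(fields.get("dl_dst", "")))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pipe the vswitchd log into the script on stdin.
    for (proto, kind), n in sorted(summarize(sys.stdin).items()):
        print(f"{proto:6} {kind:10} {n}")
```

Any unicast entries, or ARP entries like the one discussed next, would be the ones worth a closer look.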
The only other packet I see logged there is:

2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096 resubmit actions on bridge br-int while processing arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00
2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)

This is a GARP (originated by ovn-controller) and in theory should not be
flooded with the change from
https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/.

Just to double check, have you also upgraded ovn-northd to the patched
version? That's required because the changes in the last commit on my
branch are for ovn-northd.

Thanks,
Dumitru

> # ovn-controller --version (not using
> ovn-controller 24.09.3
> Open vSwitch Library 3.4.2
> OpenFlow versions 0x6:0x6
> SB DB Schema 20.37.0
>
> # ovn-northd --version
> ovn-northd 24.09.3
> Open vSwitch Library 3.4.2
>
> On 02.04.2025 10:52, Dumitru Ceara wrote:
>> On 4/1/25 7:30 PM, Ilia Baikov wrote:
>>> Hi Dumitru,
>> Hi Ilia,
>>
>>> Sure, let's give it a try. Is it a good idea to apply this patch on top of
>>> the patch you previously sent to try?
>> Yes, you're right, we should apply it on top of that. I pushed both
>> patches here:
>>
>> https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/
>>
>> Please ignore the CI failures. As mentioned yesterday, the last commit
>> I did is known to break stuff; it's just to confirm that our problem in
>> this case is due to the MC_FLOOD_L2 output we do for ARP requests
>> targeting OVN router IPs.
>>
>>> Which topology is preferable for testing this? L2 or L3
>>> (bgp-based networking)? In case of L3 there is actually a small amount of
>>> ARP pps compared to L2, where the uplink interface (like eno1) is a member of
>>> the br-ex bridge.
>>>
>> Would it be possible to try both?
>> >>> Thank you for your involvement, really appreciate it! >>> >> No problem, thanks for taking time to help us try things out! >> >> Regards, >> Dumitru >> >>> 01.04.2025 15:06, Dumitru Ceara wrote: >>>> On 4/1/25 3:15 AM, Ilia Baikov wrote: >>>>> Hello, >>>> Hi Ilia, >>>> >>>>> So the things go way deeper and it becomes way strange as i initially >>>>> thought. >>>>> I've migrated to L3 networking using ovn-bgp-agent, in order to reduce >>>>> ARP packets flooded over all ports attached to br-int. However this >>>>> didn't help at all and some VMs loses external connectivity (but VMs >>>>> ports didn't get flooded by hundreds pps of ARP). >>>>> >>>>> 2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN| >>>>> Dropped 131 log messages in last 55 seconds (most recently, 10 seconds >>>>> ago) due to excessive rate >>>>> 2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over >>>>> 4096 resubmit actions on bridge br-int while processing >>>>> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 >>>>> 2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN| >>>>> Dropped 133 log messages in last 57 seconds (most recently, 13 seconds >>>>> ago) due to excessive rate >>>>> 2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN| >>>>> Flow: >>>>> udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137 >>>>> >>>>> bridge("br-int") >>>>> ---------------- >>>>> 0. in_port=277, priority 100, cookie 0xa70f8aad >>>>> set_field:0x5d/0xffff->reg13 >>>>> set_field:0x2->reg11 >>>>> set_field:0x3->reg12 >>>>> set_field:0x2->metadata >>>>> set_field:0x2ac->reg14 >>>>> set_field:0/0xffff0000->reg13 >>>>> resubmit(,8) >>>>> 8. 
metadata=0x2, priority 50, cookie 0xecab0c71 >>>>> set_field:0/0x1000->reg10 >>>>> resubmit(,73) >>>>> 73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad >>>>> set_field:0x1000/0x1000->reg10 >>>>> move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] >>>>> -> NXM_NX_XXREG0[111] is now 0x1 >>>>> resubmit(,9) >>>>> 9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65 >>>>> drop >>>>> >>>>> Final flow: >>>>> udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137 >>>>> Megaflow: >>>>> recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no >>>>> Datapath actions: drop >>>>> 2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN| >>>>> Dropped 7 log messages in last 55 seconds (most recently, 13 seconds >>>>> ago) due to excessive rate >>>>> 2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over >>>>> 4096 resubmit actions on bridge br-int while processing >>>>> arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00 >>>>> 2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN| >>>>> Dropped 7 log messages in last 61 seconds (most recently, 29 seconds >>>>> ago) due to excessive rate >>>>> 2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over >>>>> 4096 resubmit actions on bridge br-int while processing >>>>> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 >>>>> 2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN| >>>>> Dropped 8 log messages in last 63 seconds (most recently, 29 seconds >>>>> ago) 
due to excessive rate >>>>> 2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN| >>>>> Flow: >>>>> udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 >>>>> >>>>> bridge("br-int") >>>>> ---------------- >>>>> 0. in_port=278, priority 100, cookie 0xd191f4de >>>>> set_field:0x5e/0xffff->reg13 >>>>> set_field:0x2->reg11 >>>>> set_field:0x3->reg12 >>>>> set_field:0x2->metadata >>>>> set_field:0x33d->reg14 >>>>> set_field:0/0xffff0000->reg13 >>>>> resubmit(,8) >>>>> 8. metadata=0x2, priority 50, cookie 0xecab0c71 >>>>> set_field:0/0x1000->reg10 >>>>> resubmit(,73) >>>>> 73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de >>>>> set_field:0x1000/0x1000->reg10 >>>>> move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] >>>>> -> NXM_NX_XXREG0[111] is now 0x1 >>>>> resubmit(,9) >>>>> 9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65 >>>>> drop >>>>> >>>>> Final flow: >>>>> udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 >>>>> Megaflow: >>>>> recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no >>>>> Datapath actions: drop >>>>> >>>>> When i'm seeing this issue on mentioned IPs i can see that VM tries to >>>>> resolve default gateway mac address but with no success since ARP is >>>>> dropped somewhere because of resubmits and drop action on datapath. I >>>>> guess now this is not related to OVN controller. Meanwhile DHCP works >>>>> pretty good :) >>>> Glad to hear about DHCP. >>>> >>>>> Option broadcast-arps-to-all-routers is disabled at both of provider >>>>> network Logical Switches. >>>>> >>>>> Could there be any solution for this? 
>>>> It's not a solution, more of a change to confirm whether this is
>>>> causing issues, but could you please try with this commit?
>>>>
>>>> https://github.com/dceara/ovn/commit/fac23e2f6ef6effe3f9a2e0310e78d085750488b
>>>>
>>>> The commit disables flooding of ARP requests that target OVN owned
>>>> router IPs to non-router ports. It's not something that can be
>>>> accepted as is because it breaks some other things, e.g., OVN generated
>>>> GARP requests will not be forwarded properly.
>>>>
>>>> The patch applies on top of the ovn main branch. If you want to try it
>>>> out on older branches the easiest way is to just ignore the test
>>>> changes.
>>>>
>>>> Looking forward to hearing how it went.
>>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>> Regards,
>>>>> Ilia Baikov
>>>>>
>>>>> 26.03.2025 12:34, Ilia Baikov wrote:
>>>>>> Got my hands on this, back to debugging. Seems like the kernel runs
>>>>>> stable
>>>>>> # uname -r
>>>>>> 6.14.0-061400-generic
>>>>>> Meanwhile there are no unrecognized(27) related logs.
>>>>>> tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i >>>>>> "dhcp" >>>>>> 2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>>>> fa:16:3e:9c:f4:45 185.255.178.131 >>>>>> 2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl >>>>>> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >>>>>> 2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>>>> fa:16:3e:9c:f4:45 185.255.178.131 >>>>>> 2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl >>>>>> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >>>>>> 2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>>>> fa:16:3e:9c:f4:45 185.255.178.131 >>>>>> 2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl >>>>>> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >>>>>> 2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>>>> fa:16:3e:9c:f4:45 185.255.178.131 >>>>>> 2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl >>>>>> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >>>>>> 2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>>>> fa:16:3e:9c:f4:45 185.255.178.131 >>>>>> 2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl >>>>>> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >>>>>> 
mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >>>>>> 2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>>>> fa:16:3e:50:22:c4 185.255.178.170 >>>>>> 2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl >>>>>> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >>>>>> OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4, dst- >>>>>> mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1 >>>>>> 2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>>>> fa:16:3e:18:ac:4a 89.169.15.224 >>>>>> 2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl >>>>>> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >>>>>> OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a, dst- >>>>>> mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1 >>>>>> >>>>>> And yes, there is logs about resubmit actions which are expected as >>>>>> you said. >>>>>> ->reg15 >>>>>> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >>>>>> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8 >>>>>> priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0 >>>>>> cookie:0xd1588295 >>>>>> actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9) >>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8 >>>>>> priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4 >>>>>> cookie:0xd1588295 >>>>>> actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9) >>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80 >>>>>> priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7 >>>>>> actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8) >>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43 >>>>>> 
priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7 >>>>>> actions=set_field:0x1->reg15,resubmit(,43 >>>>>> >>>>>> OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER >>>>>> actions=set_field:0x2->metadata,set_field:0x503- >>>>>>> reg14,resubmit(CONTROLLER,8) data_len=42 >>>>>> OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER >>>>>> actions=set_field:0x2->metadata,set_field:0x503- >>>>>>> reg14,resubmit(CONTROLLER,8) data_len=42 >>>>>> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >>>>>> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >>>>>> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >>>>>> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >>>>>> >>>>>> top - 09:29:33 up 16:58, 3 users, load average: 37.70, 37.39, 36.95 >>>>>> Threads: 114 total, 13 running, 101 sleeping, 0 stopped, 0 >>>>>> zombie >>>>>> %Cpu(s): 27.5 us, 11.2 sy, 0.0 ni, 60.3 id, 0.5 wa, 0.0 hi, 0.6 >>>>>> si, 0.0 st >>>>>> MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used, 26351.4 >>>>>> buff/ >>>>>> cache >>>>>> MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 
345358.7 >>>>>> avail >>>>>> Mem >>>>>> >>>>>> #top -H -p $(pidof ovs-vswitchd) >>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >>>>>> COMMAND >>>>>> 6212 root 20 0 8388104 662016 7940 R 22.3 0.1 >>>>>> 32:13.87 >>>>>> revalidator102 >>>>>> 6200 root 20 0 8388104 662016 7940 R 21.6 0.1 >>>>>> 73:01.76 >>>>>> revalidator90 >>>>>> 6202 root 20 0 8388104 662016 7940 R 20.6 0.1 >>>>>> 32:13.77 >>>>>> revalidator92 >>>>>> 6217 root 20 0 8388104 662016 7940 R 20.3 0.1 >>>>>> 26:07.50 >>>>>> revalidator107 >>>>>> 6214 root 20 0 8388104 662016 7940 R 18.9 0.1 >>>>>> 39:12.60 >>>>>> revalidator104 >>>>>> 6219 root 20 0 8388104 662016 7940 R 18.6 0.1 >>>>>> 25:57.96 >>>>>> revalidator109 >>>>>> 6211 root 20 0 8388104 662016 7940 R 17.6 0.1 >>>>>> 39:20.20 >>>>>> revalidator101 >>>>>> 6207 root 20 0 8388104 662016 7940 R 13.6 0.1 >>>>>> 17:53.31 >>>>>> revalidator97 >>>>>> 6204 root 20 0 8388104 662016 7940 R 13.0 0.1 >>>>>> 32:34.67 >>>>>> revalidator94 >>>>>> 6209 root 20 0 8388104 662016 7940 R 12.6 0.1 >>>>>> 35:55.88 >>>>>> revalidator100 >>>>>> 6213 root 20 0 8388104 662016 7940 R 12.6 0.1 >>>>>> 18:04.43 >>>>>> revalidator103 >>>>>> 6220 root 20 0 8388104 662016 7940 R 12.3 0.1 >>>>>> 5:52.71 >>>>>> revalidator110 >>>>>> 6218 root 20 0 8388104 662016 7940 R 12.0 0.1 >>>>>> 9:09.66 >>>>>> revalidator108 >>>>>> 6215 root 20 0 8388104 662016 7940 S 8.6 0.1 >>>>>> 22:22.27 >>>>>> revalidator105 >>>>>> 6221 root 20 0 8388104 662016 7940 S 8.6 0.1 >>>>>> 4:44.93 >>>>>> revalidator111 >>>>>> 6208 root 20 0 8388104 662016 7940 S 8.0 0.1 >>>>>> 22:55.17 >>>>>> revalidator98 >>>>>> 6203 root 20 0 8388104 662016 7940 S 6.0 0.1 >>>>>> 38:52.72 >>>>>> revalidator93 >>>>>> 6206 root 20 0 8388104 662016 7940 S 6.0 0.1 >>>>>> 22:11.55 >>>>>> revalidator96 >>>>>> 6216 root 20 0 8388104 662016 7940 S 4.3 0.1 >>>>>> 33:14.52 >>>>>> revalidator106 >>>>>> >>>>>> Sometimes revalidator processes drops to 5-10%, sometimes to 30%. 
I guess this behaviour is because of the resubmit actions? So far so good,
>>>>>> the controller is doing fine, but there could be some sort of freezes
>>>>>> when sending DHCPREPLY.
>>>>>>
>>>>>> top -H -p $(pidof ovn-controller)
>>>>>> top - 09:30:55 up 16:59, 3 users, load average: 35.73, 36.70, 36.73
>>>>>> Threads: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
>>>>>> %Cpu(s): 27.7 us, 10.6 sy, 0.0 ni, 60.8 id, 0.3 wa, 0.0 hi, 0.6 si, 0.0 st
>>>>>> MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used, 26357.8 buff/cache
>>>>>> MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 345438.0 avail Mem
>>>>>>
>>>>>>  PID USER PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>>>> 5045 root 20  0 396500 83836 4432 S  0.7  0.0 11:46.59 ovn-controller
>>>>>> 5412 root 20  0 396500 83836 4432 S  0.0  0.0  0:06.23 ovn_pinctrl0
>>>>>> 5413 root 20  0 396500 83836 4432 S  0.0  0.0  0:00.00 urcu1
>>>>>> 5414 root 20  0 396500 83836 4432 S  0.0  0.0  0:00.20 ovn_statctrl2
>>>>>> 6103 root 20  0 396500 83836 4432 S  0.0  0.0  0:04.10 stopwatch3
>>>>>>
>>>>>>
>>>>>> Trying to reproduce on a real-world environment. There are 300
>>>>>> instances running with about 300Mbps network traffic in total.
>>>>>> Are there more logs or debug output I can provide?
>>>>>>
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss