On 4/1/25 3:15 AM, Ilia Baikov wrote: > Hello, Hi Ilia,
> So the things go way deeper and it becomes way strange as i initially > thought. > I've migrated to L3 networking using ovn-bgp-agent, in order to reduce > ARP packets flooded over all ports attached to br-int. However this > didn't help at all and some VMs loses external connectivity (but VMs > ports didn't get flooded by hundreds pps of ARP). > > 2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN| > Dropped 131 log messages in last 55 seconds (most recently, 10 seconds > ago) due to excessive rate > 2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 > 2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN| > Dropped 133 log messages in last 57 seconds (most recently, 13 seconds > ago) due to excessive rate > 2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN|Flow: > udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137 > > bridge("br-int") > ---------------- > 0. in_port=277, priority 100, cookie 0xa70f8aad > set_field:0x5d/0xffff->reg13 > set_field:0x2->reg11 > set_field:0x3->reg12 > set_field:0x2->metadata > set_field:0x2ac->reg14 > set_field:0/0xffff0000->reg13 > resubmit(,8) > 8. metadata=0x2, priority 50, cookie 0xecab0c71 > set_field:0/0x1000->reg10 > resubmit(,73) > 73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad > set_field:0x1000/0x1000->reg10 > move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] > -> NXM_NX_XXREG0[111] is now 0x1 > resubmit(,9) > 9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65 > drop > > Final flow: > udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137 > Megaflow: > recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no > Datapath actions: drop > 2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN| > Dropped 7 log messages in last 55 seconds (most recently, 13 seconds > ago) due to excessive rate > 2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over > 4096 resubmit actions on bridge br-int while processing > arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00 > 2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN| > Dropped 7 log messages in last 61 seconds (most recently, 29 seconds > ago) due to excessive rate > 2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over > 4096 resubmit actions on bridge br-int while processing > udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 > 2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN| > Dropped 8 log messages in last 63 seconds (most recently, 29 seconds > ago) due to excessive rate > 2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN|Flow: > udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 > > bridge("br-int") > ---------------- > 0. in_port=278, priority 100, cookie 0xd191f4de > set_field:0x5e/0xffff->reg13 > set_field:0x2->reg11 > set_field:0x3->reg12 > set_field:0x2->metadata > set_field:0x33d->reg14 > set_field:0/0xffff0000->reg13 > resubmit(,8) > 8. metadata=0x2, priority 50, cookie 0xecab0c71 > set_field:0/0x1000->reg10 > resubmit(,73) > 73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de > set_field:0x1000/0x1000->reg10 > move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111] > -> NXM_NX_XXREG0[111] is now 0x1 > resubmit(,9) > 9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65 > drop > > Final flow: > udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678 > Megaflow: > recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no > Datapath actions: drop > > When i'm seeing this issue on mentioned IPs i can see that VM tries to > resolve default gateway mac address but with no success since ARP is > dropped somewhere because of resubmits and drop action on datapath. I > guess now this is not related to OVN controller. Meanwhile DHCP works > pretty good :) Glad to hear about DHCP. > Option broadcast-arps-to-all-routers is disabled at both of provider > network Logical Switches. > > Could there be any solution for this? It's not a solution, more of a change to confirm whether this is causing issues, but could you please try with this commit? https://github.com/dceara/ovn/commit/fac23e2f6ef6effe3f9a2e0310e78d085750488b The commit disables flooding of ARP requests that target OVN owned router IPs to non-router ports. It's not something that can be accepted as is because it breaks some other things, e.g., OVN generated GARP requests will not be forwarded properly. The patch applies on top of the ovn main branch. If you want to try it out on older branches the easiest way is to just ignore the test changes. Looking forward to hear how it went. Thanks, Dumitru > > Regards, > Ilia Baikov > > 26.03.2025 12:34, Ilia Baikov пишет: >> Got my hands on this, back to debugging. Seems like kernel runs stable >> # uname -r >> 6.14.0-061400-generic >> Meanwhile there is no unrecognized(27) related logs. >> tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i "dhcp" >> 2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >> fa:16:3e:9c:f4:45 185.255.178.131 >> 2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl >> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >> 2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >> fa:16:3e:9c:f4:45 185.255.178.131 >> 2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl >> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >> 2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >> fa:16:3e:9c:f4:45 185.255.178.131 >> 2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl >> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >> 2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >> fa:16:3e:9c:f4:45 185.255.178.131 >> 2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl >> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >> 2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >> fa:16:3e:9c:f4:45 185.255.178.131 >> 2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl >> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst- >> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255 >> 2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >> fa:16:3e:50:22:c4 185.255.178.170 >> 2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl >> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >> OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4, dst- >> mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1 >> 2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >> fa:16:3e:18:ac:4a 89.169.15.224 >> 2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl >> received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0| >> OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a, dst- >> mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1 >> >> And yes, there is logs about resubmit actions which are expected as >> you said. >> ->reg15 >> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >> OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8 >> priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0 >> cookie:0xd1588295 >> actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9) >> OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8 >> priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4 >> cookie:0xd1588295 >> actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9) >> OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80 >> priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7 >> actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8) >> OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43 >> priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7 >> actions=set_field:0x1->reg15,resubmit(,43 >> >> OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER >> actions=set_field:0x2->metadata,set_field:0x503- >> >reg14,resubmit(CONTROLLER,8) data_len=42 >> OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER >> actions=set_field:0x2->metadata,set_field:0x503- >> >reg14,resubmit(CONTROLLER,8) data_len=42 >> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >> continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32) >> >> top - 09:29:33 up 16:58, 3 users, load average: 37.70, 37.39, 36.95 >> Threads: 114 total, 13 running, 101 sleeping, 0 stopped, 0 zombie >> %Cpu(s): 27.5 us, 11.2 sy, 0.0 ni, 60.3 id, 0.5 wa, 0.0 hi, 0.6 >> si, 0.0 st >> MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used, 26351.4 buff/ >> cache >> MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 345358.7 avail >> Mem >> >> #top -H -p $(pidof ovs-vswitchd) >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >> COMMAND >> 6212 root 20 0 8388104 662016 7940 R 22.3 0.1 32:13.87 >> revalidator102 >> 6200 root 20 0 8388104 662016 7940 R 21.6 0.1 73:01.76 >> revalidator90 >> 6202 root 20 0 8388104 662016 7940 R 20.6 0.1 32:13.77 >> revalidator92 >> 6217 root 20 0 8388104 662016 7940 R 20.3 0.1 26:07.50 >> revalidator107 >> 6214 root 20 0 8388104 662016 7940 R 18.9 0.1 39:12.60 >> revalidator104 >> 6219 root 20 0 8388104 662016 7940 R 18.6 0.1 25:57.96 >> revalidator109 >> 6211 root 20 0 8388104 662016 7940 R 17.6 0.1 39:20.20 >> revalidator101 >> 6207 root 20 0 8388104 662016 7940 R 13.6 0.1 17:53.31 >> revalidator97 >> 6204 root 20 0 8388104 662016 7940 R 13.0 0.1 32:34.67 >> revalidator94 >> 6209 root 20 0 8388104 662016 7940 R 12.6 0.1 35:55.88 >> revalidator100 >> 6213 root 20 0 8388104 662016 7940 R 12.6 0.1 18:04.43 >> revalidator103 >> 6220 root 20 0 8388104 662016 7940 R 12.3 0.1 5:52.71 >> revalidator110 >> 6218 root 20 0 8388104 662016 7940 R 12.0 0.1 9:09.66 >> revalidator108 >> 6215 root 20 0 8388104 662016 7940 S 8.6 0.1 22:22.27 >> revalidator105 >> 6221 root 20 0 8388104 662016 7940 S 8.6 0.1 4:44.93 >> revalidator111 >> 6208 root 20 0 8388104 662016 7940 S 8.0 0.1 22:55.17 >> revalidator98 >> 6203 root 20 0 8388104 662016 7940 S 6.0 0.1 38:52.72 >> revalidator93 >> 6206 root 20 0 8388104 662016 7940 S 6.0 0.1 22:11.55 >> revalidator96 >> 6216 root 20 0 8388104 662016 7940 S 4.3 0.1 33:14.52 >> revalidator106 >> >> Sometimes revalidator processes drops to 5-10%, sometimes to 30%. I >> guess this behaviour is because of resubmit actions?. So far so good >> controller feeling fine, but there could be some sort of freezes on >> sending DHCPREPLY. >> >> top -H -p $(pidof ovn-controller) >> top - 09:30:55 up 16:59, 3 users, load average: 35.73, 36.70, 36.73 >> Threads: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie >> %Cpu(s): 27.7 us, 10.6 sy, 0.0 ni, 60.8 id, 0.3 wa, 0.0 hi, 0.6 >> si, 0.0 st >> MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used, 26357.8 buff/ >> cache >> MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 345438.0 avail >> Mem >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ >> COMMAND >> 5045 root 20 0 396500 83836 4432 S 0.7 0.0 11:46.59 >> ovn-controller >> 5412 root 20 0 396500 83836 4432 S 0.0 0.0 0:06.23 >> ovn_pinctrl0 >> 5413 root 20 0 396500 83836 4432 S 0.0 0.0 0:00.00 >> urcu1 >> 5414 root 20 0 396500 83836 4432 S 0.0 0.0 0:00.20 >> ovn_statctrl2 >> 6103 root 20 0 396500 83836 4432 S 0.0 0.0 0:04.10 >> stopwatch3 >> >> >> Trying to reproduce on a real-world environment. There is 300 >> instances running with about 300Mbps network traffic in total. >> Is there more logs or debug i can provide? >> > _______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss