Hello,
So the things go way deeper and it becomes way strange as i initially
thought.
I've migrated to L3 networking using ovn-bgp-agent, in order to reduce
ARP packets flooded over all ports attached to br-int. However this
didn't help at all and some VMs loses external connectivity (but VMs
ports didn't get flooded by hundreds pps of ARP).
2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN|Dropped
131 log messages in last 55 seconds (most recently, 10 seconds ago) due
to excessive rate
2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN|Dropped
133 log messages in last 57 seconds (most recently, 13 seconds ago) due
to excessive rate
2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN|Flow:
udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
bridge("br-int")
----------------
0. in_port=277, priority 100, cookie 0xa70f8aad
set_field:0x5d/0xffff->reg13
set_field:0x2->reg11
set_field:0x3->reg12
set_field:0x2->metadata
set_field:0x2ac->reg14
set_field:0/0xffff0000->reg13
resubmit(,8)
8. metadata=0x2, priority 50, cookie 0xecab0c71
set_field:0/0x1000->reg10
resubmit(,73)
73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad
set_field:0x1000/0x1000->reg10
move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
-> NXM_NX_XXREG0[111] is now 0x1
resubmit(,9)
9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
drop
Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
Megaflow: recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no
Datapath actions: drop
2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN|Dropped
7 log messages in last 55 seconds (most recently, 13 seconds ago) due to
excessive rate
2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over
4096 resubmit actions on bridge br-int while processing
arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00
2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN|Dropped
7 log messages in last 61 seconds (most recently, 29 seconds ago) due to
excessive rate
2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over
4096 resubmit actions on bridge br-int while processing
udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN|Dropped
8 log messages in last 63 seconds (most recently, 29 seconds ago) due to
excessive rate
2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN|Flow:
udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
bridge("br-int")
----------------
0. in_port=278, priority 100, cookie 0xd191f4de
set_field:0x5e/0xffff->reg13
set_field:0x2->reg11
set_field:0x3->reg12
set_field:0x2->metadata
set_field:0x33d->reg14
set_field:0/0xffff0000->reg13
resubmit(,8)
8. metadata=0x2, priority 50, cookie 0xecab0c71
set_field:0/0x1000->reg10
resubmit(,73)
73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de
set_field:0x1000/0x1000->reg10
move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
-> NXM_NX_XXREG0[111] is now 0x1
resubmit(,9)
9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
drop
Final flow:
udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
Megaflow: recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no
Datapath actions: drop
When i'm seeing this issue on mentioned IPs i can see that VM tries to
resolve default gateway mac address but with no success since ARP is
dropped somewhere because of resubmits and drop action on datapath. I
guess now this is not related to OVN controller. Meanwhile DHCP works
pretty good :)
Option broadcast-arps-to-all-routers is disabled at both of provider
network Logical Switches.
Could there be any solution for this?
Regards,
Ilia Baikov
26.03.2025 12:34, Ilia Baikov пишет:
Got my hands on this, back to debugging. Seems like kernel runs stable
# uname -r
6.14.0-061400-generic
Meanwhile there is no unrecognized(27) related logs.
tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i "dhcp"
2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45,
dst-mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45,
dst-mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45,
dst-mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45,
dst-mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:9c:f4:45 185.255.178.131
2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45,
dst-mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:50:22:c4 185.255.178.170
2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4,
dst-mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1
2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
fa:16:3e:18:ac:4a 89.169.15.224
2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl
received packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a,
dst-mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1
And yes, there is logs about resubmit actions which are expected as
you said.
->reg15
continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8
priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0
cookie:0xd1588295
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8
priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4
cookie:0xd1588295
actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80
priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8)
OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43
priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7
actions=set_field:0x1->reg15,resubmit(,43
OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503->reg14,resubmit(CONTROLLER,8)
data_len=42
OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
actions=set_field:0x2->metadata,set_field:0x503->reg14,resubmit(CONTROLLER,8)
data_len=42
continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
top - 09:29:33 up 16:58, 3 users, load average: 37.70, 37.39, 36.95
Threads: 114 total, 13 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.5 us, 11.2 sy, 0.0 ni, 60.3 id, 0.5 wa, 0.0 hi, 0.6
si, 0.0 st
MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used, 26351.4
buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 345358.7 avail
Mem
#top -H -p $(pidof ovs-vswitchd)
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
6212 root 20 0 8388104 662016 7940 R 22.3 0.1 32:13.87
revalidator102
6200 root 20 0 8388104 662016 7940 R 21.6 0.1 73:01.76
revalidator90
6202 root 20 0 8388104 662016 7940 R 20.6 0.1 32:13.77
revalidator92
6217 root 20 0 8388104 662016 7940 R 20.3 0.1 26:07.50
revalidator107
6214 root 20 0 8388104 662016 7940 R 18.9 0.1 39:12.60
revalidator104
6219 root 20 0 8388104 662016 7940 R 18.6 0.1 25:57.96
revalidator109
6211 root 20 0 8388104 662016 7940 R 17.6 0.1 39:20.20
revalidator101
6207 root 20 0 8388104 662016 7940 R 13.6 0.1 17:53.31
revalidator97
6204 root 20 0 8388104 662016 7940 R 13.0 0.1 32:34.67
revalidator94
6209 root 20 0 8388104 662016 7940 R 12.6 0.1 35:55.88
revalidator100
6213 root 20 0 8388104 662016 7940 R 12.6 0.1 18:04.43
revalidator103
6220 root 20 0 8388104 662016 7940 R 12.3 0.1 5:52.71
revalidator110
6218 root 20 0 8388104 662016 7940 R 12.0 0.1 9:09.66
revalidator108
6215 root 20 0 8388104 662016 7940 S 8.6 0.1 22:22.27
revalidator105
6221 root 20 0 8388104 662016 7940 S 8.6 0.1 4:44.93
revalidator111
6208 root 20 0 8388104 662016 7940 S 8.0 0.1 22:55.17
revalidator98
6203 root 20 0 8388104 662016 7940 S 6.0 0.1 38:52.72
revalidator93
6206 root 20 0 8388104 662016 7940 S 6.0 0.1 22:11.55
revalidator96
6216 root 20 0 8388104 662016 7940 S 4.3 0.1 33:14.52
revalidator106
Sometimes revalidator processes drops to 5-10%, sometimes to 30%. I
guess this behaviour is because of resubmit actions?. So far so good
controller feeling fine, but there could be some sort of freezes on
sending DHCPREPLY.
top -H -p $(pidof ovn-controller)
top - 09:30:55 up 16:59, 3 users, load average: 35.73, 36.70, 36.73
Threads: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.7 us, 10.6 sy, 0.0 ni, 60.8 id, 0.3 wa, 0.0 hi, 0.6
si, 0.0 st
MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used, 26357.8
buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 345438.0 avail
Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
5045 root 20 0 396500 83836 4432 S 0.7 0.0 11:46.59
ovn-controller
5412 root 20 0 396500 83836 4432 S 0.0 0.0 0:06.23
ovn_pinctrl0
5413 root 20 0 396500 83836 4432 S 0.0 0.0 0:00.00
urcu1
5414 root 20 0 396500 83836 4432 S 0.0 0.0 0:00.20
ovn_statctrl2
6103 root 20 0 396500 83836 4432 S 0.0 0.0 0:04.10
stopwatch3
Trying to reproduce on a real-world environment. There is 300
instances running with about 300Mbps network traffic in total.
Is there more logs or debug i can provide?
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss