Switched back to configuration with no neutron-dhcp-agent. I have synced ovn db with neutron-ovn-db-sync-util.

There is coverage/show result of node where i'm seeing this issue right now (current network bandwidth is about 300Mbps).
# ovs-appctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=982b7b40:
nln_changed                0.0/sec     0.000/sec 0.0197/sec   total: 8568
route_table_dump           0.0/sec     0.000/sec 0.0072/sec   total: 3953
netlink_sent             1158.4/sec  1109.700/sec 1018.6758/sec   total: 840427925 netlink_recv_jumbo       144.2/sec   140.883/sec 212.7489/sec   total: 127752548 netlink_received         1258.4/sec  1229.533/sec 1192.0311/sec   total: 1072129295
netdev_set_ethtool         0.0/sec     0.000/sec 0.0006/sec   total: 361
netdev_get_ethtool         0.0/sec     0.000/sec 0.0039/sec   total: 2808
netdev_get_ifindex         0.0/sec     0.000/sec 0.3358/sec   total: 14218
netdev_set_policing        0.0/sec     0.000/sec 0.0028/sec   total: 1685
vconn_sent               596.2/sec   576.967/sec 585.4778/sec   total: 643013987 vconn_received           320.0/sec   310.883/sec 307.5108/sec   total: 360905481 util_xalloc              937354.8/sec 912615.150/sec 873946.4494/sec   total: 1068314545738
unixctl_replied            0.0/sec     0.033/sec 0.0331/sec   total: 33146
unixctl_received           0.0/sec     0.033/sec 0.0331/sec   total: 33146
long_poll_interval         0.0/sec     0.000/sec 0.0000/sec   total: 1
stream_open                0.0/sec     0.000/sec 0.0000/sec   total: 2
pstream_open               0.0/sec     0.000/sec 0.0000/sec   total: 5
seq_change               3283.6/sec  3255.167/sec 3039.7792/sec   total: 3088392125 rconn_sent               596.2/sec   576.967/sec 585.4767/sec   total: 643013979 rconn_queued             596.2/sec   576.967/sec 585.5044/sec   total: 643014079
rconn_overflow             0.0/sec     0.000/sec 2.0272/sec   total: 16543
rconn_discarded            0.0/sec     0.000/sec 0.0278/sec   total: 100
poll_zero_timeout        275.2/sec   247.600/sec 161.9242/sec   total: 136657138 poll_create_node         4749.6/sec  4417.217/sec 4062.3500/sec   total: 4007153255
txn_success                0.2/sec     0.183/sec 0.1828/sec   total: 184805
txn_incomplete             0.4/sec     0.367/sec 0.4856/sec   total: 372203
txn_unchanged              0.0/sec     0.000/sec 0.1169/sec   total: 7146
netdev_get_stats          77.6/sec    71.133/sec 70.5939/sec   total: 73986502
mac_learning_moved         0.0/sec     0.000/sec 0.0000/sec   total: 425
mac_learning_expired       0.6/sec     0.217/sec 0.4439/sec   total: 409542
mac_learning_learned       0.2/sec     0.233/sec 0.4403/sec   total: 410444
hmap_expand              298.0/sec   265.833/sec 329.0514/sec   total: 291663955
hmap_pathological         10.4/sec     6.050/sec 9.1406/sec   total: 8189560
hindex_expand              0.0/sec     0.000/sec 0.0000/sec   total: 12
hindex_pathological        0.0/sec     0.000/sec 0.0000/sec   total: 148
miniflow_malloc            0.0/sec     0.000/sec 21.7272/sec   total: 3395676 flow_extract             843.4/sec   801.333/sec 727.9828/sec   total: 755385765
dpif_port_del              0.0/sec     0.000/sec 0.0006/sec   total: 379
dpif_port_add              0.0/sec     0.000/sec 0.0006/sec   total: 367
dpif_flow_put_error        6.0/sec     7.633/sec 5.1642/sec   total: 4420389
dpif_flow_put            232.2/sec   208.867/sec 126.8286/sec   total: 98257942
dpif_flow_get              0.0/sec     0.000/sec 0.0000/sec   total: 386
dpif_flow_flush            0.0/sec     0.000/sec 0.0000/sec   total: 1
dpif_flow_del            182.2/sec   203.100/sec 118.5047/sec   total: 93717525
dpif_execute_with_help     0.0/sec     0.000/sec 0.0017/sec   total: 1649
dpif_execute_error         0.0/sec     0.000/sec 0.0006/sec   total: 398
dpif_execute             571.8/sec   537.350/sec 439.8069/sec   total: 472799472
cmap_shrink                0.8/sec     0.733/sec 1.4025/sec   total: 1180907
cmap_expand                1.0/sec     0.583/sec 1.5000/sec   total: 1182882
ccmap_shrink               0.0/sec     0.000/sec 0.1786/sec   total: 747
ccmap_expand               0.0/sec     0.000/sec 0.1200/sec   total: 774
connmgr_async_unsent       0.0/sec     0.000/sec 0.0372/sec   total: 139
xlate_actions            4171.0/sec  3071.683/sec 3133.7436/sec   total: 3107954003
upcall_ukey_replace        0.0/sec     0.117/sec 0.1739/sec   total: 155715
upcall_ukey_contention     0.0/sec     0.000/sec 0.0003/sec   total: 157
upcall_flow_limit_grew     0.0/sec     0.550/sec 0.2328/sec   total: 56359
ukey_replace_contention    0.0/sec     0.017/sec 0.0008/sec   total: 847
revalidate_missed_dp_flow   0.0/sec     0.117/sec 0.0472/sec   total: 106876
handler_duplicate_upcall  14.0/sec    12.817/sec 11.5303/sec   total: 9317449
dumped_new_flow            0.0/sec     0.000/sec 0.0000/sec   total: 842
dumped_duplicate_flow      0.0/sec     0.000/sec 0.0003/sec   total: 37029
rev_mac_learning           0.8/sec     0.417/sec 0.6419/sec   total: 708403
rev_flow_table             0.0/sec     0.000/sec 0.0033/sec   total: 4003
rev_port_toggled           0.0/sec     0.000/sec 0.0000/sec   total: 119
rev_reconfigure            0.0/sec     0.000/sec 0.0008/sec   total: 691
ofproto_update_port        0.0/sec     0.000/sec 0.0039/sec   total: 2430
ofproto_recv_openflow    320.0/sec   310.883/sec 307.5097/sec   total: 360905473 ofproto_packet_out       319.6/sec   310.633/sec 299.3358/sec   total: 360555564
ofproto_flush              0.0/sec     0.000/sec 0.0000/sec   total: 1
bridge_reconfigure         0.0/sec     0.000/sec 0.1125/sec   total: 5108
100 events never hit

> Do you see any warning/error logs in ovs-vswitchd.log?
I see no noticable errors except usual warn logs like invalid packet.

> Just double checking, you're saying that with low numbers of instances
> you don't have this issue?
Exactly. With relatively low instances count there is no visible issue.

> I wonder if you're not potentially hitting the maximum resubmit limit
> (4096 resubmits) due to DHCPOFFERS being sent as broadcast. There
> should be warning logs indicating that in ovs-vswitchd.

There is no error or warn logs mentioning resubmit, but a lot of such logs:

2025-03-01T11:27:16.473Z|19884389|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3465): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60 2025-03-01T11:27:16.972Z|19884399|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe358a): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60 2025-03-01T11:27:17.873Z|19884417|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3765): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60 2025-03-01T11:27:17.972Z|19884419|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe37ba): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60

I've checked presence of error/warn logs at ovs-vswitchd using next command
# cat /var/log/kolla/openvswitch/ovs-vswitchd.log | grep -i "resubmit" | grep -iE "warn|err

Also i have non-deployed region running 2024.2 OpenStack and did some dumb tests like creating 1500 VMs with about 500 instances per hypervisor and didn't see this issue. However, i didn't use L2 network in this case, only virtual network like geneve/vxlan. I will try today to create sample L2 vlan with mac learning enabled at the border router and report back. I think this bug will not occur if there is no a lot of packets coming through ovs, handling DHCP with no issues...

24.02.2025 16:00, Dumitru Ceara пишет:
On 2/20/25 11:13 PM, Ilia Baikov via discuss wrote:
Hi

Hi Ilia,

+ Ilya Maximets for more ovs insights.

Over a lot of observes and testing various number of possible fixes came
with nothing...

Yes, there are messages that contains NXT_RESUME substring.

ovn-controller:
2025-02-18T21:05:51.771Z|1041129|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0x404b44bc total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x13c,reg14=0x4a8,metadata=0x2,in_port=312 
(via action) data_len=348 (unbuffered)
2025-02-18T21:08:23.729Z|1044255|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0xffe9d519 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 
(via action) data_len=348 (unbuffered)
2025-02-18T21:10:33.865Z|1046917|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x57,reg14=0x423,metadata=0x2,in_port=83 
(via action) data_len=348 (unbuffered)
2025-02-18T21:11:06.733Z|1047596|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x5f,reg14=0x413,metadata=0x2,in_port=91 
(via action) data_len=348 (unbuffered)
2025-02-18T21:12:16.984Z|1048995|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:13:03.396Z|1049934|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:19:48.641Z|1058227|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0xbbfea837 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x132,reg14=0x434,metadata=0x2,in_port=302 
(via action) data_len=348 (unbuffered)
2025-02-18T21:19:50.069Z|1058255|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x6f,reg14=0x148,metadata=0x2,in_port=107 
(via action) data_len=348 (unbuffered)
2025-02-18T21:21:38.224Z|1060474|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0xdb,reg14=0x336,metadata=0x2,in_port=215 
(via action) data_len=348 (unbuffered)
2025-02-18T21:21:42.735Z|1060568|vconn(ovn_pinctrl0)|DBG|unix:/var/run/
openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0):
cookie=0xffe9d519 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 
(via action) data_len=348 (unbuffered)

ovs-vswitchd
2025-02-18T20:55:45.673Z|1086321|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0x58482a66 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x151,reg14=0x4b1,metadata=0x2,in_port=375 
(via action) data_len=348 (unbuffered)
2025-02-18T21:01:05.872Z|1092877|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0x983e6272 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x15f,reg14=0x4d3,metadata=0x2,in_port=368 
(via action) data_len=348 (unbuffered)
2025-02-18T21:01:23.672Z|1093239|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:04:03.272Z|1096513|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:07:36.773Z|1100887|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:07:51.472Z|1101187|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xa6d16298 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x4b,reg14=0x1bb,metadata=0x2,in_port=349 
(via action) data_len=348 (unbuffered)
2025-02-18T21:08:08.673Z|1101537|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0x8b831d17 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x26,reg14=0x123,metadata=0x2,in_port=34 
(via action) data_len=348 (unbuffered)
2025-02-18T21:09:09.173Z|1102776|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xbf424a23 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x85,reg14=0x2d9,metadata=0x2,in_port=371 
(via action) data_len=348 (unbuffered)
2025-02-18T21:13:03.574Z|1108046|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:17:18.772Z|1113273|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x7c,reg14=0x241,metadata=0x2,in_port=120 
(via action) data_len=348 (unbuffered)
2025-02-18T21:18:51.673Z|1115178|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:19:36.872Z|1116105|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x87,reg14=0x4b,metadata=0x2,in_port=131 
(via action) data_len=348 (unbuffered)
2025-02-18T21:19:52.172Z|1116417|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 
(via action) data_len=348 (unbuffered)
2025-02-18T21:20:24.572Z|1117078|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x8b,reg14=0x2fb,metadata=0x2,in_port=135 
(via action) data_len=348 (unbuffered)
2025-02-18T21:21:31.672Z|1118459|vconn|DBG|unix#0: received: NXT_RESUME
(OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x93,reg14=0xdd,metadata=0x2,in_port=143 
(via action) data_len=348 (unbuffered)

The interesting logs is also at ovn-controller logs
2025-02-18T21:21:10.152Z|1059900|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:a6:57:54 13.37.13.68
2025-02-18T21:21:10.560Z|1059909|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:43:d4:9a 13.38.13.15
2025-02-18T21:21:14.178Z|1059983|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cb:77:8d 13.37.13.72
2025-02-18T21:21:16.785Z|1060037|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:21:19.661Z|1060097|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:7e:1b:a8 13.37.13.173
2025-02-18T21:21:22.582Z|1060157|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:21:25.903Z|1060225|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:f6:ae:8b 13.37.13.11
2025-02-18T21:21:28.643Z|1060281|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:b0:98:e0 13.37.13.101
2025-02-18T21:21:32.687Z|1060361|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:82:6e:c3 13.37.13.183
2025-02-18T21:21:34.517Z|1060398|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cb:08:c7 13.37.13.55
2025-02-18T21:21:38.673Z|1060484|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:f9:5b:25 13.37.13.98
2025-02-18T21:21:40.782Z|1060528|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:21:44.097Z|1060596|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:95:0c:5e 13.37.13.197
2025-02-18T21:21:46.484Z|1060646|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:21:50.203Z|1060722|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:92:f8:ba 13.37.13.124
2025-02-18T21:21:52.764Z|1060776|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:ce:b8:1e 13.37.13.126
2025-02-18T21:21:55.503Z|1060832|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:8d:88:f6 13.38.13.63
2025-02-18T21:21:59.520Z|1060914|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:8f:20:53 13.37.13.63
2025-02-18T21:22:01.620Z|1060958|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:8a:7a:47 13.37.13.150
2025-02-18T21:22:04.418Z|1061015|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:70:7e:5f 13.37.13.216
2025-02-18T21:22:07.477Z|1061079|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:2a:7a:ad 13.37.13.50
2025-02-18T21:22:11.719Z|1061165|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:13.798Z|1061208|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:a6:57:54 13.37.13.68
2025-02-18T21:22:16.612Z|1061266|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:81:5f:99 13.37.13.14
2025-02-18T21:22:19.830Z|1061334|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:22.460Z|1061388|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:26.001Z|1061460|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:28.383Z|1061506|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:31.594Z|1061572|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:58:35:35 13.37.13.64
2025-02-18T21:22:35.249Z|1061648|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:37.578Z|1061695|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:82:6e:c3 13.37.13.183
2025-02-18T21:22:40.514Z|1061755|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:43.373Z|1061815|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:58:72:c4 13.37.13.65
2025-02-18T21:22:46.557Z|1061881|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:02:5e:ac 13.37.13.111
2025-02-18T21:22:50.020Z|1061951|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:53.511Z|1062023|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:5f:fd:6f 13.37.13.15
2025-02-18T21:22:55.717Z|1062069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:22:58.812Z|1062133|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:8d:88:f6 13.38.13.63
2025-02-18T21:23:01.450Z|1062189|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:9a:da:47 13.37.13.225
2025-02-18T21:23:05.545Z|1062273|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:8a:7a:47 13.37.13.150
2025-02-18T21:23:08.512Z|1062332|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:70:7e:5f 13.37.13.216
2025-02-18T21:23:10.980Z|1062384|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:02:ff:8a 13.37.13.233
2025-02-18T21:23:14.405Z|1062454|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:23:17.172Z|1062511|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:2e:73:8f 13.37.13.186
2025-02-18T21:23:19.435Z|1062559|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:39:8d:2f 13.37.13.153
2025-02-18T21:23:23.049Z|1062633|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:23:25.653Z|1062687|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:d4:36:c6 13.37.13.60
2025-02-18T21:23:28.768Z|1062749|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:f1:64:3a 13.37.13.125
2025-02-18T21:23:33.340Z|1062843|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:56:c4:04 13.37.13.219
2025-02-18T21:23:34.400Z|1062865|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:6a:7b:a4 13.37.13.121
2025-02-18T21:23:39.485Z|1062968|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:cc:d3:66 13.38.13.62
2025-02-18T21:23:41.393Z|1063007|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:82:6e:c3 13.37.13.183
2025-02-18T21:23:44.410Z|1063069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
fa:16:3e:52:ef:94 13.37.13.182

IPs are sanitized, anyway 13.37.13 belongs to subnet #1 and 13.38.13 to
subnet #2. Observing large regression across all infra, like almost 50%
overall network bandwith loss over time.

Any thoughts guys?

Do you see any warning/error logs in ovs-vswitchd.log?

Context: we run a lot of small instances with 1 x fixed IP with little
amount of external traffic, about 300 per hypervisor, sometimes more,
sometimes less anyway didn't see anything similar with plain ovs and
neutron-openvswitch-agent (openstack).
UPD: Once i manually cleared dhcpv4_options column in all ports seems
like neutron-dhcp-agent (dnsmasq based) started to send dhcpoffers and
most of instances back to normal. However, didn't manage what is the
reason to ovn-controller not send DHCPOFFER to VIF. Seems like a big
point of regression starting from N number of instances or something else.

Just double checking, you're saying that with low numbers of instances
you don't have this issue?

I wonder if you're not potentially hitting the maximum resubmit limit
(4096 resubmits) due to DHCPOFFERS being sent as broadcast.  There
should be warning logs indicating that in ovs-vswitchd.

It might also help if you could share the output of:

ovs-appctl coverage/show

10.02.2025 17:36, Numan Siddique:

On Sat, Feb 8, 2025 at 1:27 PM support via discuss <ovs-
disc...@openvswitch.org> wrote:

     I'm observing for a long time next bug: ovn-controller doesn't
     send DHCPOFFER but it logs that it does. However this VIF doesn't
     get it at all.
     But once ovn-controller restarted it successfully seems like it
     actually send DHCPOFFER and immediately receives DHCPACK from VIF.


When you see this issue,  can you enable vconn and rcon debug logs in
both ovn-controller and ovs-vswitchd ?

ovn-appctl vlog/set rconn:dbg
ovn-appctl vlog/set vconn:dbg

ovs-appctl vlog/set rconn:dbg
ovs-appctl vlog/set rconn:dbg

If ovn-controller replies with a DHCPOFFER, you should see the
NXT_RESUME related logs in both ovn-controller and ovs-vswitchd.

This can help you narrow down the issue.

Thanks
Numan

     |2025-02-08T14:27:57.462Z|00053|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:80:67:62 13.37.0.5
     2025-02-08T14:28:01.746Z|00054|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:80:67:62 13.37.0.5
     2025-02-08T14:28:05.582Z|00055|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:80:67:62 13.37.0.5
     2025-02-08T14:28:13.053Z|00056|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:80:67:62 13.37.0.5
     2025-02-08T14:28:21.972Z|00057|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:84:8e:7f 13.37.0.37
     2025-02-08T14:28:25.109Z|00058|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:84:8e:7f 13.37.0.37
     2025-02-08T14:28:29.362Z|00059|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:84:8e:7f 13.37.0.37
     2025-02-08T14:28:30.003Z|00060|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:80:67:62 13.37.0.5
     2025-02-08T14:28:37.820Z|00061|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:84:8e:7f 13.37.0.37
     2025-02-08T14:28:38.619Z|00002|fatal_signal(ovn_statctrl2)|WARN|
terminating
     with signal 15 (Terminated) # ovn_controller is restarted by
     docker restart ovn_controller
     2025-02-08T14:28:50.639Z|00001|vlog|INFO|opened log file
     /var/log/kolla/openvswitch/ovn-controller.log
     2025-02-08T14:28:50.640Z|00002|vif_plug_representor|INFO|unable
     to set udev receive buffer size: Operation not permitted
     2025-02-08T14:28:50.641Z|00003|reconnect|INFO|tcp:127.0.0.1:6640
     <http://127.0.0.1:6640>: connecting...
     2025-02-08T14:28:50.641Z|00004|reconnect|INFO|tcp:127.0.0.1:6640
     <http://127.0.0.1:6640>: connected
     2025-02-08T14:28:50.738Z|00005|main|INFO|OVN internal version is
     : [24.09.0-20.37.0-73.6]
     2025-02-08T14:28:50.738Z|00006|main|INFO|OVS IDL reconnected,
     force recompute.
     2025-02-08T14:28:50.738Z|00007|reconnect|INFO|tcp:10.10.9.5:6642
     <http://10.10.9.5:6642>: connecting...
     2025-02-08T14:28:50.738Z|00008|main|INFO|OVNSB IDL reconnected,
     force recompute.
     2025-02-08T14:28:50.739Z|00009|ovn_util|INFO|statctrl: connecting
     to switch: "unix:/var/run/openvswitch/br-int.mgmt"
     2025-02-08T14:28:50.739Z|00010|rconn|INFO|unix:/var/run/
openvswitch/br-int.mgmt:
     connecting...
     2025-02-08T14:28:50.739Z|00011|ovn_util|INFO|pinctrl: connecting
     to switch: "unix:/var/run/openvswitch/br-int.mgmt"
     2025-02-08T14:28:50.739Z|00012|rconn|INFO|unix:/var/run/
openvswitch/br-int.mgmt:
     connecting...
     2025-02-08T14:28:50.739Z|00013|reconnect|INFO|tcp:10.10.9.5:6642
     <http://10.10.9.5:6642>: connected
     2025-02-08T14:28:50.739Z|00001|rconn(ovn_statctrl2)|INFO|unix:/
var/run/openvswitch/br-int.mgmt:
     connected
     2025-02-08T14:28:50.743Z|00001|rconn(ovn_pinctrl0)|INFO|unix:/
var/run/openvswitch/br-int.mgmt:
     connected 2025-02-08T14:28:50.756Z|00014|ovn_util|INFO|features:
     connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
     2025-02-08T14:28:50.757Z|00015|rconn|INFO|unix:/var/run/
openvswitch/br-int.mgmt:
     connecting... 2025-02-08T14:28:50.757Z|00016|features|INFO|OVS
     Feature: ct_zero_snat, state: supported
     2025-02-08T14:28:50.757Z|00017|features|INFO|OVS Feature:
     ct_flush, state: supported
     2025-02-08T14:28:50.757Z|00018|features|INFO|OVS Feature:
     dp_hash_l4_sym_support, state: supported
     2025-02-08T14:28:50.757Z|00019|main|INFO|OVS feature set changed,
     force recompute.
     2025-02-08T14:28:50.757Z|00020|ovn_util|INFO|ofctrl: connecting
     to switch: "unix:/var/run/openvswitch/br-int.mgmt"
     2025-02-08T14:28:50.757Z|00021|rconn|INFO|unix:/var/run/
openvswitch/br-int.mgmt:
     connecting...
     2025-02-08T14:28:50.758Z|00022|rconn|INFO|unix:/var/run/
openvswitch/br-int.mgmt:
     connected 2025-02-08T14:28:50.758Z|00023|main|INFO|OVS OpenFlow
     connection reconnected,force recompute.
     2025-02-08T14:28:50.759Z|00024|rconn|INFO|unix:/var/run/
openvswitch/br-int.mgmt:
     connected 2025-02-08T14:28:50.760Z|00025|features|INFO|OVS
     Feature: meter_support, state: supported
     2025-02-08T14:28:50.760Z|00026|features|INFO|OVS Feature:
     group_support, state: supported
     2025-02-08T14:28:50.760Z|00027|features|INFO|OVS Feature:
     sample_action_with_registers, state: supported
     2025-02-08T14:28:50.760Z|00028|main|INFO|OVS feature set changed,
     force recompute.
     2025-02-08T14:28:51.483Z|00029|binding|INFO|Releasing lport
     978a923d-1076-484b-a1d5-ff5827ae69ec from this chassis
     (sb_readonly=0)
     2025-02-08T14:28:53.658Z|00030|poll_loop|INFO|wakeup due to
     [POLLIN] on fd 30 (10.10.9.3:60902
     <http://10.10.9.3:60902><->10.10.9.5:6642
     <http://10.10.9.5:6642>) at lib/stream-fd.c:157 (78% CPU usage)
     2025-02-08T14:28:54.726Z|00002|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:84:8e:7f 13.37.0.37
     2025-02-08T14:28:59.134Z|00003|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
     fa:16:3e:84:8e:7f 13.37.0.37
     2025-02-08T14:29:01.499Z|00004|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER
     fa:16:3e:80:67:62 13.37.0.5
     2025-02-08T14:29:03.245Z|00005|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
     fa:16:3e:84:8e:7f 13.37.0.37
     2025-02-08T14:29:03.455Z|00031|memory|INFO|98744 kB peak resident
     set size after 12.8 seconds
     2025-02-08T14:29:03.455Z|00032|memory|INFO|idl-cells-
OVN_Southbound:168023
     idl-cells-Open_vSwitch:36874 lflow-cache-entries-cache-expr:6494
     lflow-cache-entries-cache-matches:1413 lflow-cache-size-KB:20726
     local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:6342
     ofctrl_installed_flow_usage-KB:4695 ofctrl_sb_flow_ref_usage-
KB:2355 |

     Any suggestions?

     Environment:
     OpenStack 2024.2
     Kernel 6.8.0-50-generic #51
     <https://github.com/ovn-org/ovn/issues/51>~22.04.1-Ubuntu SMP
     PREEMPT_DYNAMIC

     |ovs-vsctl (Open vSwitch) 3.4.0 DB Schema 8.7.0 |

     |ovs-ofctl (Open vSwitch) 3.4.0 OpenFlow versions 0x1:0x6 |

     |# ovn-nbctl --version ovn-nbctl 24.09.0 Open vSwitch Library
     3.4.0 DB Schema 7.6.0 |

     |# ovn-sbctl --version ovn-sbctl 24.09.0 Open vSwitch Library
     3.4.0 DB Schema 20.37.0|

Thanks,
Dumitru

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to