Here is the coverage/show output from a node where I'm seeing this issue right now (current network bandwidth is about 300 Mbps).
# ovs-appctl coverage/show
Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=982b7b40:
nln_changed 0.0/sec 0.000/sec 0.0197/sec total: 8568
route_table_dump 0.0/sec 0.000/sec 0.0072/sec total: 3953
netlink_sent 1158.4/sec 1109.700/sec 1018.6758/sec total: 840427925
netlink_recv_jumbo 144.2/sec 140.883/sec 212.7489/sec total: 127752548
netlink_received 1258.4/sec 1229.533/sec 1192.0311/sec total: 1072129295
netdev_set_ethtool 0.0/sec 0.000/sec 0.0006/sec total: 361
netdev_get_ethtool 0.0/sec 0.000/sec 0.0039/sec total: 2808
netdev_get_ifindex 0.0/sec 0.000/sec 0.3358/sec total: 14218
netdev_set_policing 0.0/sec 0.000/sec 0.0028/sec total: 1685
vconn_sent 596.2/sec 576.967/sec 585.4778/sec total: 643013987
vconn_received 320.0/sec 310.883/sec 307.5108/sec total: 360905481
util_xalloc 937354.8/sec 912615.150/sec 873946.4494/sec total: 1068314545738
unixctl_replied 0.0/sec 0.033/sec 0.0331/sec total: 33146
unixctl_received 0.0/sec 0.033/sec 0.0331/sec total: 33146
long_poll_interval 0.0/sec 0.000/sec 0.0000/sec total: 1
stream_open 0.0/sec 0.000/sec 0.0000/sec total: 2
pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 5
seq_change 3283.6/sec 3255.167/sec 3039.7792/sec total: 3088392125
rconn_sent 596.2/sec 576.967/sec 585.4767/sec total: 643013979
rconn_queued 596.2/sec 576.967/sec 585.5044/sec total: 643014079
rconn_overflow 0.0/sec 0.000/sec 2.0272/sec total: 16543
rconn_discarded 0.0/sec 0.000/sec 0.0278/sec total: 100
poll_zero_timeout 275.2/sec 247.600/sec 161.9242/sec total: 136657138
poll_create_node 4749.6/sec 4417.217/sec 4062.3500/sec total: 4007153255
txn_success 0.2/sec 0.183/sec 0.1828/sec total: 184805
txn_incomplete 0.4/sec 0.367/sec 0.4856/sec total: 372203
txn_unchanged 0.0/sec 0.000/sec 0.1169/sec total: 7146
netdev_get_stats 77.6/sec 71.133/sec 70.5939/sec total: 73986502
mac_learning_moved 0.0/sec 0.000/sec 0.0000/sec total: 425
mac_learning_expired 0.6/sec 0.217/sec 0.4439/sec total: 409542
mac_learning_learned 0.2/sec 0.233/sec 0.4403/sec total: 410444
hmap_expand 298.0/sec 265.833/sec 329.0514/sec total: 291663955
hmap_pathological 10.4/sec 6.050/sec 9.1406/sec total: 8189560
hindex_expand 0.0/sec 0.000/sec 0.0000/sec total: 12
hindex_pathological 0.0/sec 0.000/sec 0.0000/sec total: 148
miniflow_malloc 0.0/sec 0.000/sec 21.7272/sec total: 3395676
flow_extract 843.4/sec 801.333/sec 727.9828/sec total: 755385765
dpif_port_del 0.0/sec 0.000/sec 0.0006/sec total: 379
dpif_port_add 0.0/sec 0.000/sec 0.0006/sec total: 367
dpif_flow_put_error 6.0/sec 7.633/sec 5.1642/sec total: 4420389
dpif_flow_put 232.2/sec 208.867/sec 126.8286/sec total: 98257942
dpif_flow_get 0.0/sec 0.000/sec 0.0000/sec total: 386
dpif_flow_flush 0.0/sec 0.000/sec 0.0000/sec total: 1
dpif_flow_del 182.2/sec 203.100/sec 118.5047/sec total: 93717525
dpif_execute_with_help 0.0/sec 0.000/sec 0.0017/sec total: 1649
dpif_execute_error 0.0/sec 0.000/sec 0.0006/sec total: 398
dpif_execute 571.8/sec 537.350/sec 439.8069/sec total: 472799472
cmap_shrink 0.8/sec 0.733/sec 1.4025/sec total: 1180907
cmap_expand 1.0/sec 0.583/sec 1.5000/sec total: 1182882
ccmap_shrink 0.0/sec 0.000/sec 0.1786/sec total: 747
ccmap_expand 0.0/sec 0.000/sec 0.1200/sec total: 774
connmgr_async_unsent 0.0/sec 0.000/sec 0.0372/sec total: 139
xlate_actions 4171.0/sec 3071.683/sec 3133.7436/sec total: 3107954003
upcall_ukey_replace 0.0/sec 0.117/sec 0.1739/sec total: 155715
upcall_ukey_contention 0.0/sec 0.000/sec 0.0003/sec total: 157
upcall_flow_limit_grew 0.0/sec 0.550/sec 0.2328/sec total: 56359
ukey_replace_contention 0.0/sec 0.017/sec 0.0008/sec total: 847
revalidate_missed_dp_flow 0.0/sec 0.117/sec 0.0472/sec total: 106876
handler_duplicate_upcall 14.0/sec 12.817/sec 11.5303/sec total: 9317449
dumped_new_flow 0.0/sec 0.000/sec 0.0000/sec total: 842
dumped_duplicate_flow 0.0/sec 0.000/sec 0.0003/sec total: 37029
rev_mac_learning 0.8/sec 0.417/sec 0.6419/sec total: 708403
rev_flow_table 0.0/sec 0.000/sec 0.0033/sec total: 4003
rev_port_toggled 0.0/sec 0.000/sec 0.0000/sec total: 119
rev_reconfigure 0.0/sec 0.000/sec 0.0008/sec total: 691
ofproto_update_port 0.0/sec 0.000/sec 0.0039/sec total: 2430
ofproto_recv_openflow 320.0/sec 310.883/sec 307.5097/sec total: 360905473
ofproto_packet_out 319.6/sec 310.633/sec 299.3358/sec total: 360555564
ofproto_flush 0.0/sec 0.000/sec 0.0000/sec total: 1
bridge_reconfigure 0.0/sec 0.000/sec 0.1125/sec total: 5108
100 events never hit
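To make a dump like the one above easier to compare between nodes, here is a minimal Python sketch that pulls each counter out of pasted coverage/show text. It assumes only the record shape visible in the output above (a name, three "/sec" rates, then "total: N"). The function name `parse_coverage` is made up for illustration, and the counters in the sample are picked only as examples, not as a diagnosis.

```python
import re

# One counter record: name, three average rates (5 s, 1 min, 1 h), running total.
# The name must start with a letter, so a total fused to the next counter name
# (e.g. "total: 100poll_zero_timeout") is still split correctly.
COUNTER_RE = re.compile(
    r"([a-z][a-z0-9_]*)\s+"
    r"([\d.]+)/sec\s+([\d.]+)/sec\s+([\d.]+)/sec\s+"
    r"total:\s+(\d+)"
)


def parse_coverage(text):
    """Return {name: (rate_5s, rate_1min, rate_1h, total)} for every counter found."""
    counters = {}
    for name, r5, r60, r3600, total in COUNTER_RE.findall(text):
        counters[name] = (float(r5), float(r60), float(r3600), int(total))
    return counters


# Three records copied from the output above, including one fused boundary.
sample = (
    "rconn_overflow 0.0/sec 0.000/sec 2.0272/sec total: 16543 "
    "rconn_discarded 0.0/sec 0.000/sec 0.0278/sec total: 100"
    "poll_zero_timeout 275.2/sec 247.600/sec 161.9242/sec total: 136657138"
)
counters = parse_coverage(sample)
for name in ("rconn_overflow", "rconn_discarded"):
    print(name, counters[name])
```

Running the same function on dumps from a healthy node and an affected node gives two dictionaries that can be compared counter by counter.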
> Do you see any warning/error logs in ovs-vswitchd.log?

I see no noticeable errors except the usual warn logs, such as invalid packet.

> Just double checking, you're saying that with low numbers of instances
> you don't have this issue?

Exactly. With a relatively low instance count there is no visible issue.

> I wonder if you're not potentially hitting the maximum resubmit limit
> (4096 resubmits) due to DHCPOFFERS being sent as broadcast. There
> should be warning logs indicating that in ovs-vswitchd.

There are no error or warn logs mentioning resubmit, but there are a lot of logs like these:

2025-03-01T11:27:16.473Z|19884389|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3465): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
2025-03-01T11:27:16.972Z|19884399|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe358a): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
2025-03-01T11:27:17.873Z|19884417|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3765): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
2025-03-01T11:27:17.972Z|19884419|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe37ba): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
I checked for error/warn logs in ovs-vswitchd using the following command:

# cat /var/log/kolla/openvswitch/ovs-vswitchd.log | grep -i "resubmit" | grep -iE "warn|err"
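The grep pipeline above matches "warn" or "err" anywhere in a line. As a minimal sketch of a stricter check, the Python below tests the level field itself, assuming the layout seen in the logs in this thread (timestamp|sequence|module|LEVEL|message). The function name `matching_lines` and its defaults are hypothetical; the two sample lines are taken from logs quoted in this thread.

```python
def matching_lines(lines, keyword="resubmit", levels=("WARN", "ERR", "EMER")):
    """Return (timestamp, module, level, message) for lines at one of the given
    levels whose message contains the keyword (case-insensitive)."""
    hits = []
    for line in lines:
        # Split into at most five fields so any '|' inside the message is preserved.
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) < 5:
            continue  # not a log record in the assumed layout
        timestamp, _seq, module, level, message = parts
        if level in levels and keyword.lower() in message.lower():
            hits.append((timestamp, module, level, message))
    return hits


# Sample records copied from the logs quoted in this thread.
SAMPLE = [
    "2025-03-01T11:27:16.473Z|19884389|vconn|DBG|unix#33063: received: "
    "OFPT_PACKET_OUT (OF1.5) (xid=0xe3465): in_port=CONTROLLER "
    "actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,"
    "set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,"
    "set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,"
    "set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60",
    "2025-02-08T14:28:38.619Z|00002|fatal_signal(ovn_statctrl2)|WARN|"
    "terminating with signal 15 (Terminated)",
]

# The DBG record mentions resubmit but is not WARN/ERR, so nothing is reported.
print(matching_lines(SAMPLE))
```

To run it against the real log, the lines could be read from an open file handle on /var/log/kolla/openvswitch/ovs-vswitchd.log and passed in place of `SAMPLE`.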
I also have a non-deployed region running OpenStack 2024.2, where I did some dumb tests such as creating 1500 VMs with about 500 instances per hypervisor, and I didn't see this issue. However, I didn't use an L2 network in that case, only virtual networks like geneve/vxlan. Today I will try to create a sample L2 VLAN with MAC learning enabled at the border router and report back. I think this bug does not occur when there are not a lot of packets coming through OVS; in that case DHCP is handled with no issues...
24.02.2025 16:00, Dumitru Ceara wrote:
On 2/20/25 11:13 PM, Ilia Baikov via discuss wrote:
> Hi

Hi Ilia,

+ Ilya Maximets for more ovs insights.

> After a lot of observation and testing of various possible fixes, I came up with nothing...
>
> Yes, there are messages that contain the NXT_RESUME substring.
>
> ovn-controller:
> 2025-02-18T21:05:51.771Z|1041129|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x404b44bc total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x13c,reg14=0x4a8,metadata=0x2,in_port=312 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:08:23.729Z|1044255|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xffe9d519 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:10:33.865Z|1046917|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x57,reg14=0x423,metadata=0x2,in_port=83 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:11:06.733Z|1047596|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x5f,reg14=0x413,metadata=0x2,in_port=91 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:12:16.984Z|1048995|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:13:03.396Z|1049934|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:19:48.641Z|1058227|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xbbfea837 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x132,reg14=0x434,metadata=0x2,in_port=302 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:19:50.069Z|1058255|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x6f,reg14=0x148,metadata=0x2,in_port=107 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:21:38.224Z|1060474|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0xdb,reg14=0x336,metadata=0x2,in_port=215 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:21:42.735Z|1060568|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xffe9d519 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 (via action) data_len=348 (unbuffered)
>
> ovs-vswitchd:
> 2025-02-18T20:55:45.673Z|1086321|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x58482a66 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x151,reg14=0x4b1,metadata=0x2,in_port=375 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:01:05.872Z|1092877|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x983e6272 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x15f,reg14=0x4d3,metadata=0x2,in_port=368 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:01:23.672Z|1093239|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:04:03.272Z|1096513|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:07:36.773Z|1100887|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:07:51.472Z|1101187|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xa6d16298 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x4b,reg14=0x1bb,metadata=0x2,in_port=349 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:08:08.673Z|1101537|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x8b831d17 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x26,reg14=0x123,metadata=0x2,in_port=34 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:09:09.173Z|1102776|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xbf424a23 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x85,reg14=0x2d9,metadata=0x2,in_port=371 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:13:03.574Z|1108046|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:17:18.772Z|1113273|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x7c,reg14=0x241,metadata=0x2,in_port=120 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:18:51.673Z|1115178|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:19:36.872Z|1116105|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x87,reg14=0x4b,metadata=0x2,in_port=131 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:19:52.172Z|1116417|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:20:24.572Z|1117078|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x8b,reg14=0x2fb,metadata=0x2,in_port=135 (via action) data_len=348 (unbuffered)
> 2025-02-18T21:21:31.672Z|1118459|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x93,reg14=0xdd,metadata=0x2,in_port=143 (via action) data_len=348 (unbuffered)
>
> There are also interesting logs in ovn-controller:
> 2025-02-18T21:21:10.152Z|1059900|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:a6:57:54 13.37.13.68
> 2025-02-18T21:21:10.560Z|1059909|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:43:d4:9a 13.38.13.15
> 2025-02-18T21:21:14.178Z|1059983|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cb:77:8d 13.37.13.72
> 2025-02-18T21:21:16.785Z|1060037|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:21:19.661Z|1060097|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:7e:1b:a8 13.37.13.173
> 2025-02-18T21:21:22.582Z|1060157|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:21:25.903Z|1060225|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:f6:ae:8b 13.37.13.11
> 2025-02-18T21:21:28.643Z|1060281|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:b0:98:e0 13.37.13.101
> 2025-02-18T21:21:32.687Z|1060361|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:82:6e:c3 13.37.13.183
> 2025-02-18T21:21:34.517Z|1060398|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cb:08:c7 13.37.13.55
> 2025-02-18T21:21:38.673Z|1060484|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:f9:5b:25 13.37.13.98
> 2025-02-18T21:21:40.782Z|1060528|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:21:44.097Z|1060596|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:95:0c:5e 13.37.13.197
> 2025-02-18T21:21:46.484Z|1060646|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:21:50.203Z|1060722|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:92:f8:ba 13.37.13.124
> 2025-02-18T21:21:52.764Z|1060776|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:ce:b8:1e 13.37.13.126
> 2025-02-18T21:21:55.503Z|1060832|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8d:88:f6 13.38.13.63
> 2025-02-18T21:21:59.520Z|1060914|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8f:20:53 13.37.13.63
> 2025-02-18T21:22:01.620Z|1060958|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8a:7a:47 13.37.13.150
> 2025-02-18T21:22:04.418Z|1061015|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:70:7e:5f 13.37.13.216
> 2025-02-18T21:22:07.477Z|1061079|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:2a:7a:ad 13.37.13.50
> 2025-02-18T21:22:11.719Z|1061165|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:13.798Z|1061208|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:a6:57:54 13.37.13.68
> 2025-02-18T21:22:16.612Z|1061266|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:81:5f:99 13.37.13.14
> 2025-02-18T21:22:19.830Z|1061334|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:22.460Z|1061388|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:26.001Z|1061460|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:28.383Z|1061506|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:31.594Z|1061572|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:58:35:35 13.37.13.64
> 2025-02-18T21:22:35.249Z|1061648|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:37.578Z|1061695|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:82:6e:c3 13.37.13.183
> 2025-02-18T21:22:40.514Z|1061755|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:43.373Z|1061815|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:58:72:c4 13.37.13.65
> 2025-02-18T21:22:46.557Z|1061881|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:02:5e:ac 13.37.13.111
> 2025-02-18T21:22:50.020Z|1061951|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:53.511Z|1062023|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:5f:fd:6f 13.37.13.15
> 2025-02-18T21:22:55.717Z|1062069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:22:58.812Z|1062133|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8d:88:f6 13.38.13.63
> 2025-02-18T21:23:01.450Z|1062189|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:9a:da:47 13.37.13.225
> 2025-02-18T21:23:05.545Z|1062273|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8a:7a:47 13.37.13.150
> 2025-02-18T21:23:08.512Z|1062332|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:70:7e:5f 13.37.13.216
> 2025-02-18T21:23:10.980Z|1062384|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:02:ff:8a 13.37.13.233
> 2025-02-18T21:23:14.405Z|1062454|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:23:17.172Z|1062511|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:2e:73:8f 13.37.13.186
> 2025-02-18T21:23:19.435Z|1062559|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:39:8d:2f 13.37.13.153
> 2025-02-18T21:23:23.049Z|1062633|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:23:25.653Z|1062687|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:d4:36:c6 13.37.13.60
> 2025-02-18T21:23:28.768Z|1062749|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:f1:64:3a 13.37.13.125
> 2025-02-18T21:23:33.340Z|1062843|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:56:c4:04 13.37.13.219
> 2025-02-18T21:23:34.400Z|1062865|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:6a:7b:a4 13.37.13.121
> 2025-02-18T21:23:39.485Z|1062968|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
> 2025-02-18T21:23:41.393Z|1063007|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:82:6e:c3 13.37.13.183
> 2025-02-18T21:23:44.410Z|1063069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:52:ef:94 13.37.13.182
>
> IPs are sanitized; anyway, 13.37.13 belongs to subnet #1 and 13.38.13 to subnet #2.
>
> I'm observing a large regression across all infra, like almost 50% overall network bandwidth loss over time.
> Any thoughts guys?

Do you see any warning/error logs in ovs-vswitchd.log?

> Context: we run a lot of small instances, each with 1 fixed IP and a small amount of external traffic, about 300 per hypervisor, sometimes more, sometimes less. Anyway, I didn't see anything similar with plain ovs and neutron-openvswitch-agent (openstack).
>
> UPD: Once I manually cleared the dhcpv4_options column in all ports, it seems neutron-dhcp-agent (dnsmasq based) started to send DHCPOFFERs and most instances went back to normal. However, I didn't manage to find out why ovn-controller does not send the DHCPOFFER to the VIF. It seems like a big regression that starts from some number N of instances, or something else.

Just double checking, you're saying that with low numbers of instances you don't have this issue?

I wonder if you're not potentially hitting the maximum resubmit limit (4096 resubmits) due to DHCPOFFERS being sent as broadcast. There should be warning logs indicating that in ovs-vswitchd.

It might also help if you could share the output of:

ovs-appctl coverage/show

> 10.02.2025 17:36, Numan Siddique:
>> On Sat, Feb 8, 2025 at 1:27 PM support via discuss <ovs-disc...@openvswitch.org> wrote:
>>> I've been observing the following bug for a long time: ovn-controller doesn't send the DHCPOFFER, but it logs that it does. However, the VIF doesn't get it at all. But once ovn-controller is restarted, it seems to actually send the DHCPOFFER successfully and immediately receives DHCPACK from the VIF.
>>
>> When you see this issue, can you enable vconn and rconn debug logs in both ovn-controller and ovs-vswitchd?
>> ovn-appctl vlog/set rconn:dbg
>> ovn-appctl vlog/set vconn:dbg
>> ovs-appctl vlog/set rconn:dbg
>> ovs-appctl vlog/set vconn:dbg
>>
>> If ovn-controller replies with a DHCPOFFER, you should see the NXT_RESUME related logs in both ovn-controller and ovs-vswitchd. This can help you narrow down the issue.
>>
>> Thanks
>> Numan
>>
>>> 2025-02-08T14:27:57.462Z|00053|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>> 2025-02-08T14:28:01.746Z|00054|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>> 2025-02-08T14:28:05.582Z|00055|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>> 2025-02-08T14:28:13.053Z|00056|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>> 2025-02-08T14:28:21.972Z|00057|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>> 2025-02-08T14:28:25.109Z|00058|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>> 2025-02-08T14:28:29.362Z|00059|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>> 2025-02-08T14:28:30.003Z|00060|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>> 2025-02-08T14:28:37.820Z|00061|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>> 2025-02-08T14:28:38.619Z|00002|fatal_signal(ovn_statctrl2)|WARN|terminating with signal 15 (Terminated)
>>> # ovn_controller is restarted by docker restart ovn_controller
>>> 2025-02-08T14:28:50.639Z|00001|vlog|INFO|opened log file /var/log/kolla/openvswitch/ovn-controller.log
>>> 2025-02-08T14:28:50.640Z|00002|vif_plug_representor|INFO|unable to set udev receive buffer size: Operation not permitted
>>> 2025-02-08T14:28:50.641Z|00003|reconnect|INFO|tcp:127.0.0.1:6640: connecting...
>>> 2025-02-08T14:28:50.641Z|00004|reconnect|INFO|tcp:127.0.0.1:6640: connected
>>> 2025-02-08T14:28:50.738Z|00005|main|INFO|OVN internal version is : [24.09.0-20.37.0-73.6]
>>> 2025-02-08T14:28:50.738Z|00006|main|INFO|OVS IDL reconnected, force recompute.
>>> 2025-02-08T14:28:50.738Z|00007|reconnect|INFO|tcp:10.10.9.5:6642: connecting...
>>> 2025-02-08T14:28:50.738Z|00008|main|INFO|OVNSB IDL reconnected, force recompute.
>>> 2025-02-08T14:28:50.739Z|00009|ovn_util|INFO|statctrl: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>> 2025-02-08T14:28:50.739Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>> 2025-02-08T14:28:50.739Z|00011|ovn_util|INFO|pinctrl: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>> 2025-02-08T14:28:50.739Z|00012|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>> 2025-02-08T14:28:50.739Z|00013|reconnect|INFO|tcp:10.10.9.5:6642: connected
>>> 2025-02-08T14:28:50.739Z|00001|rconn(ovn_statctrl2)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>> 2025-02-08T14:28:50.743Z|00001|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>> 2025-02-08T14:28:50.756Z|00014|ovn_util|INFO|features: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>> 2025-02-08T14:28:50.757Z|00015|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>> 2025-02-08T14:28:50.757Z|00016|features|INFO|OVS Feature: ct_zero_snat, state: supported
>>> 2025-02-08T14:28:50.757Z|00017|features|INFO|OVS Feature: ct_flush, state: supported
>>> 2025-02-08T14:28:50.757Z|00018|features|INFO|OVS Feature: dp_hash_l4_sym_support, state: supported
>>> 2025-02-08T14:28:50.757Z|00019|main|INFO|OVS feature set changed, force recompute.
>>> 2025-02-08T14:28:50.757Z|00020|ovn_util|INFO|ofctrl: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>> 2025-02-08T14:28:50.757Z|00021|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>> 2025-02-08T14:28:50.758Z|00022|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>> 2025-02-08T14:28:50.758Z|00023|main|INFO|OVS OpenFlow connection reconnected,force recompute.
>>> 2025-02-08T14:28:50.759Z|00024|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>> 2025-02-08T14:28:50.760Z|00025|features|INFO|OVS Feature: meter_support, state: supported
>>> 2025-02-08T14:28:50.760Z|00026|features|INFO|OVS Feature: group_support, state: supported
>>> 2025-02-08T14:28:50.760Z|00027|features|INFO|OVS Feature: sample_action_with_registers, state: supported
>>> 2025-02-08T14:28:50.760Z|00028|main|INFO|OVS feature set changed, force recompute.
>>> 2025-02-08T14:28:51.483Z|00029|binding|INFO|Releasing lport 978a923d-1076-484b-a1d5-ff5827ae69ec from this chassis (sb_readonly=0)
>>> 2025-02-08T14:28:53.658Z|00030|poll_loop|INFO|wakeup due to [POLLIN] on fd 30 (10.10.9.3:60902<->10.10.9.5:6642) at lib/stream-fd.c:157 (78% CPU usage)
>>> 2025-02-08T14:28:54.726Z|00002|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>> 2025-02-08T14:28:59.134Z|00003|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:84:8e:7f 13.37.0.37
>>> 2025-02-08T14:29:01.499Z|00004|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>> 2025-02-08T14:29:03.245Z|00005|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:84:8e:7f 13.37.0.37
>>> 2025-02-08T14:29:03.455Z|00031|memory|INFO|98744 kB peak resident set size after 12.8 seconds
>>> 2025-02-08T14:29:03.455Z|00032|memory|INFO|idl-cells-OVN_Southbound:168023 idl-cells-Open_vSwitch:36874 lflow-cache-entries-cache-expr:6494 lflow-cache-entries-cache-matches:1413 lflow-cache-size-KB:20726 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:6342 ofctrl_installed_flow_usage-KB:4695 ofctrl_sb_flow_ref_usage-KB:2355
>>>
>>> Any suggestions?
>>>
>>> Environment:
>>> OpenStack 2024.2
>>> Kernel 6.8.0-50-generic #51~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC
>>> ovs-vsctl (Open vSwitch) 3.4.0
>>> DB Schema 8.7.0
>>> ovs-ofctl (Open vSwitch) 3.4.0
>>> OpenFlow versions 0x1:0x6
>>> # ovn-nbctl --version
>>> ovn-nbctl 24.09.0
>>> Open vSwitch Library 3.4.0
>>> DB Schema 7.6.0
>>> # ovn-sbctl --version
>>> ovn-sbctl 24.09.0
>>> Open vSwitch Library 3.4.0
>>> DB Schema 20.37.0

Thanks,
Dumitru
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss