I'm back to digging into this more deep. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4446 root 20 0 395500 88276 6336 S 1.0 0.0 13:07.66 ovn_pinctrl0 3773 root 20 0 395500 88276 6336 S 0.0 0.0 5:19.43 ovn-controller 4449 root 20 0 395500 88276 6336 S 0.0 0.0 0:00.00 urcu1 4452 root 20 0 395500 88276 6336 S 0.0 0.0 0:00.07 ovn_statctrl2 5832 root 20 0 395500 88276 6336 S 0.0 0.0 0:04.07 stopwatch3
Interesting thing that ovn-controller as well as pinctrl doesn't seems starvating for CPU as it is usage is likely about 1%. Host CPU is also feeling a way good %Cpu(s): 29.5 us, 20.0 sy, 0.0 ni, 50.0 id, 0.0 wa, 0.0 hi, 0.6 si, 0.0 st Despite that, ovs-vswitchd seems consuming way more CPU PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 139416 root 20 0 8378828 452288 8448 R 98.7 0.1 990:55.85 ovs-vswitchd 139509 root 20 0 8378828 452288 8448 S 10.7 0.1 82:49.98 revalidator90 139522 root 20 0 8378828 452288 8448 R 10.7 0.1 32:01.96 revalidator103 139519 root 20 0 8378828 452288 8448 R 10.3 0.1 54:28.10 revalidator100 139516 root 20 0 8378828 452288 8448 S 10.0 0.1 32:34.74 revalidator97 139526 root 20 0 8378828 452288 8448 R 10.0 0.1 49:16.24 revalidator107 139531 root 20 0 8378828 452288 8448 R 10.0 0.1 40:45.87 revalidator112 139523 root 20 0 8378828 452288 8448 S 9.7 0.1 49:39.00 revalidator104 139512 root 20 0 8378828 452288 8448 S 7.7 0.1 49:37.77 revalidator93 139497 root 20 0 8378828 452288 8448 S 7.3 0.1 67:31.93 handler78 139513 root 20 0 8378828 452288 8448 S 7.3 0.1 36:44.18 revalidator94 139525 root 20 0 8378828 452288 8448 S 6.3 0.1 36:08.10 revalidator106 139520 root 20 0 8378828 452288 8448 S 5.7 0.1 49:48.41 revalidator101 139521 root 20 0 8378828 452288 8448 R 5.7 0.1 36:22.60 revalidator102 139510 root 20 0 8378828 452288 8448 R 5.0 0.1 48:53.38 revalidator91 139511 root 20 0 8378828 452288 8448 R 5.0 0.1 54:09.25 revalidator92 More logs: 15:44:45.190923 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300 15:44:46.446755 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300 15:44:47.491209 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300 15:44:47.869273 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300 15:44:48.297140 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300 15:44:51.190709 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300 Manually tried to use dhclient to obtain IP but no success. There is logs that ovn-controller sent DHCPOFFER but no packet arrived to VM interface.Switched back to configuration with no neutron-dhcp-agent. I have synced ovn db with neutron-ovn-db-sync-util. There is coverage/show result of node where i'm seeing this issue right now (current network bandwidth is about 300Mbps). # ovs-appctl coverage/show Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=982b7b40: nln_changed 0.0/sec 0.000/sec 0.0197/sec total: 8568 route_table_dump 0.0/sec 0.000/sec 0.0072/sec total: 3953 netlink_sent 1158.4/sec 1109.700/sec 1018.6758/sec total: 840427925 netlink_recv_jumbo 144.2/sec 140.883/sec 212.7489/sec total: 127752548 netlink_received 1258.4/sec 1229.533/sec 1192.0311/sec total: 1072129295 netdev_set_ethtool 0.0/sec 0.000/sec 0.0006/sec total: 361 netdev_get_ethtool 0.0/sec 0.000/sec 0.0039/sec total: 2808 netdev_get_ifindex 0.0/sec 0.000/sec 0.3358/sec total: 14218 netdev_set_policing 0.0/sec 0.000/sec 0.0028/sec total: 1685 vconn_sent 596.2/sec 576.967/sec 585.4778/sec total: 643013987 vconn_received 320.0/sec 310.883/sec 307.5108/sec total: 360905481 util_xalloc 937354.8/sec 912615.150/sec 873946.4494/sec total: 1068314545738 unixctl_replied 0.0/sec 0.033/sec 0.0331/sec total: 33146 unixctl_received 0.0/sec 0.033/sec 0.0331/sec total: 33146 long_poll_interval 0.0/sec 0.000/sec 0.0000/sec total: 1 stream_open 0.0/sec 0.000/sec 0.0000/sec total: 2 pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 5 seq_change 3283.6/sec 3255.167/sec 3039.7792/sec total: 3088392125 rconn_sent 596.2/sec 576.967/sec 585.4767/sec total: 643013979 rconn_queued 596.2/sec 576.967/sec 585.5044/sec total: 643014079 rconn_overflow 0.0/sec 0.000/sec 2.0272/sec total: 16543 rconn_discarded 0.0/sec 0.000/sec 0.0278/sec total: 100 poll_zero_timeout 275.2/sec 247.600/sec 161.9242/sec total: 136657138 poll_create_node 4749.6/sec 4417.217/sec 4062.3500/sec total: 4007153255 txn_success 0.2/sec 0.183/sec 0.1828/sec total: 184805 txn_incomplete 0.4/sec 0.367/sec 0.4856/sec total: 372203 txn_unchanged 0.0/sec 0.000/sec 0.1169/sec total: 7146 netdev_get_stats 77.6/sec 71.133/sec 70.5939/sec total: 73986502 mac_learning_moved 0.0/sec 0.000/sec 0.0000/sec total: 425 mac_learning_expired 0.6/sec 0.217/sec 0.4439/sec total: 409542 mac_learning_learned 0.2/sec 0.233/sec 0.4403/sec total: 410444 hmap_expand 298.0/sec 265.833/sec 329.0514/sec total: 291663955 hmap_pathological 10.4/sec 6.050/sec 9.1406/sec total: 8189560 hindex_expand 0.0/sec 0.000/sec 0.0000/sec total: 12 hindex_pathological 0.0/sec 0.000/sec 0.0000/sec total: 148 miniflow_malloc 0.0/sec 0.000/sec 21.7272/sec total: 3395676 flow_extract 843.4/sec 801.333/sec 727.9828/sec total: 755385765 dpif_port_del 0.0/sec 0.000/sec 0.0006/sec total: 379 dpif_port_add 0.0/sec 0.000/sec 0.0006/sec total: 367 dpif_flow_put_error 6.0/sec 7.633/sec 5.1642/sec total: 4420389 dpif_flow_put 232.2/sec 208.867/sec 126.8286/sec total: 98257942 dpif_flow_get 0.0/sec 0.000/sec 0.0000/sec total: 386 dpif_flow_flush 0.0/sec 0.000/sec 0.0000/sec total: 1 dpif_flow_del 182.2/sec 203.100/sec 118.5047/sec total: 93717525 dpif_execute_with_help 0.0/sec 0.000/sec 0.0017/sec total: 1649 dpif_execute_error 0.0/sec 0.000/sec 0.0006/sec total: 398 dpif_execute 571.8/sec 537.350/sec 439.8069/sec total: 472799472 cmap_shrink 0.8/sec 0.733/sec 1.4025/sec total: 1180907 cmap_expand 1.0/sec 0.583/sec 1.5000/sec total: 1182882 ccmap_shrink 0.0/sec 0.000/sec 0.1786/sec total: 747 ccmap_expand 0.0/sec 0.000/sec 0.1200/sec total: 774 connmgr_async_unsent 0.0/sec 0.000/sec 0.0372/sec total: 139 xlate_actions 4171.0/sec 3071.683/sec 3133.7436/sec total: 3107954003 upcall_ukey_replace 0.0/sec 0.117/sec 0.1739/sec total: 155715 upcall_ukey_contention 0.0/sec 0.000/sec 0.0003/sec total: 157 upcall_flow_limit_grew 0.0/sec 0.550/sec 0.2328/sec total: 56359 ukey_replace_contention 0.0/sec 0.017/sec 0.0008/sec total: 847 revalidate_missed_dp_flow 0.0/sec 0.117/sec 0.0472/sec total: 106876 handler_duplicate_upcall 14.0/sec 12.817/sec 11.5303/sec total: 9317449 dumped_new_flow 0.0/sec 0.000/sec 0.0000/sec total: 842 dumped_duplicate_flow 0.0/sec 0.000/sec 0.0003/sec total: 37029 rev_mac_learning 0.8/sec 0.417/sec 0.6419/sec total: 708403 rev_flow_table 0.0/sec 0.000/sec 0.0033/sec total: 4003 rev_port_toggled 0.0/sec 0.000/sec 0.0000/sec total: 119 rev_reconfigure 0.0/sec 0.000/sec 0.0008/sec total: 691 ofproto_update_port 0.0/sec 0.000/sec 0.0039/sec total: 2430 ofproto_recv_openflow 320.0/sec 310.883/sec 307.5097/sec total: 360905473 ofproto_packet_out 319.6/sec 310.633/sec 299.3358/sec total: 360555564 ofproto_flush 0.0/sec 0.000/sec 0.0000/sec total: 1 bridge_reconfigure 0.0/sec 0.000/sec 0.1125/sec total: 5108 100 events never hit > Do you see any warning/error logs in ovs-vswitchd.log? I see no noticable errors except usual warn logs like invalid packet. > Just double checking, you're saying that with low numbers of instances > you don't have this issue? Exactly. With relatively low instances count there is no visible issue. > I wonder if you're not potentially hitting the maximum resubmit limit > (4096 resubmits) due to DHCPOFFERS being sent as broadcast. There > should be warning logs indicating that in ovs-vswitchd. There is no error or warn logs mentioning resubmit, but a lot of such logs: 2025-03-01T11:27:16.473Z|19884389|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3465): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60 2025-03-01T11:27:16.972Z|19884399|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe358a): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60 2025-03-01T11:27:17.873Z|19884417|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3765): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60 2025-03-01T11:27:17.972Z|19884419|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe37ba): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60 I've checked presence of error/warn logs at ovs-vswitchd using next command # cat /var/log/kolla/openvswitch/ovs-vswitchd.log | grep -i "resubmit" | grep -iE "warn|err Also i have non-deployed region running 2024.2 OpenStack and did some dumb tests like creating 1500 VMs with about 500 instances per hypervisor and didn't see this issue. However, i didn't use L2 network in this case, only virtual network like geneve/vxlan. I will try today to create sample L2 vlan with mac learning enabled at the border router and report back. I think this bug will not occur if there is no a lot of packets coming through ovs, handling DHCP with no issues... 24.02.2025 16:00, Dumitru Ceara пишет: > On 2/20/25 11:13 PM, Ilia Baikov via discuss wrote: >> Hi >> > Hi Ilia, > > + Ilya Maximets for more ovs insights. > >> Over a lot of observes and testing various number of possible fixes came >> with nothing... >> >> Yes, there are messages that contains NXT_RESUME substring. >> >> ovn-controller: >> 2025-02-18T21:05:51.771Z|1041129|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0x404b44bc total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x13c,reg14=0x4a8,metadata=0x2,in_port=312 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:08:23.729Z|1044255|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0xffe9d519 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:10:33.865Z|1046917|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x57,reg14=0x423,metadata=0x2,in_port=83 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:11:06.733Z|1047596|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x5f,reg14=0x413,metadata=0x2,in_port=91 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:12:16.984Z|1048995|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:13:03.396Z|1049934|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:19:48.641Z|1058227|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0xbbfea837 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x132,reg14=0x434,metadata=0x2,in_port=302 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:19:50.069Z|1058255|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x6f,reg14=0x148,metadata=0x2,in_port=107 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:21:38.224Z|1060474|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0xdb,reg14=0x336,metadata=0x2,in_port=215 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:21:42.735Z|1060568|vconn(ovn_pinctrl0)|DBG|unix:/var/run/ >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): >> cookie=0xffe9d519 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 >> (via action) data_len=348 (unbuffered) >> >> ovs-vswitchd >> 2025-02-18T20:55:45.673Z|1086321|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0x58482a66 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x151,reg14=0x4b1,metadata=0x2,in_port=375 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:01:05.872Z|1092877|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0x983e6272 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x15f,reg14=0x4d3,metadata=0x2,in_port=368 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:01:23.672Z|1093239|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:04:03.272Z|1096513|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:07:36.773Z|1100887|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:07:51.472Z|1101187|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xa6d16298 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x4b,reg14=0x1bb,metadata=0x2,in_port=349 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:08:08.673Z|1101537|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0x8b831d17 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x26,reg14=0x123,metadata=0x2,in_port=34 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:09:09.173Z|1102776|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xbf424a23 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x85,reg14=0x2d9,metadata=0x2,in_port=371 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:13:03.574Z|1108046|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:17:18.772Z|1113273|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x7c,reg14=0x241,metadata=0x2,in_port=120 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:18:51.673Z|1115178|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:19:36.872Z|1116105|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x87,reg14=0x4b,metadata=0x2,in_port=131 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:19:52.172Z|1116417|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348 >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:20:24.572Z|1117078|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x8b,reg14=0x2fb,metadata=0x2,in_port=135 >> (via action) data_len=348 (unbuffered) >> 2025-02-18T21:21:31.672Z|1118459|vconn|DBG|unix#0: received: NXT_RESUME >> (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348 >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x93,reg14=0xdd,metadata=0x2,in_port=143 >> (via action) data_len=348 (unbuffered) >> >> The interesting logs is also at ovn-controller logs >> 2025-02-18T21:21:10.152Z|1059900|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:a6:57:54 13.37.13.68 >> 2025-02-18T21:21:10.560Z|1059909|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:43:d4:9a 13.38.13.15 >> 2025-02-18T21:21:14.178Z|1059983|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cb:77:8d 13.37.13.72 >> 2025-02-18T21:21:16.785Z|1060037|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:21:19.661Z|1060097|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:7e:1b:a8 13.37.13.173 >> 2025-02-18T21:21:22.582Z|1060157|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:21:25.903Z|1060225|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:f6:ae:8b 13.37.13.11 >> 2025-02-18T21:21:28.643Z|1060281|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:b0:98:e0 13.37.13.101 >> 2025-02-18T21:21:32.687Z|1060361|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:82:6e:c3 13.37.13.183 >> 2025-02-18T21:21:34.517Z|1060398|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cb:08:c7 13.37.13.55 >> 2025-02-18T21:21:38.673Z|1060484|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:f9:5b:25 13.37.13.98 >> 2025-02-18T21:21:40.782Z|1060528|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:21:44.097Z|1060596|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:95:0c:5e 13.37.13.197 >> 2025-02-18T21:21:46.484Z|1060646|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:21:50.203Z|1060722|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:92:f8:ba 13.37.13.124 >> 2025-02-18T21:21:52.764Z|1060776|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:ce:b8:1e 13.37.13.126 >> 2025-02-18T21:21:55.503Z|1060832|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:8d:88:f6 13.38.13.63 >> 2025-02-18T21:21:59.520Z|1060914|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:8f:20:53 13.37.13.63 >> 2025-02-18T21:22:01.620Z|1060958|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:8a:7a:47 13.37.13.150 >> 2025-02-18T21:22:04.418Z|1061015|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:70:7e:5f 13.37.13.216 >> 2025-02-18T21:22:07.477Z|1061079|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:2a:7a:ad 13.37.13.50 >> 2025-02-18T21:22:11.719Z|1061165|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:13.798Z|1061208|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:a6:57:54 13.37.13.68 >> 2025-02-18T21:22:16.612Z|1061266|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:81:5f:99 13.37.13.14 >> 2025-02-18T21:22:19.830Z|1061334|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:22.460Z|1061388|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:26.001Z|1061460|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:28.383Z|1061506|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:31.594Z|1061572|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:58:35:35 13.37.13.64 >> 2025-02-18T21:22:35.249Z|1061648|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:37.578Z|1061695|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:82:6e:c3 13.37.13.183 >> 2025-02-18T21:22:40.514Z|1061755|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:43.373Z|1061815|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:58:72:c4 13.37.13.65 >> 2025-02-18T21:22:46.557Z|1061881|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:02:5e:ac 13.37.13.111 >> 2025-02-18T21:22:50.020Z|1061951|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:53.511Z|1062023|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:5f:fd:6f 13.37.13.15 >> 2025-02-18T21:22:55.717Z|1062069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:22:58.812Z|1062133|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:8d:88:f6 13.38.13.63 >> 2025-02-18T21:23:01.450Z|1062189|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:9a:da:47 13.37.13.225 >> 2025-02-18T21:23:05.545Z|1062273|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:8a:7a:47 13.37.13.150 >> 2025-02-18T21:23:08.512Z|1062332|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:70:7e:5f 13.37.13.216 >> 2025-02-18T21:23:10.980Z|1062384|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:02:ff:8a 13.37.13.233 >> 2025-02-18T21:23:14.405Z|1062454|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:23:17.172Z|1062511|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:2e:73:8f 13.37.13.186 >> 2025-02-18T21:23:19.435Z|1062559|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:39:8d:2f 13.37.13.153 >> 2025-02-18T21:23:23.049Z|1062633|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:23:25.653Z|1062687|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:d4:36:c6 13.37.13.60 >> 2025-02-18T21:23:28.768Z|1062749|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:f1:64:3a 13.37.13.125 >> 2025-02-18T21:23:33.340Z|1062843|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:56:c4:04 13.37.13.219 >> 2025-02-18T21:23:34.400Z|1062865|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:6a:7b:a4 13.37.13.121 >> 2025-02-18T21:23:39.485Z|1062968|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:cc:d3:66 13.38.13.62 >> 2025-02-18T21:23:41.393Z|1063007|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:82:6e:c3 13.37.13.183 >> 2025-02-18T21:23:44.410Z|1063069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >> fa:16:3e:52:ef:94 13.37.13.182 >> >> IPs are sanitized, anyway 13.37.13 belongs to subnet #1 and 13.38.13 to >> subnet #2. Observing large regression across all infra, like almost 50% >> overall network bandwith loss over time. >> >> Any thoughts guys? >> > Do you see any warning/error logs in ovs-vswitchd.log? > >> Context: we run a lot of small instances with 1 x fixed IP with little >> amount of external traffic, about 300 per hypervisor, sometimes more, >> sometimes less anyway didn't see anything similar with plain ovs and >> neutron-openvswitch-agent (openstack). >> UPD: Once i manually cleared dhcpv4_options column in all ports seems >> like neutron-dhcp-agent (dnsmasq based) started to send dhcpoffers and >> most of instances back to normal. However, didn't manage what is the >> reason to ovn-controller not send DHCPOFFER to VIF. Seems like a big >> point of regression starting from N number of instances or something else. >> > Just double checking, you're saying that with low numbers of instances > you don't have this issue? > > I wonder if you're not potentially hitting the maximum resubmit limit > (4096 resubmits) due to DHCPOFFERS being sent as broadcast. There > should be warning logs indicating that in ovs-vswitchd. > > It might also help if you could share the output of: > > ovs-appctl coverage/show > >>> 10.02.2025 17:36, Numan Siddique: >>>> >>>> On Sat, Feb 8, 2025 at 1:27 PM support via discuss <ovs- >>>> disc...@openvswitch.org> wrote: >>>> >>>> I'm observing for a long time next bug: ovn-controller doesn't >>>> send DHCPOFFER but it logs that it does. However this VIF doesn't >>>> get it at all. >>>> But once ovn-controller restarted it successfully seems like it >>>> actually send DHCPOFFER and immediately receives DHCPACK from VIF. >>>> >>>> >>>> When you see this issue, can you enable vconn and rcon debug logs in >>>> both ovn-controller and ovs-vswitchd ? >>>> >>>> ovn-appctl vlog/set rconn:dbg >>>> ovn-appctl vlog/set vconn:dbg >>>> >>>> ovs-appctl vlog/set rconn:dbg >>>> ovs-appctl vlog/set rconn:dbg >>>> >>>> If ovn-controller replies with a DHCPOFFER, you should see the >>>> NXT_RESUME related logs in both ovn-controller and ovs-vswitchd. >>>> >>>> This can help you narrow down the issue. >>>> >>>> Thanks >>>> Numan >>>> >>>> |2025-02-08T14:27:57.462Z|00053|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:80:67:62 13.37.0.5 >>>> 2025-02-08T14:28:01.746Z|00054|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:80:67:62 13.37.0.5 >>>> 2025-02-08T14:28:05.582Z|00055|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:80:67:62 13.37.0.5 >>>> 2025-02-08T14:28:13.053Z|00056|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:80:67:62 13.37.0.5 >>>> 2025-02-08T14:28:21.972Z|00057|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:84:8e:7f 13.37.0.37 >>>> 2025-02-08T14:28:25.109Z|00058|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:84:8e:7f 13.37.0.37 >>>> 2025-02-08T14:28:29.362Z|00059|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:84:8e:7f 13.37.0.37 >>>> 2025-02-08T14:28:30.003Z|00060|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:80:67:62 13.37.0.5 >>>> 2025-02-08T14:28:37.820Z|00061|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:84:8e:7f 13.37.0.37 >>>> 2025-02-08T14:28:38.619Z|00002|fatal_signal(ovn_statctrl2)|WARN| >>>> terminating >>>> with signal 15 (Terminated) # ovn_controller is restarted by >>>> docker restart ovn_controller >>>> 2025-02-08T14:28:50.639Z|00001|vlog|INFO|opened log file >>>> /var/log/kolla/openvswitch/ovn-controller.log >>>> 2025-02-08T14:28:50.640Z|00002|vif_plug_representor|INFO|unable >>>> to set udev receive buffer size: Operation not permitted >>>> 2025-02-08T14:28:50.641Z|00003|reconnect|INFO|tcp:127.0.0.1:6640 >>>> <http://127.0.0.1:6640>: connecting... >>>> 2025-02-08T14:28:50.641Z|00004|reconnect|INFO|tcp:127.0.0.1:6640 >>>> <http://127.0.0.1:6640>: connected >>>> 2025-02-08T14:28:50.738Z|00005|main|INFO|OVN internal version is >>>> : [24.09.0-20.37.0-73.6] >>>> 2025-02-08T14:28:50.738Z|00006|main|INFO|OVS IDL reconnected, >>>> force recompute. >>>> 2025-02-08T14:28:50.738Z|00007|reconnect|INFO|tcp:10.10.9.5:6642 >>>> <http://10.10.9.5:6642>: connecting... >>>> 2025-02-08T14:28:50.738Z|00008|main|INFO|OVNSB IDL reconnected, >>>> force recompute. >>>> 2025-02-08T14:28:50.739Z|00009|ovn_util|INFO|statctrl: connecting >>>> to switch: "unix:/var/run/openvswitch/br-int.mgmt" >>>> 2025-02-08T14:28:50.739Z|00010|rconn|INFO|unix:/var/run/ >>>> openvswitch/br-int.mgmt: >>>> connecting... >>>> 2025-02-08T14:28:50.739Z|00011|ovn_util|INFO|pinctrl: connecting >>>> to switch: "unix:/var/run/openvswitch/br-int.mgmt" >>>> 2025-02-08T14:28:50.739Z|00012|rconn|INFO|unix:/var/run/ >>>> openvswitch/br-int.mgmt: >>>> connecting... >>>> 2025-02-08T14:28:50.739Z|00013|reconnect|INFO|tcp:10.10.9.5:6642 >>>> <http://10.10.9.5:6642>: connected >>>> 2025-02-08T14:28:50.739Z|00001|rconn(ovn_statctrl2)|INFO|unix:/ >>>> var/run/openvswitch/br-int.mgmt: >>>> connected >>>> 2025-02-08T14:28:50.743Z|00001|rconn(ovn_pinctrl0)|INFO|unix:/ >>>> var/run/openvswitch/br-int.mgmt: >>>> connected 2025-02-08T14:28:50.756Z|00014|ovn_util|INFO|features: >>>> connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt" >>>> 2025-02-08T14:28:50.757Z|00015|rconn|INFO|unix:/var/run/ >>>> openvswitch/br-int.mgmt: >>>> connecting... 2025-02-08T14:28:50.757Z|00016|features|INFO|OVS >>>> Feature: ct_zero_snat, state: supported >>>> 2025-02-08T14:28:50.757Z|00017|features|INFO|OVS Feature: >>>> ct_flush, state: supported >>>> 2025-02-08T14:28:50.757Z|00018|features|INFO|OVS Feature: >>>> dp_hash_l4_sym_support, state: supported >>>> 2025-02-08T14:28:50.757Z|00019|main|INFO|OVS feature set changed, >>>> force recompute. >>>> 2025-02-08T14:28:50.757Z|00020|ovn_util|INFO|ofctrl: connecting >>>> to switch: "unix:/var/run/openvswitch/br-int.mgmt" >>>> 2025-02-08T14:28:50.757Z|00021|rconn|INFO|unix:/var/run/ >>>> openvswitch/br-int.mgmt: >>>> connecting... >>>> 2025-02-08T14:28:50.758Z|00022|rconn|INFO|unix:/var/run/ >>>> openvswitch/br-int.mgmt: >>>> connected 2025-02-08T14:28:50.758Z|00023|main|INFO|OVS OpenFlow >>>> connection reconnected,force recompute. >>>> 2025-02-08T14:28:50.759Z|00024|rconn|INFO|unix:/var/run/ >>>> openvswitch/br-int.mgmt: >>>> connected 2025-02-08T14:28:50.760Z|00025|features|INFO|OVS >>>> Feature: meter_support, state: supported >>>> 2025-02-08T14:28:50.760Z|00026|features|INFO|OVS Feature: >>>> group_support, state: supported >>>> 2025-02-08T14:28:50.760Z|00027|features|INFO|OVS Feature: >>>> sample_action_with_registers, state: supported >>>> 2025-02-08T14:28:50.760Z|00028|main|INFO|OVS feature set changed, >>>> force recompute. >>>> 2025-02-08T14:28:51.483Z|00029|binding|INFO|Releasing lport >>>> 978a923d-1076-484b-a1d5-ff5827ae69ec from this chassis >>>> (sb_readonly=0) >>>> 2025-02-08T14:28:53.658Z|00030|poll_loop|INFO|wakeup due to >>>> [POLLIN] on fd 30 (10.10.9.3:60902 >>>> <http://10.10.9.3:60902><->10.10.9.5:6642 >>>> <http://10.10.9.5:6642>) at lib/stream-fd.c:157 (78% CPU usage) >>>> 2025-02-08T14:28:54.726Z|00002|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:84:8e:7f 13.37.0.37 >>>> 2025-02-08T14:28:59.134Z|00003|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>> fa:16:3e:84:8e:7f 13.37.0.37 >>>> 2025-02-08T14:29:01.499Z|00004|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER >>>> fa:16:3e:80:67:62 13.37.0.5 >>>> 2025-02-08T14:29:03.245Z|00005|pinctrl(ovn_pinctrl0)|INFO|DHCPACK >>>> fa:16:3e:84:8e:7f 13.37.0.37 >>>> 2025-02-08T14:29:03.455Z|00031|memory|INFO|98744 kB peak resident >>>> set size after 12.8 seconds >>>> 2025-02-08T14:29:03.455Z|00032|memory|INFO|idl-cells- >>>> OVN_Southbound:168023 >>>> idl-cells-Open_vSwitch:36874 lflow-cache-entries-cache-expr:6494 >>>> lflow-cache-entries-cache-matches:1413 lflow-cache-size-KB:20726 >>>> local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:6342 >>>> ofctrl_installed_flow_usage-KB:4695 ofctrl_sb_flow_ref_usage- >>>> KB:2355 | >>>> >>>> Any suggestions? >>>> >>>> Environment: >>>> OpenStack 2024.2 >>>> Kernel 6.8.0-50-generic #51 >>>> <https://github.com/ovn-org/ovn/issues/51>~22.04.1-Ubuntu SMP >>>> PREEMPT_DYNAMIC >>>> >>>> |ovs-vsctl (Open vSwitch) 3.4.0 DB Schema 8.7.0 | >>>> >>>> |ovs-ofctl (Open vSwitch) 3.4.0 OpenFlow versions 0x1:0x6 | >>>> >>>> |# ovn-nbctl --version ovn-nbctl 24.09.0 Open vSwitch Library >>>> 3.4.0 DB Schema 7.6.0 | >>>> >>>> |# ovn-sbctl --version ovn-sbctl 24.09.0 Open vSwitch Library >>>> 3.4.0 DB Schema 20.37.0| >>>> > Thanks, > Dumitru >
_______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss