On 3/3/25 5:08 PM, support wrote:
>> Hmm, why would a router forward L2 broadcast packets though (in your
>> case they're not ARPs, as the logs in your gist showed)?
>
> I guess it's because the Juniper switch acts as a border router with
> BGP configured (for the public subnets). So we have an L3 vlan
> interface with IP addresses like 13.37.0.1/24, and then it learns
> IP-to-MAC ARP mappings when packets come from the uplink.
>
>> In any case, would it be possible to patch the OVN version you're
>> running and revert
>> https://github.com/ovn-org/ovn/commit/325c7b203d8bfd12bc1285ad11390c1a55cd6717
>> ?
>
> Easy-peasy! Should I just roll back to the OVN version before this
> commit, or rebuild the container with all the files mentioned in the
> commit edited (not sure about backward compatibility)?
If it helps, I did the revert in my fork (on top of 24.09 latest) here:
https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/

NOTE: this is just for testing; there's no guarantee that things won't
break. The only thing I did was resolve the conflicts and compile it. I
didn't run any test suite.

>> On 3/3/25 4:35 PM, support via discuss wrote:
>>>> What surprises me though is that they actually need to be sent to
>>>> reg15=0x8000 (MC_FLOOD).
>>>>
>>>> In your gist src-mac 30:b6:4f:5f:db:a0 is by far the most common;
>>>> would you happen to know who that corresponds to? Is it legit IP
>>>> traffic with an L2 broadcast destination?
>>>
>>> The 30:b6:4f:5f:db:a0 MAC is the Juniper switch used as the border
>>> router. Completely legit. Similar to the quote by Piotr Misiak in a
>>> related topic:
>>>
>>>> We are receiving a lot of packets to unused (non-provisioned)
>>>> destination IP addresses, I guess from some bots scanning the
>>>> Internet. This creates a lot of ARP requests which cannot be
>>>> replied to, because those IP addresses are not configured
>>>> anywhere yet.
>>
>> Hmm, why would a router forward L2 broadcast packets though (in your
>> case they're not ARPs, as the logs in your gist showed)?
>>
>> In any case, would it be possible to patch the OVN version you're
>> running and revert
>> https://github.com/ovn-org/ovn/commit/325c7b203d8bfd12bc1285ad11390c1a55cd6717
>> ?
>>
>>>> On 3/3/25 3:53 PM, support wrote:
>>>>> Sorry for misspelling "Dumitru"! Y is damn close to U; I haven't
>>>>> gotten used to the new keyboard yet.
>>>>
>>>> No worries! :)
>>>>
>>>>> A little clarification: we make instances public through an L2
>>>>> network. Meanwhile, we also have one router (an OpenStack entity)
>>>>> for the private network where kubernetes runs, with an IP attached
>>>>> from the public network to give the kubernetes nodes access to the
>>>>> internet.
>>>>>
>>>>>> Hi Dumitry,
>>>>>> Thank you for your reply.
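[Editor's note: the two options in the question amount to the same thing. A hedged sketch of the "rebuild with the commit reverted" route follows; the branch name and build steps are assumptions, not from this thread. The runnable part below only demonstrates the `git revert` mechanics on a throwaway repo.]

```shell
# Against the real tree it would be roughly (not executed here;
# resolving conflicts and build dependencies are up to the reader):
#   git clone https://github.com/ovn-org/ovn && cd ovn
#   git checkout branch-24.09
#   git revert 325c7b203d8bfd12bc1285ad11390c1a55cd6717  # resolve any conflicts
#   ... then rebuild/package as usual for your deployment
#
# The same "git revert" mechanics, demonstrated on a throwaway repo:
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=t@t.io -c user.name=t commit -q --allow-empty -m "base"
echo "split" > mcgroups.conf            # stands in for the commit's changes
git add mcgroups.conf
git -c user.email=t@t.io -c user.name=t commit -q -m "split mc groups"
# Revert the offending commit; --no-edit keeps the generated message.
git -c user.email=t@t.io -c user.name=t revert --no-edit HEAD >/dev/null
test ! -e mcgroups.conf && echo "revert applied"
```

A revert commit keeps history linear, so later upgrades can drop it cleanly once a proper fix lands upstream.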
>>>>>> I just tried to change broadcast-arps-to-all-routers to false.
>>>>>> It really helped for a few seconds (I was able to request an IP
>>>>>> address using dhclient immediately), then it returned to the
>>>>>> same behaviour, with ovs-vswitchd having CPU overload.
>>>>>>
>>>>>> node1:
>>>>>>     PID USER PR NI    VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
>>>>>>  139416 root 20  0 8390536 521684 8448 R 98.7  0.1   2353:23 ovs-vswitchd
>>>>>>  139511 root 20  0 8390536 521684 8448 R 11.3  0.1 143:54.62 revalidator92
>>>>>>  139523 root 20  0 8390536 521684 8448 R 11.3  0.1 126:49.06 revalidator104
>>>>>>  139528 root 20  0 8390536 521684 8448 R 10.7  0.1 133:27.08 revalidator109
>>>>>>  139510 root 20  0 8390536 521684 8448 R 10.3  0.1 126:13.46 revalidator91
>>>>>>  139519 root 20  0 8390536 521684 8448 R 10.0  0.1 143:28.01 revalidator100
>>>>>>  139522 root 20  0 8390536 521684 8448 R 10.0  0.1  91:53.11 revalidator103
>>>>>>  139526 root 20  0 8390536 521684 8448 R 10.0  0.1 134:08.83 revalidator107
>>>>>>  139531 root 20  0 8390536 521684 8448 S 10.0  0.1  99:15.78 revalidator112
>>>>>>  139527 root 20  0 8390536 521684 8448 R  9.7  0.1  40:55.66 revalidator108
>>>>>>  139516 root 20  0 8390536 521684 8448 R  9.3  0.1  94:20.53 revalidator97
>>>>>>  139509 root 20  0 8390536 521684 8448 S  8.0  0.1 194:47.10 revalidator90
>>>>>>  139517 root 20  0 8390536 521684 8448 S  5.3  0.1  48:20.00 revalidator98
>>>>>>  139520 root 20  0 8390536 521684 8448 R  5.3  0.1 126:20.74 revalidator101
>>>>>>  139497 root 20  0 8390536 521684 8448 S  5.0  0.1 143:10.67 handler78
>>>>>>  139513 root 20  0 8390536 521684 8448 S  5.0  0.1  93:14.85 revalidator94
>>>>>>  139514 root 20  0 8390536 521684 8448 R  5.0  0.1  42:13.77 revalidator95
>>>>>>  139521 root 20  0 8390536 521684 8448 S  5.0  0.1  92:28.66 revalidator102
>>>>>>  139518 root 20  0 8390536 521684 8448 R  3.7  0.1  14:07.25 revalidator99
>>>>>>
>>>>>> node2:
>>>>>>     PID USER PR NI    VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
>>>>>> 3255815 root 20  0 8383004 342144 8448 R 99.9  0.0 996:44.31 ovs-vswitchd
>>>>>> 3255922 root 20  0 8383004 342144 8448 S  9.6  0.0  47:23.99 revalidator102
>>>>>> 3255927 root 20  0 8383004 342144 8448 S  9.3  0.0  64:20.30 revalidator107
>>>>>> 3255929 root 20  0 8383004 342144 8448 S  9.3  0.0  64:32.72 revalidator109
>>>>>> 3255920 root 20  0 8383004 342144 8448 S  8.3  0.0  75:16.18 revalidator100
>>>>>> 3255921 root 20  0 8383004 342144 8448 S  8.3  0.0  65:45.10 revalidator101
>>>>>> 3255923 root 20  0 8383004 342144 8448 S  8.3  0.0  34:15.98 revalidator103
>>>>>> 3255924 root 20  0 8383004 342144 8448 S  8.3  0.0  65:31.02 revalidator104
>>>>>> 3255910 root 20  0 8383004 342144 8448 S  8.0  0.0  93:24.25 revalidator90
>>>>>> 3255912 root 20  0 8383004 342144 8448 S  8.0  0.0  74:40.90 revalidator92
>>>>>> 3255926 root 20  0 8383004 342144 8448 S  8.0  0.0  46:46.51 revalidator106
>>>>>> 3255914 root 20  0 8383004 342144 8448 S  7.6  0.0  47:15.57 revalidator94
>>>>>> 3255917 root 20  0 8383004 342144 8448 S  5.6  0.0  33:24.94 revalidator97
>>>>>> 3255918 root 20  0 8383004 342144 8448 S  4.3  0.0  24:15.37 revalidator98
>>>>>> 3255911 root 20  0 8383004 342144 8448 S  4.0  0.0  65:40.29 revalidator91
>>>>>> 3255858 root 20  0 8383004 342144 8448 S  3.3  0.0  48:44.56 handler38
>>>>>> 3255897 root 20  0 8383004 342144 8448 S  1.7  0.0   0:37.98 handler77
>>>>>> 3255828 root 20  0 8383004 342144 8448 S  1.3  0.0  11:48.57 urcu8
>>>>>> 3255884 root 20  0 8383004 342144 8448 S  0.7  0.0   2:17.81 handler64
>>>>>> 3255913 root 20  0 8383004 342144 8448 S  0.7  0.0  65:57.06 revalidator93
>>>>>> 3255916 root 20  0 8383004 342144 8448 S  0.7  0.0  24:13.42 revalidator96
>>>>>>
>>>>>> Yep, there are a lot of "pinctrl received packet-in" logs:
>>>>>> 2025-03-03T14:23:46.919Z|49223|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.143
>>>>>> 2025-03-03T14:23:46.919Z|49224|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.57
>>>>>> 2025-03-03T14:23:46.919Z|49225|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=109.206.241.1, dst-ip=109.206.241.187
>>>>>> 2025-03-03T14:23:46.919Z|49226|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.193
>>>>>> 2025-03-03T14:23:46.919Z|49227|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.154
>>>>>> 2025-03-03T14:23:46.919Z|49228|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=65533| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.126
>>>>>> 2025-03-03T14:23:46.919Z|49229|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=89.169.15.1, dst-ip=89.169.15.238
>>>>>> 2025-03-03T14:23:46.919Z|49230|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=65533| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.217.210.1, dst-ip=83.217.210.148
>>>>>> 2025-03-03T14:23:46.919Z|49231|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.214
>>>>>> 2025-03-03T14:23:46.919Z|49232|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=65533| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.146
>>>>>> 2025-03-03T14:23:46.919Z|49233|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=unrecognized(27)| OF_Table_ID=43| OF_Cookie_ID=0x435b65d6| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=89.169.15.1, dst-ip=89.169.15.28
>>>>
>>>> Right, that explains why setting the option to not broadcast ARP
>>>> requests doesn't do much in your case. These are IP packets with an
>>>> L2 broadcast destination MAC. But being L2 broadcast, they seem to
>>>> hit the same mc_group split rules as I suspected in my previous
>>>> email.
>>>>
>>>>>> These are just a few lines; more is available in the gist, to
>>>>>> avoid dumping it all here:
>>>>>> https://gist.githubusercontent.com/frct1/b6baebfb12a6132d8c5597024049f88f/raw/dabd26c2e2a7bf467717d6ee034926ce5f71d273/pinctrl%2520logs.txt
>>>>
>>>> What surprises me though is that they actually need to be sent to
>>>> reg15=0x8000 (MC_FLOOD).
>>>>
>>>> In your gist src-mac 30:b6:4f:5f:db:a0 is by far the most common;
>>>> would you happen to know who that corresponds to? Is it legit IP
>>>> traffic with an L2 broadcast destination?
>>>>
>>>>>>> I'm guessing your topology might be similar (a "large" public
>>>>>>> logical switch with quite a few connected logical routers).
>>>>>>
>>>>>> Exactly! There is a vlan network with about 1100 ports in total,
>>>>>> using just L2 connections, with a router that acts as the
>>>>>> gateway for a kubernetes cluster.
>>>>>>
>>>>>>> Hi Ilia,
>>>>>>>
>>>>>>> Thanks for all the additional information!
>>>>>>>
>>>>>>> On 3/2/25 8:55 PM, support via discuss wrote:
>>>>>>>> Probably I found something interesting here.
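[Editor's note: the "which src-mac is by far the most common" observation can be reproduced mechanically from a saved pinctrl DBG log. A minimal sketch; the file path and sample contents below are illustrative, but the `src-mac=` field format matches the log lines quoted in this thread.]

```shell
# Build an illustrative sample of pinctrl packet-in log lines.
cat > /tmp/pinctrl-sample.log <<'EOF'
...|DBG|pinctrl received packet-in | opcode=unrecognized(27)| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.143
...|DBG|pinctrl received packet-in | opcode=unrecognized(27)| in-port=4| src-mac=30:b6:4f:5f:db:a0, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=83.219.243.1, dst-ip=83.219.243.57
...|DBG|pinctrl received packet-in | opcode=unrecognized(27)| in-port=7| src-mac=fa:16:3e:50:22:c4, dst-mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
EOF
# Extract the src-mac field, then count and rank occurrences.
grep -o 'src-mac=[0-9a-f:]*' /tmp/pinctrl-sample.log | sort | uniq -c | sort -rn
```

Run against the real log, the top line immediately identifies the dominant sender (here the border router's MAC).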
>>>>>>>> top -H -p $(pidof ovs-vswitchd)
>>>>>>>>     PID USER PR NI    VIRT    RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>>>>>>  673931 root 20  0 8392700 462408 7040 R 99.9  0.1  1216:19 ovs-vswitchd
>>>>>>>>  673941 root 20  0 8392700 462408 7040 S  6.7  0.1  4:18.00 handler10
>>>>>>>>  673982 root 20  0 8392700 462408 7040 S  6.7  0.1  0:49.26 handler51
>>>>>>>>  674009 root 20  0 8392700 462408 7040 S  6.7  0.1 73:17.17 handler78
>>>>>>>>  673932 root 20  0 8392700 462408 7040 S  0.0  0.1  2:07.06 handler1
>>>>>>>>  673933 root 20  0 8392700 462408 7040 S  0.0  0.1  0:34.96 handler2
>>>>>>>>  673934 root 20  0 8392700 462408 7040 S  0.0  0.1  3:57.85 handler3
>>>>>>>>  673935 root 20  0 8392700 462408 7040 S  0.0  0.1  0:41.66 handler4
>>>>>>>>  673936 root 20  0 8392700 462408 7040 S  0.0  0.1 18:46.97 urcu5
>>>>>>>>  673937 root 20  0 8392700 462408 7040 S  0.0  0.1  4:04.14 handler6
>>>>>>>>  673938 root 20  0 8392700 462408 7040 S  0.0  0.1  0:44.03 handler7
>>>>>>>>  673939 root 20  0 8392700 462408 7040 S  0.0  0.1  4:20.97 handler8
>>>>>>>>
>>>>>>>> I guess this could be a real cause of the lost DHCP packets.
>>>>>>>> Next are the top lines of perf record:
>>>>>>>>   1.19%  ovs-vswitchd    libc.so.6          [.] realloc
>>>>>>>>   0.64%  revalidator90   ovs-vswitchd       [.] 0x00000000000797de
>>>>>>>>   0.59%  ovs-vswitchd    ovs-vswitchd       [.] 0x00000000000487de
>>>>>>>>   0.58%  ovs-vswitchd    ovs-vswitchd       [.] 0x00000000000797de
>>>>>>>>   0.58%  ovs-vswitchd    [kernel.kallsyms]  [k] available_idle_cpu
>>>>>>>>   0.57%  ovs-vswitchd    [kernel.kallsyms]  [k] tun_net_xmit
>>>>>>>>   0.55%  ovs-vswitchd    ovs-vswitchd       [.] 0x00000000000a561a
>>>>>>>>   0.51%  ovs-vswitchd    libc.so.6          [.] 0x0000000000191747
>>>>>>>>   0.50%  ovs-vswitchd    [kernel.kallsyms]  [k] task_h_load
>>>>>>>>   0.48%  ovs-vswitchd    [kernel.kallsyms]  [k] llist_add_batch
>>>>>>>>   0.48%  revalidator90   ovs-vswitchd       [.] 0x00000000000797ca
>>>>>>>>   0.47%  revalidator96   ovs-vswitchd       [.] 0x00000000000797de
>>>>>>>>   0.45%  revalidator90   ovs-vswitchd       [.] 0x00000000000487de
>>>>>>>>   0.44%  revalidator93   ovs-vswitchd       [.] 0x00000000000487de
>>>>>>>>   0.43%  revalidator93   ovs-vswitchd       [.] 0x00000000000797de
>>>>>>>>   0.40%  ovs-vswitchd    ovs-vswitchd       [.] 0x00000000000797ca
>>>>>>>>   0.40%  revalidator112  ovs-vswitchd       [.] 0x00000000000797de
>>>>>>>>   0.39%  ovs-vswitchd    [kernel.kallsyms]  [k] __dev_queue_xmit
>>>>>>>>
>>>>>>>> I've definitely seen bug reports regarding libc and openvswitch.
>>>>>>>>
>>>>>>>>> I'm back to digging into this more deeply.
>>>>>>>>>    PID USER PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>>>>>>>   4446 root 20  0 395500 88276 6336 S  1.0  0.0 13:07.66 ovn_pinctrl0
>>>>>>>>>   3773 root 20  0 395500 88276 6336 S  0.0  0.0  5:19.43 ovn-controller
>>>>>>>>>   4449 root 20  0 395500 88276 6336 S  0.0  0.0  0:00.00 urcu1
>>>>>>>>>   4452 root 20  0 395500 88276 6336 S  0.0  0.0  0:00.07 ovn_statctrl2
>>>>>>>>>   5832 root 20  0 395500 88276 6336 S  0.0  0.0  0:04.07 stopwatch3
>>>>>>>>>
>>>>>>>>> Interestingly, neither ovn-controller nor pinctrl seems to be
>>>>>>>>> starving for CPU; their usage is only about 1%.
>>>>>>>>> The host CPU is also doing fine:
>>>>>>>>> %Cpu(s): 29.5 us, 20.0 sy, 0.0 ni, 50.0 id, 0.0 wa, 0.0 hi, 0.6 si, 0.0 st
>>>>>>>>>
>>>>>>>>> Despite that, ovs-vswitchd seems to be consuming far more CPU:
>>>>>>>>>
>>>>>>>>>    PID USER PR NI    VIRT    RES  SHR S %CPU %MEM     TIME+ COMMAND
>>>>>>>>> 139416 root 20  0 8378828 452288 8448 R 98.7  0.1 990:55.85 ovs-vswitchd
>>>>>>>>> 139509 root 20  0 8378828 452288 8448 S 10.7  0.1  82:49.98 revalidator90
>>>>>>>>> 139522 root 20  0 8378828 452288 8448 R 10.7  0.1  32:01.96 revalidator103
>>>>>>>>> 139519 root 20  0 8378828 452288 8448 R 10.3  0.1  54:28.10 revalidator100
>>>>>>>>> 139516 root 20  0 8378828 452288 8448 S 10.0  0.1  32:34.74 revalidator97
>>>>>>>>> 139526 root 20  0 8378828 452288 8448 R 10.0  0.1  49:16.24 revalidator107
>>>>>>>>> 139531 root 20  0 8378828 452288 8448 R 10.0  0.1  40:45.87 revalidator112
>>>>>>>>> 139523 root 20  0 8378828 452288 8448 S  9.7  0.1  49:39.00 revalidator104
>>>>>>>>> 139512 root 20  0 8378828 452288 8448 S  7.7  0.1  49:37.77 revalidator93
>>>>>>>>> 139497 root 20  0 8378828 452288 8448 S  7.3  0.1  67:31.93 handler78
>>>>>>>>> 139513 root 20  0 8378828 452288 8448 S  7.3  0.1  36:44.18 revalidator94
>>>>>>>>> 139525 root 20  0 8378828 452288 8448 S  6.3  0.1  36:08.10 revalidator106
>>>>>>>>> 139520 root 20  0 8378828 452288 8448 S  5.7  0.1  49:48.41 revalidator101
>>>>>>>>> 139521 root 20  0 8378828 452288 8448 R  5.7  0.1  36:22.60 revalidator102
>>>>>>>>> 139510 root 20  0 8378828 452288 8448 R  5.0  0.1  48:53.38 revalidator91
>>>>>>>>> 139511 root 20  0 8378828 452288 8448 R  5.0  0.1  54:09.25 revalidator92
>>>>>>>>> More logs:
>>>>>>>>> 15:44:45.190923 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300
>>>>>>>>> 15:44:46.446755 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300
>>>>>>>>> 15:44:47.491209 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300
>>>>>>>>> 15:44:47.869273 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300
>>>>>>>>> 15:44:48.297140 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300
>>>>>>>>> 15:44:51.190709 IP 0.0.0.0.68 > 255.255.255.255.67: BOOTP/DHCP, Request from fa:16:3e:50:22:c4, length 300
>>>>>>>>>
>>>>>>>>> I manually tried to use dhclient to obtain an IP, but with no
>>>>>>>>> success. There are logs showing that ovn-controller sent a
>>>>>>>>> DHCPOFFER, but no packet arrived at the VM interface.
>>>>>>>>>
>>>>>>>>>> I switched back to the configuration with no
>>>>>>>>>> neutron-dhcp-agent. I have synced the OVN DB with
>>>>>>>>>> neutron-ovn-db-sync-util.
>>>>>>>>>>
>>>>>>>>>> Here is the coverage/show result of the node where I'm
>>>>>>>>>> seeing this issue right now (current network bandwidth is
>>>>>>>>>> about 300Mbps).
>>>>>>>>>> # ovs-appctl coverage/show
>>>>>>>>>> Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=982b7b40:
>>>>>>>>>> nln_changed                  0.0/sec     0.000/sec        0.0197/sec   total: 8568
>>>>>>>>>> route_table_dump             0.0/sec     0.000/sec        0.0072/sec   total: 3953
>>>>>>>>>> netlink_sent              1158.4/sec  1109.700/sec     1018.6758/sec   total: 840427925
>>>>>>>>>> netlink_recv_jumbo         144.2/sec   140.883/sec      212.7489/sec   total: 127752548
>>>>>>>>>> netlink_received          1258.4/sec  1229.533/sec     1192.0311/sec   total: 1072129295
>>>>>>>>>> netdev_set_ethtool           0.0/sec     0.000/sec        0.0006/sec   total: 361
>>>>>>>>>> netdev_get_ethtool           0.0/sec     0.000/sec        0.0039/sec   total: 2808
>>>>>>>>>> netdev_get_ifindex           0.0/sec     0.000/sec        0.3358/sec   total: 14218
>>>>>>>>>> netdev_set_policing          0.0/sec     0.000/sec        0.0028/sec   total: 1685
>>>>>>>>>> vconn_sent                 596.2/sec   576.967/sec      585.4778/sec   total: 643013987
>>>>>>>>>> vconn_received             320.0/sec   310.883/sec      307.5108/sec   total: 360905481
>>>>>>>>>> util_xalloc             937354.8/sec 912615.150/sec  873946.4494/sec   total: 1068314545738
>>>>>>>>>> unixctl_replied              0.0/sec     0.033/sec        0.0331/sec   total: 33146
>>>>>>>>>> unixctl_received             0.0/sec     0.033/sec        0.0331/sec   total: 33146
>>>>>>>>>> long_poll_interval           0.0/sec     0.000/sec        0.0000/sec   total: 1
>>>>>>>>>> stream_open                  0.0/sec     0.000/sec        0.0000/sec   total: 2
>>>>>>>>>> pstream_open                 0.0/sec     0.000/sec        0.0000/sec   total: 5
>>>>>>>>>> seq_change                3283.6/sec  3255.167/sec     3039.7792/sec   total: 3088392125
>>>>>>>>>> rconn_sent                 596.2/sec   576.967/sec      585.4767/sec   total: 643013979
>>>>>>>>>> rconn_queued               596.2/sec   576.967/sec      585.5044/sec   total: 643014079
>>>>>>>>>> rconn_overflow               0.0/sec     0.000/sec        2.0272/sec   total: 16543
>>>>>>>
>>>>>>> This means we're tail-dropping messages that should be sent to
>>>>>>> the controller, because we're reaching the queue size limit
>>>>>>> (100 by default, IIUC).
>>>>>>>
>>>>>>>>>> rconn_discarded              0.0/sec     0.000/sec        0.0278/sec   total: 100
>>>>>>>
>>>>>>> This indicates that a controller connection (ovn-controller has
>>>>>>> a few) disconnected while there were messages still queued on
>>>>>>> it to be sent to the controller.
>>>>>>>
>>>>>>>>>> poll_zero_timeout          275.2/sec   247.600/sec      161.9242/sec   total: 136657138
>>>>>>>>>> poll_create_node          4749.6/sec  4417.217/sec     4062.3500/sec   total: 4007153255
>>>>>>>>>> txn_success                  0.2/sec     0.183/sec        0.1828/sec   total: 184805
>>>>>>>>>> txn_incomplete               0.4/sec     0.367/sec        0.4856/sec   total: 372203
>>>>>>>>>> txn_unchanged                0.0/sec     0.000/sec        0.1169/sec   total: 7146
>>>>>>>>>> netdev_get_stats            77.6/sec    71.133/sec       70.5939/sec   total: 73986502
>>>>>>>>>> mac_learning_moved           0.0/sec     0.000/sec        0.0000/sec   total: 425
>>>>>>>>>> mac_learning_expired         0.6/sec     0.217/sec        0.4439/sec   total: 409542
>>>>>>>>>> mac_learning_learned         0.2/sec     0.233/sec        0.4403/sec   total: 410444
>>>>>>>>>> hmap_expand                298.0/sec   265.833/sec      329.0514/sec   total: 291663955
>>>>>>>>>> hmap_pathological           10.4/sec     6.050/sec        9.1406/sec   total: 8189560
>>>>>>>>>> hindex_expand                0.0/sec     0.000/sec        0.0000/sec   total: 12
>>>>>>>>>> hindex_pathological          0.0/sec     0.000/sec        0.0000/sec   total: 148
>>>>>>>>>> miniflow_malloc              0.0/sec     0.000/sec       21.7272/sec   total: 3395676
>>>>>>>>>> flow_extract               843.4/sec   801.333/sec      727.9828/sec   total: 755385765
>>>>>>>>>> dpif_port_del                0.0/sec     0.000/sec        0.0006/sec   total: 379
>>>>>>>>>> dpif_port_add                0.0/sec     0.000/sec        0.0006/sec   total: 367
>>>>>>>>>> dpif_flow_put_error          6.0/sec     7.633/sec        5.1642/sec   total: 4420389
>>>>>>>>>> dpif_flow_put              232.2/sec   208.867/sec      126.8286/sec   total: 98257942
>>>>>>>>>> dpif_flow_get                0.0/sec     0.000/sec        0.0000/sec   total: 386
>>>>>>>>>> dpif_flow_flush              0.0/sec     0.000/sec        0.0000/sec   total: 1
>>>>>>>>>> dpif_flow_del              182.2/sec   203.100/sec      118.5047/sec   total: 93717525
>>>>>>>>>> dpif_execute_with_help       0.0/sec     0.000/sec        0.0017/sec   total: 1649
>>>>>>>>>> dpif_execute_error           0.0/sec     0.000/sec        0.0006/sec   total: 398
>>>>>>>>>> dpif_execute               571.8/sec   537.350/sec      439.8069/sec   total: 472799472
>>>>>>>>>> cmap_shrink                  0.8/sec     0.733/sec        1.4025/sec   total: 1180907
>>>>>>>>>> cmap_expand                  1.0/sec     0.583/sec        1.5000/sec   total: 1182882
>>>>>>>>>> ccmap_shrink                 0.0/sec     0.000/sec        0.1786/sec   total: 747
>>>>>>>>>> ccmap_expand                 0.0/sec     0.000/sec        0.1200/sec   total: 774
>>>>>>>>>> connmgr_async_unsent         0.0/sec     0.000/sec        0.0372/sec   total: 139
>>>>>>>>>> xlate_actions             4171.0/sec  3071.683/sec     3133.7436/sec   total: 3107954003
>>>>>>>>>> upcall_ukey_replace          0.0/sec     0.117/sec        0.1739/sec   total: 155715
>>>>>>>>>> upcall_ukey_contention       0.0/sec     0.000/sec        0.0003/sec   total: 157
>>>>>>>>>> upcall_flow_limit_grew       0.0/sec     0.550/sec        0.2328/sec   total: 56359
>>>>>>>>>> ukey_replace_contention      0.0/sec     0.017/sec        0.0008/sec   total: 847
>>>>>>>>>> revalidate_missed_dp_flow    0.0/sec     0.117/sec        0.0472/sec   total: 106876
>>>>>>>>>> handler_duplicate_upcall    14.0/sec    12.817/sec       11.5303/sec   total: 9317449
>>>>>>>>>> dumped_new_flow              0.0/sec     0.000/sec        0.0000/sec   total: 842
>>>>>>>>>> dumped_duplicate_flow        0.0/sec     0.000/sec        0.0003/sec   total: 37029
>>>>>>>>>> rev_mac_learning             0.8/sec     0.417/sec        0.6419/sec   total: 708403
>>>>>>>>>> rev_flow_table               0.0/sec     0.000/sec        0.0033/sec   total: 4003
>>>>>>>>>> rev_port_toggled             0.0/sec     0.000/sec        0.0000/sec   total: 119
>>>>>>>>>> rev_reconfigure              0.0/sec     0.000/sec        0.0008/sec   total: 691
>>>>>>>>>> ofproto_update_port          0.0/sec     0.000/sec        0.0039/sec   total: 2430
>>>>>>>>>> ofproto_recv_openflow      320.0/sec   310.883/sec      307.5097/sec   total: 360905473
>>>>>>>>>> ofproto_packet_out         319.6/sec   310.633/sec      299.3358/sec   total: 360555564
>>>>>>>>>> ofproto_flush                0.0/sec     0.000/sec        0.0000/sec   total: 1
>>>>>>>>>> bridge_reconfigure           0.0/sec     0.000/sec        0.1125/sec   total: 5108
>>>>>>>>>> 100 events never hit
>>>>>>>>>>
>>>>>>>>>>> Do you see any warning/error logs in ovs-vswitchd.log?
>>>>>>>>>>
>>>>>>>>>> I see no noticeable errors, except the usual warn logs like
>>>>>>>>>> "invalid packet".
>>>>>>>>>>
>>>>>>>>>>> Just double checking: you're saying that with low numbers
>>>>>>>>>>> of instances you don't have this issue?
>>>>>>>>>>
>>>>>>>>>> Exactly. With a relatively low instance count there is no
>>>>>>>>>> visible issue.
>>>>>>>>>>
>>>>>>>>>>> I wonder if you're not potentially hitting the maximum
>>>>>>>>>>> resubmit limit (4096 resubmits) due to DHCPOFFERs being
>>>>>>>>>>> sent as broadcast. There should be warning logs indicating
>>>>>>>>>>> that in ovs-vswitchd.
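[Editor's note: a hedged aside on the rconn_overflow tail drops discussed above, not something proposed in the thread. The OVS database schema documents a per-Controller queue limit column, `controller_queue_size` (default 100, maximum 512, per ovs-vswitchd.conf.db(5)); whether raising it is appropriate here is a separate question, and you should verify the column exists on your OVS version. A config sketch:]

```shell
# Assumption-laden sketch: raise the OpenFlow message queue limit on
# every Controller record attached to br-int. All names are from the
# OVS schema docs, not from this thread; verify before use.
for c in $(ovs-vsctl --bare --columns=_uuid find Controller); do
    ovs-vsctl set Controller "$c" controller_queue_size=512
done
```

Raising the queue only buys headroom; it does not address the packet-in volume that fills it.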
>>>>>>>>>> There are no error or warn logs mentioning resubmit, but
>>>>>>>>>> there are a lot of logs like these:
>>>>>>>>>>
>>>>>>>>>> 2025-03-01T11:27:16.473Z|19884389|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3465): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
>>>>>>>>>> 2025-03-01T11:27:16.972Z|19884399|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe358a): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
>>>>>>>>>> 2025-03-01T11:27:17.873Z|19884417|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe3765): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0x93->reg13,set_field:0x1->reg14,set_field:0xdd->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9001->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
>>>>>>>>>> 2025-03-01T11:27:17.972Z|19884419|vconn|DBG|unix#33063: received: OFPT_PACKET_OUT (OF1.5) (xid=0xe37ba): in_port=CONTROLLER actions=set_field:0x300->reg0,set_field:0x9->reg11,set_field:0xa->reg12,set_field:0xe0->reg13,set_field:0x1->reg14,set_field:0x291->reg15,set_field:0x2->metadata,set_field:0->pkt_mark,set_field:0x9002->reg6,set_field:0x8000->reg15,resubmit(CONTROLLER,43) data_len=60
>>>>>>>
>>>>>>> These are due to the fact that OVN's multicast groups have been
>>>>>>> split since commit
>>>>>>> https://github.com/ovn-org/ovn/commit/325c7b203d8bfd12bc1285ad11390c1a55cd6717
>>>>>>>
>>>>>>> A similar problem has been reported/discussed in:
>>>>>>> https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053455.html
>>>>>>>
>>>>>>> To confirm that it's ARP requests (broadcast) that are causing
>>>>>>> problems in your case, we'd need some more logs from you:
>>>>>>>
>>>>>>> ovn-appctl vlog/set pinctrl:dbg
>>>>>>>
>>>>>>> We're interested in logs that look like:
>>>>>>> 2025-02-12T20:35:41.490Z|10791|pinctrl(ovn_pinctrl0)|DBG|pinctrl received packet-in | opcode=...
>>>>>>>
>>>>>>> I'm guessing your topology might be similar (a "large" public
>>>>>>> logical switch with quite a few connected logical routers). If
>>>>>>> so, a potential solution discussed there was to set
>>>>>>> "other_config:broadcast-arps-to-all-routers=false" on the public
>>>>>>> logical switch. Would you be able to check if changing that
>>>>>>> setting helps in your case too, please?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dumitru
>>>>>>>
>>>>>>>>>> I've checked for error/warn logs in ovs-vswitchd with the
>>>>>>>>>> following command:
>>>>>>>>>> # cat /var/log/kolla/openvswitch/ovs-vswitchd.log | grep -i "resubmit" | grep -iE "warn|err"
>>>>>>>>>>
>>>>>>>>>> Also, I have a non-deployed region running 2024.2 OpenStack
>>>>>>>>>> and did some dumb tests like creating 1500 VMs, with about
>>>>>>>>>> 500 instances per hypervisor, and didn't see this issue.
>>>>>>>>>> However, I didn't use an L2 network in that case, only
>>>>>>>>>> virtual networks like geneve/vxlan. I will try today to
>>>>>>>>>> create a sample L2 vlan with MAC learning enabled at the
>>>>>>>>>> border router and report back. I think this bug will not
>>>>>>>>>> occur if there isn't a lot of traffic coming through OVS, as
>>>>>>>>>> DHCP is then handled with no issues...
>>>>>>>>>>
>>>>>>>>>> On 24.02.2025 16:00, Dumitru Ceara wrote:
>>>>>>>>>>> On 2/20/25 11:13 PM, Ilia Baikov via discuss wrote:
>>>>>>>>>>>>> Hi
>>>>>>>>>>>
>>>>>>>>>>> Hi Ilia,
>>>>>>>>>>>
>>>>>>>>>>> + Ilya Maximets for more ovs insights.
>>>>>>>>>>>
>>>>>>>>>>>> After a lot of observation and testing various possible
>>>>>>>>>>>> fixes, I came up with nothing...
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, there are messages that contain the NXT_RESUME
>>>>>>>>>>>> substring.
>>>>>>>>>>>>
>>>>>>>>>>>> ovn-controller:
>>>>>>>>>>>> 2025-02-18T21:05:51.771Z|1041129|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x404b44bc total_len=348
>>>>>>>>>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x13c,reg14=0x4a8,metadata=0x2,in_port=312 (via action) data_len=348 (unbuffered)
>>>>>>>>>>>> 2025-02-18T21:08:23.729Z|1044255|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xffe9d519 total_len=348
>>>>>>>>>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 (via action) data_len=348 (unbuffered)
>>>>>>>>>>>> 2025-02-18T21:10:33.865Z|1046917|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
>>>>>>>>>>>> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x57,reg14=0x423,metadata=0x2,in_port=83 (via action) data_len=348 (unbuffered)
>>>>>>>>>>>> 2025-02-18T21:11:06.733Z|1047596|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
>>>>>>>>>>>> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x5f,reg14=0x413,metadata=0x2,in_port=91 (via action) data_len=348 (unbuffered)
>>>>>>>>>>>> 2025-02-18T21:12:16.984Z|1048995|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>>>>>>>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>>>>>>>>>> 2025-02-18T21:13:03.396Z|1049934|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>>>>>>>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>>>>>>>>>> 2025-02-18T21:19:48.641Z|1058227|vconn(ovn_pinctrl0)|DBG|unix:/var/run/openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) (xid=0x0): cookie=0xbbfea837 total_len=348
>>>>>>>>>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x132,reg14=0x434,metadata=0x2,in_port=302 (via action) data_len=348 (unbuffered)
2025-02-18T21:19:50.069Z|1058255|vconn(ovn_pinctrl0)| >> DBG|unix:/ >> >> >>> >>> var/run/ >> >> >>> >>> >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) >> >> >>> (xid=0x0): >> >> >>> >>> >> cookie=0x5bc96bd6 total_len=348 >> >> >>> >>> >> >> >> >>> >>> >> >> >>> >> >> >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x6f,reg14=0x148,metadata=0x2,in_port=107 >> (via action) data_len=348 (unbuffered) >> >> >>> >>> >> 2025-02-18T21:21:38.224Z|1060474|vconn(ovn_pinctrl0)| >> DBG|unix:/ >> >> >>> >>> var/run/ >> >> >>> >>> >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) >> >> >>> (xid=0x0): >> >> >>> >>> >> cookie=0x5bc96bd6 total_len=348 >> >> >>> >>> >> >> >> >>> >>> >> >> >>> >> >> >> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0xdb,reg14=0x336,metadata=0x2,in_port=215 >> (via action) data_len=348 (unbuffered) >> >> >>> >>> >> 2025-02-18T21:21:42.735Z|1060568|vconn(ovn_pinctrl0)| >> DBG|unix:/ >> >> >>> >>> var/run/ >> >> >>> >>> >> openvswitch/br-int.mgmt: sent (Success): NXT_RESUME (OF1.5) >> >> >>> (xid=0x0): >> >> >>> >>> >> cookie=0xffe9d519 total_len=348 >> >> >>> >>> >> >> >> >>> >>> >> >> >>> >> >> >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x112,reg14=0x2dd,metadata=0x2,in_port=363 >> (via action) data_len=348 (unbuffered) >> >> >>> >>> >> >> >> >>> >>> >> ovs-vswitchd >> >> >>> >>> >> 2025-02-18T20:55:45.673Z|1086321|vconn|DBG|unix#0: received: >> >> >>> >>> NXT_RESUME >> >> >>> >>> >> (OF1.5) (xid=0x0): cookie=0x58482a66 total_len=348 >> >> >>> >>> >> >> >> >>> >>> >> >> >>> >> >> >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x151,reg14=0x4b1,metadata=0x2,in_port=375 >> (via action) data_len=348 (unbuffered) >> >> >>> >>> >> 2025-02-18T21:01:05.872Z|1092877|vconn|DBG|unix#0: received: >> >> >>> >>> NXT_RESUME >> >> >>> >>> >> (OF1.5) (xid=0x0): cookie=0x983e6272 total_len=348 >> >> >>> >>> >> >> >> >>> >>> >> >> >>> >> >> >> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x15f,reg14=0x4d3,metadata=0x2,in_port=368 >> (via action) data_len=348 (unbuffered) >> 
>>>> 2025-02-18T21:01:23.672Z|1093239|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:04:03.272Z|1096513|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:07:36.773Z|1100887|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:07:51.472Z|1101187|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xa6d16298 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x4b,reg14=0x1bb,metadata=0x2,in_port=349 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:08:08.673Z|1101537|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x8b831d17 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x26,reg14=0x123,metadata=0x2,in_port=34 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:09:09.173Z|1102776|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xbf424a23 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0x85,reg14=0x2d9,metadata=0x2,in_port=371 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:13:03.574Z|1108046|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:17:18.772Z|1113273|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
>>>> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x7c,reg14=0x241,metadata=0x2,in_port=120 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:18:51.673Z|1115178|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:19:36.872Z|1116105|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
>>>> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x87,reg14=0x4b,metadata=0x2,in_port=131 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:19:52.172Z|1116417|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0xfded3de0 total_len=348
>>>> reg0=0x308,reg11=0x9,reg12=0xa,reg13=0xc5,reg14=0x354,metadata=0x2,in_port=373 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:20:24.572Z|1117078|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
>>>> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x8b,reg14=0x2fb,metadata=0x2,in_port=135 (via action) data_len=348 (unbuffered)
>>>> 2025-02-18T21:21:31.672Z|1118459|vconn|DBG|unix#0: received: NXT_RESUME (OF1.5) (xid=0x0): cookie=0x5bc96bd6 total_len=348
>>>> reg0=0x28b,reg11=0x9,reg12=0xa,reg13=0x93,reg14=0xdd,metadata=0x2,in_port=143 (via action) data_len=348 (unbuffered)
>>>>
>>>> The interesting logs are also in the ovn-controller log:
>>>> 2025-02-18T21:21:10.152Z|1059900|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:a6:57:54 13.37.13.68
>>>> 2025-02-18T21:21:10.560Z|1059909|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:43:d4:9a 13.38.13.15
>>>> 2025-02-18T21:21:14.178Z|1059983|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cb:77:8d 13.37.13.72
>>>> 2025-02-18T21:21:16.785Z|1060037|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:21:19.661Z|1060097|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:7e:1b:a8 13.37.13.173
>>>> 2025-02-18T21:21:22.582Z|1060157|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:21:25.903Z|1060225|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:f6:ae:8b 13.37.13.11
>>>> 2025-02-18T21:21:28.643Z|1060281|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:b0:98:e0 13.37.13.101
>>>> 2025-02-18T21:21:32.687Z|1060361|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:82:6e:c3 13.37.13.183
>>>> 2025-02-18T21:21:34.517Z|1060398|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cb:08:c7 13.37.13.55
>>>> 2025-02-18T21:21:38.673Z|1060484|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:f9:5b:25 13.37.13.98
>>>> 2025-02-18T21:21:40.782Z|1060528|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:21:44.097Z|1060596|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:95:0c:5e 13.37.13.197
>>>> 2025-02-18T21:21:46.484Z|1060646|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:21:50.203Z|1060722|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:92:f8:ba 13.37.13.124
>>>> 2025-02-18T21:21:52.764Z|1060776|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:ce:b8:1e 13.37.13.126
>>>> 2025-02-18T21:21:55.503Z|1060832|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8d:88:f6 13.38.13.63
>>>> 2025-02-18T21:21:59.520Z|1060914|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8f:20:53 13.37.13.63
>>>> 2025-02-18T21:22:01.620Z|1060958|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8a:7a:47 13.37.13.150
>>>> 2025-02-18T21:22:04.418Z|1061015|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:70:7e:5f 13.37.13.216
>>>> 2025-02-18T21:22:07.477Z|1061079|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:2a:7a:ad 13.37.13.50
>>>> 2025-02-18T21:22:11.719Z|1061165|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:13.798Z|1061208|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:a6:57:54 13.37.13.68
>>>> 2025-02-18T21:22:16.612Z|1061266|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:81:5f:99 13.37.13.14
>>>> 2025-02-18T21:22:19.830Z|1061334|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:22.460Z|1061388|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:26.001Z|1061460|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:28.383Z|1061506|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:31.594Z|1061572|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:58:35:35 13.37.13.64
>>>> 2025-02-18T21:22:35.249Z|1061648|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:37.578Z|1061695|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:82:6e:c3 13.37.13.183
>>>> 2025-02-18T21:22:40.514Z|1061755|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:43.373Z|1061815|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:58:72:c4 13.37.13.65
>>>> 2025-02-18T21:22:46.557Z|1061881|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:02:5e:ac 13.37.13.111
>>>> 2025-02-18T21:22:50.020Z|1061951|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:53.511Z|1062023|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:5f:fd:6f 13.37.13.15
>>>> 2025-02-18T21:22:55.717Z|1062069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:22:58.812Z|1062133|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8d:88:f6 13.38.13.63
>>>> 2025-02-18T21:23:01.450Z|1062189|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:9a:da:47 13.37.13.225
>>>> 2025-02-18T21:23:05.545Z|1062273|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:8a:7a:47 13.37.13.150
>>>> 2025-02-18T21:23:08.512Z|1062332|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:70:7e:5f 13.37.13.216
>>>> 2025-02-18T21:23:10.980Z|1062384|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:02:ff:8a 13.37.13.233
>>>> 2025-02-18T21:23:14.405Z|1062454|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:23:17.172Z|1062511|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:2e:73:8f 13.37.13.186
>>>> 2025-02-18T21:23:19.435Z|1062559|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:39:8d:2f 13.37.13.153
>>>> 2025-02-18T21:23:23.049Z|1062633|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:23:25.653Z|1062687|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:d4:36:c6 13.37.13.60
>>>> 2025-02-18T21:23:28.768Z|1062749|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:f1:64:3a 13.37.13.125
>>>> 2025-02-18T21:23:33.340Z|1062843|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:56:c4:04 13.37.13.219
>>>> 2025-02-18T21:23:34.400Z|1062865|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:6a:7b:a4 13.37.13.121
>>>> 2025-02-18T21:23:39.485Z|1062968|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
>>>> 2025-02-18T21:23:41.393Z|1063007|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:82:6e:c3 13.37.13.183
>>>> 2025-02-18T21:23:44.410Z|1063069|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:52:ef:94 13.37.13.182
>>>>
>>>> IPs are sanitized; anyway, 13.37.13 belongs to subnet #1 and 13.38.13 to subnet #2. We are observing a large regression across all infra, almost 50% overall network bandwidth loss over time.
>>>>
>>>> Any thoughts, guys?
>>>
>>> Do you see any warning/error logs in ovs-vswitchd.log?
>>>
>>>> Context: we run a lot of small instances with 1 fixed IP each and little external traffic, about 300 per hypervisor, sometimes more, sometimes less; anyway, we didn't see anything similar with plain OVS and neutron-openvswitch-agent (OpenStack).
>>>> UPD: Once I manually cleared the dhcpv4_options column on all ports, it seems neutron-dhcp-agent (dnsmasq based) started to send DHCPOFFERs and most instances went back to normal. However, I couldn't work out the reason why ovn-controller does not send the DHCPOFFER to the VIF. Seems like a big point of regression starting from some number N of instances, or something else.
>>>
>>> Just double checking, you're saying that with low numbers of instances you don't have this issue?
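
One pattern worth noting in the excerpt above is that some clients (e.g. fa:16:3e:cc:d3:66 / 13.38.13.62) are offered a lease over and over, which matches the "offer logged but never received" symptom. As a rough way to quantify that, the pinctrl DHCPOFFER lines can be aggregated per client; a small sketch (the log format is copied from the lines above, the script itself is not from the thread):

```python
import re
from collections import Counter

# Parse ovn-controller pinctrl log lines like:
#   2025-02-18T21:21:16.785Z|1060037|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62
# and count DHCPOFFERs per (MAC, IP).  A client that keeps reappearing is
# being offered a lease repeatedly, i.e. the offer likely never reaches it.
OFFER_RE = re.compile(
    r"\|pinctrl\(ovn_pinctrl0\)\|INFO\|DHCPOFFER\s+"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+(?P<ip>\S+)"
)

def count_offers(lines):
    """Return a Counter mapping (mac, ip) -> number of DHCPOFFER log lines."""
    counts = Counter()
    for line in lines:
        m = OFFER_RE.search(line)
        if m:
            counts[(m.group("mac"), m.group("ip"))] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "2025-02-18T21:21:16.785Z|1060037|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62",
        "2025-02-18T21:21:19.661Z|1060097|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:7e:1b:a8 13.37.13.173",
        "2025-02-18T21:21:22.582Z|1060157|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:cc:d3:66 13.38.13.62",
    ]
    # Clients sorted by offer count, retransmitting ones first.
    for (mac, ip), n in count_offers(sample).most_common():
        print(f"{n:4d}  {mac}  {ip}")
```

In practice one would feed it the real log, e.g. `count_offers(open("/var/log/kolla/openvswitch/ovn-controller.log"))`, and look at the top of the list.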
>>>
>>> I wonder if you're not potentially hitting the maximum resubmit limit (4096 resubmits) due to DHCPOFFERs being sent as broadcast. There should be warning logs indicating that in ovs-vswitchd.
>>>
>>> It might also help if you could share the output of:
>>>
>>> ovs-appctl coverage/show
>>>
>>>> On 10.02.2025 17:36, Numan Siddique wrote:
>>>>> On Sat, Feb 8, 2025 at 1:27 PM support via discuss <ovs-disc...@openvswitch.org> wrote:
>>>>>
>>>>>> I've been observing the following bug for a long time: ovn-controller doesn't send a DHCPOFFER, but it logs that it does, and the VIF doesn't get it at all. Yet once ovn-controller is restarted, it seems it actually sends the DHCPOFFER and immediately receives a DHCPACK from the VIF.
>>>>>
>>>>> When you see this issue, can you enable vconn and rconn debug logs in both ovn-controller and ovs-vswitchd?
>>>>>
>>>>> ovn-appctl vlog/set rconn:dbg
>>>>> ovn-appctl vlog/set vconn:dbg
>>>>>
>>>>> ovs-appctl vlog/set rconn:dbg
>>>>> ovs-appctl vlog/set vconn:dbg
>>>>>
>>>>> If ovn-controller replies with a DHCPOFFER, you should see the NXT_RESUME related logs in both ovn-controller and ovs-vswitchd.
>>>>>
>>>>> This can help you narrow down the issue.
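
The two checks Dumitru asks for above can be run together on the hypervisor; a sketch only, assuming the kolla-style log path used earlier in the thread and a reachable ovs-vswitchd:

```shell
# Look for resubmit/translation warnings in ovs-vswitchd (hitting the
# resubmit limit during flow translation is logged as a warning):
grep -iE "resubmit|translation" /var/log/kolla/openvswitch/ovs-vswitchd.log | grep -iE "warn|err"

# Snapshot the coverage counters twice and diff them, to see which
# events spike between the two samples:
ovs-appctl coverage/show > /tmp/coverage.1
sleep 60
ovs-appctl coverage/show > /tmp/coverage.2
diff /tmp/coverage.1 /tmp/coverage.2
```

Diffing two coverage snapshots is more informative than a single dump, since most counters are cumulative since daemon start.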
>>>>>
>>>>> Thanks,
>>>>> Numan
>>>>>
>>>>>> 2025-02-08T14:27:57.462Z|00053|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>>>>> 2025-02-08T14:28:01.746Z|00054|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>>>>> 2025-02-08T14:28:05.582Z|00055|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>>>>> 2025-02-08T14:28:13.053Z|00056|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>>>>> 2025-02-08T14:28:21.972Z|00057|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>>>>> 2025-02-08T14:28:25.109Z|00058|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>>>>> 2025-02-08T14:28:29.362Z|00059|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>>>>> 2025-02-08T14:28:30.003Z|00060|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>>>>> 2025-02-08T14:28:37.820Z|00061|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>>>>> 2025-02-08T14:28:38.619Z|00002|fatal_signal(ovn_statctrl2)|WARN|terminating with signal 15 (Terminated)
>>>>>>
>>>>>> # ovn_controller is restarted by "docker restart ovn_controller"
>>>>>>
>>>>>> 2025-02-08T14:28:50.639Z|00001|vlog|INFO|opened log file /var/log/kolla/openvswitch/ovn-controller.log
>>>>>> 2025-02-08T14:28:50.640Z|00002|vif_plug_representor|INFO|unable to set udev receive buffer size: Operation not permitted
>>>>>> 2025-02-08T14:28:50.641Z|00003|reconnect|INFO|tcp:127.0.0.1:6640: connecting...
>>>>>> 2025-02-08T14:28:50.641Z|00004|reconnect|INFO|tcp:127.0.0.1:6640: connected
>>>>>> 2025-02-08T14:28:50.738Z|00005|main|INFO|OVN internal version is : [24.09.0-20.37.0-73.6]
>>>>>> 2025-02-08T14:28:50.738Z|00006|main|INFO|OVS IDL reconnected, force recompute.
>>>>>> 2025-02-08T14:28:50.738Z|00007|reconnect|INFO|tcp:10.10.9.5:6642: connecting...
>>>>>> 2025-02-08T14:28:50.738Z|00008|main|INFO|OVNSB IDL reconnected, force recompute.
>>>>>> 2025-02-08T14:28:50.739Z|00009|ovn_util|INFO|statctrl: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>>>>> 2025-02-08T14:28:50.739Z|00010|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>>>>> 2025-02-08T14:28:50.739Z|00011|ovn_util|INFO|pinctrl: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>>>>> 2025-02-08T14:28:50.739Z|00012|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>>>>> 2025-02-08T14:28:50.739Z|00013|reconnect|INFO|tcp:10.10.9.5:6642: connected
>>>>>> 2025-02-08T14:28:50.739Z|00001|rconn(ovn_statctrl2)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>>>>> 2025-02-08T14:28:50.743Z|00001|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>>>>> 2025-02-08T14:28:50.756Z|00014|ovn_util|INFO|features: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>>>>> 2025-02-08T14:28:50.757Z|00015|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>>>>> 2025-02-08T14:28:50.757Z|00016|features|INFO|OVS Feature: ct_zero_snat, state: supported
>>>>>> 2025-02-08T14:28:50.757Z|00017|features|INFO|OVS Feature: ct_flush, state: supported
>>>>>> 2025-02-08T14:28:50.757Z|00018|features|INFO|OVS Feature: dp_hash_l4_sym_support, state: supported
>>>>>> 2025-02-08T14:28:50.757Z|00019|main|INFO|OVS feature set changed, force recompute.
>>>>>> 2025-02-08T14:28:50.757Z|00020|ovn_util|INFO|ofctrl: connecting to switch: "unix:/var/run/openvswitch/br-int.mgmt"
>>>>>> 2025-02-08T14:28:50.757Z|00021|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
>>>>>> 2025-02-08T14:28:50.758Z|00022|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>>>>> 2025-02-08T14:28:50.758Z|00023|main|INFO|OVS OpenFlow connection reconnected, force recompute.
>>>>>> 2025-02-08T14:28:50.759Z|00024|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
>>>>>> 2025-02-08T14:28:50.760Z|00025|features|INFO|OVS Feature: meter_support, state: supported
>>>>>> 2025-02-08T14:28:50.760Z|00026|features|INFO|OVS Feature: group_support, state: supported
>>>>>> 2025-02-08T14:28:50.760Z|00027|features|INFO|OVS Feature: sample_action_with_registers, state: supported
>>>>>> 2025-02-08T14:28:50.760Z|00028|main|INFO|OVS feature set changed, force recompute.
>>>>>> 2025-02-08T14:28:51.483Z|00029|binding|INFO|Releasing lport 978a923d-1076-484b-a1d5-ff5827ae69ec from this chassis (sb_readonly=0)
>>>>>> 2025-02-08T14:28:53.658Z|00030|poll_loop|INFO|wakeup due to [POLLIN] on fd 30 (10.10.9.3:60902<->10.10.9.5:6642) at lib/stream-fd.c:157 (78% CPU usage)
>>>>>> 2025-02-08T14:28:54.726Z|00002|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:84:8e:7f 13.37.0.37
>>>>>> 2025-02-08T14:28:59.134Z|00003|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:84:8e:7f 13.37.0.37
>>>>>> 2025-02-08T14:29:01.499Z|00004|pinctrl(ovn_pinctrl0)|INFO|DHCPOFFER fa:16:3e:80:67:62 13.37.0.5
>>>>>> 2025-02-08T14:29:03.245Z|00005|pinctrl(ovn_pinctrl0)|INFO|DHCPACK fa:16:3e:84:8e:7f 13.37.0.37
>>>>>> 2025-02-08T14:29:03.455Z|00031|memory|INFO|98744 kB peak resident set size after 12.8 seconds
>>>>>> 2025-02-08T14:29:03.455Z|00032|memory|INFO|idl-cells-OVN_Southbound:168023 idl-cells-Open_vSwitch:36874 lflow-cache-entries-cache-expr:6494 lflow-cache-entries-cache-matches:1413 lflow-cache-size-KB:20726 local_datapath_usage-KB:1 ofctrl_desired_flow_usage-KB:6342 ofctrl_installed_flow_usage-KB:4695 ofctrl_sb_flow_ref_usage-KB:2355
>>>>>>
>>>>>> Any suggestions?
>>>>>>
>>>>>> Environment:
>>>>>> OpenStack 2024.2
>>>>>> Kernel 6.8.0-50-generic #51~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC
>>>>>>
>>>>>> ovs-vsctl (Open vSwitch) 3.4.0
>>>>>> DB Schema 8.7.0
>>>>>>
>>>>>> ovs-ofctl (Open vSwitch) 3.4.0
>>>>>> OpenFlow versions 0x1:0x6
>>>>>>
>>>>>> # ovn-nbctl --version
>>>>>> ovn-nbctl 24.09.0
>>>>>> Open vSwitch Library 3.4.0
>>>>>> DB Schema 7.6.0
>>>>>>
>>>>>> # ovn-sbctl --version
>>>>>> ovn-sbctl 24.09.0
>>>>>> Open vSwitch Library 3.4.0
>>>>>> DB Schema 20.37.0
>>>
>>> Thanks,
>>> Dumitru
>
> Thanks,
> Dumitru

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss