On 4/6/25 8:37 PM, Ilia Baikov wrote:
> Hello,

Hi Ilia,

> I've compiled OVN from the provided branch with both patches included and
> migrated some VMs to a host with L3 networking to see what would happen.
> 
> The resubmit logs are back. Interestingly, there were about 270 VMs
> before, and the logs reappeared at about 300 instances (and as many
> ports). There had been no resubmit logs since April 1, so the number of
> ports seems to be related to this case.
> 
> 2025-04-06T18:09:36.797Z|00128|ofproto_dpif_xlate(handler20)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900
> 2025-04-06T18:09:46.800Z|00123|ofproto_dpif_xlate(handler56)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=683,vlan_tci=0x0000,dl_src=fa:16:3e:27:0d:b2,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.21,nw_dst=239.255.255.250,nw_tos=184,nw_ecn=0,nw_ttl=1,nw_frag=no,tp_src=56512,tp_dst=1900
> 2025-04-06T18:09:46.808Z|00139|ofproto_dpif_xlate(handler64)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=55936,tp_dst=1900
> 2025-04-06T18:09:51.225Z|00020|ofproto_dpif_xlate(handler49)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
> 2025-04-06T18:09:51.229Z|00021|ofproto_dpif_xlate(handler49)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
> 2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096
> resubmit actions on bridge br-int while processing
> arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00
> 2025-04-06T18:33:24.414Z|00125|ofproto_dpif_xlate(handler62)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
> 2025-04-06T18:33:54.406Z|00102|ofproto_dpif_xlate(handler26)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
> 2025-04-06T18:34:00.917Z|00088|ofproto_dpif_xlate(handler48)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
> 2025-04-06T18:34:00.918Z|00089|ofproto_dpif_upcall(handler48)|WARN|
> Dropped 2 log messages in last 37 seconds (most recently, 7 seconds ago)
> due to excessive rate
> 2025-04-06T18:34:00.918Z|00090|ofproto_dpif_upcall(handler48)|WARN|Flow:
> udp,in_port=46,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
> 2025-04-06T18:34:00.920Z|00091|ofproto_dpif_xlate(handler48)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=685,vlan_tci=0x0000,dl_src=fa:16:3e:4e:86:32,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=89.169.15.31,nw_dst=89.169.15.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=138,tp_dst=138
> 2025-04-06T18:34:24.407Z|00127|ofproto_dpif_xlate(handler7)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
> 2025-04-06T18:35:11.952Z|00252|ofproto_dpif_xlate(handler38)|WARN|
> Dropped 1 log messages in last 18 seconds (most recently, 18 seconds
> ago) due to excessive rate
> 2025-04-06T18:35:11.952Z|00253|ofproto_dpif_xlate(handler38)|WARN|over
> 4096 resubmit actions on bridge br-int while processing
> udp,in_port=541,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900
> 2025-04-06T18:35:11.952Z|00254|ofproto_dpif_upcall(handler38)|WARN|
> Dropped 3 log messages in last 71 seconds (most recently, 18 seconds
> ago) due to excessive rate
> 2025-04-06T18:35:11.953Z|00255|ofproto_dpif_upcall(handler38)|WARN|Flow:
> udp,in_port=346,vlan_tci=0x0000,dl_src=fa:16:3e:57:59:7d,dl_dst=01:00:5e:7f:ff:fa,nw_src=89.169.15.68,nw_dst=239.255.255.250,nw_tos=0,nw_ecn=0,nw_ttl=2,nw_frag=no,tp_src=60210,tp_dst=1900
> 

Out of these logs, the UDP ones are for IP multicast/broadcast traffic.
Those will still get flooded and can potentially hit the resubmit limit;
none of the changes we tried tackles that.  I also don't think that
should cause many issues (aside from the vswitchd logs), though.
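
To check this split on a larger log, something like the following could
be used. It is only a rough Python sketch, not part of OVS or OVN: the
function names are made up for illustration, and it assumes each warning
sits on a single line, as in ovs-vswitchd.log (not wrapped as in this
mail). It counts the resubmit-limit warnings per protocol and per
Ethernet destination class.

```python
import re
import sys
from collections import Counter

# Matches the flow description that follows the resubmit-limit warning.
RESUBMIT_RE = re.compile(
    r"over \d+ resubmit actions on bridge (?P<bridge>\S+) "
    r"while processing (?P<proto>[a-z0-9]+),(?P<fields>\S+)"
)


def classify_dst(dl_dst):
    """Classify an Ethernet destination as broadcast, multicast or unicast."""
    if dl_dst.lower() == "ff:ff:ff:ff:ff:ff":
        return "broadcast"
    # The least significant bit of the first octet marks a group address.
    if int(dl_dst.split(":")[0], 16) & 1:
        return "multicast"
    return "unicast"


def summarize(lines):
    """Count resubmit-limit warnings per (protocol, destination class)."""
    counts = Counter()
    for line in lines:
        m = RESUBMIT_RE.search(line)
        if not m:
            continue  # not a resubmit-limit warning
        fields = dict(
            kv.split("=", 1) for kv in m.group("fields").split(",") if "=" in kv
        )
        dst = fields.get("dl_dst")
        dst_class = classify_dst(dst) if dst else "unknown"
        counts[(m.group("proto"), dst_class)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python3 resubmit_summary.py < ovs-vswitchd.log
    for (proto, dst_class), n in summarize(sys.stdin).most_common():
        print(f"{proto:6} {dst_class:10} {n}")
```

If the UDP entries all land in the multicast/broadcast rows, they match
the flooding case described above, and anything else is worth a closer
look.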

The only other packet I see logged there is:

2025-04-06T18:10:39.616Z|2159336|ofproto_dpif_xlate|WARN|over 4096
resubmit actions on bridge br-int while processing
arp,in_port=CONTROLLER,vlan_tci=0x0000,dl_src=fa:16:3e:63:3e:92,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.116,arp_tpa=185.255.178.116,arp_op=1,arp_sha=fa:16:3e:63:3e:92,arp_tha=00:00:00:00:00:00

This is a GARP (originated by ovn-controller) and in theory should not
be flooded with the change from
https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/.
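
For reference, a gratuitous ARP can be recognized in these logged flows
by the sender and target IP being equal (arp_spa == arp_tpa). A minimal
Python sketch of that check follows; the helper names are hypothetical
and it only looks at the flow string printed in the warning.

```python
def parse_flow(flow):
    """Split an OVS flow string such as 'arp,in_port=1,...' into (proto, fields)."""
    proto, _, rest = flow.partition(",")
    fields = dict(kv.split("=", 1) for kv in rest.split(",") if "=" in kv)
    return proto, fields


def is_garp(flow):
    """Return True if the flow is an ARP whose sender IP equals its target IP."""
    proto, fields = parse_flow(flow)
    spa = fields.get("arp_spa")
    return proto == "arp" and spa is not None and spa == fields.get("arp_tpa")
```

Applied to the flow above (arp_spa and arp_tpa both 185.255.178.116)
this returns True, while a regular ARP request for a gateway IP does
not match.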

Just to double check, have you also upgraded ovn-northd to the patched
version?  That's required because the changes in the last commit on my
branch are for ovn-northd.

Thanks,
Dumitru

> # ovn-controller --version (not using
> ovn-controller 24.09.3
> Open vSwitch Library 3.4.2
> OpenFlow versions 0x6:0x6
> SB DB Schema 20.37.0
> 
> # ovn-northd --version
> ovn-northd 24.09.3
> Open vSwitch Library 3.4.2
> 
> On 02.04.2025 10:52, Dumitru Ceara wrote:
>> On 4/1/25 7:30 PM, Ilia Baikov wrote:
>>> Hi Dumitru,
>> Hi Ilia,
>>
>>> Sure, let's give it a try. Is it a good idea to apply this patch on top of
>>> the patch you previously sent to try?
>> Yes, you're right, we should apply it on top of that.  I pushed both
>> patches here:
>>
>> https://github.com/dceara/ovn/commits/refs/heads/tmp-branch-24.09-revert-mg-split/
>>
>> Please ignore the CI failures; as mentioned yesterday, the last commit
>> I did is known to break stuff. It's just to confirm that our problem in
>> this case is due to the MC_FLOOD_L2 output we do for ARP requests
>> targeting OVN router IPs.
>>
>>> Which topology is preferable for testing this: L2 or L3
>>> (BGP-based networking)? With L3 there is actually only a small amount of
>>> ARP pps compared to L2, where the uplink interface (like eno1) is a member
>>> of the br-ex bridge.
>>>
>> Would it be possible to try both?
>>
>>> Thank you for your involvement, really appreciate it!
>>>
>> No problem, thanks for taking time to help us try things out!
>>
>> Regards,
>> Dumitru
>>
>>> 01.04.2025 15:06, Dumitru Ceara wrote:
>>>> On 4/1/25 3:15 AM, Ilia Baikov wrote:
>>>>> Hello,
>>>> Hi Ilia,
>>>>
>>>>> So things go way deeper and are much stranger than I initially
>>>>> thought.
>>>>> I've migrated to L3 networking using ovn-bgp-agent in order to reduce
>>>>> the ARP packets flooded over all ports attached to br-int. However, this
>>>>> didn't help at all and some VMs lose external connectivity (even though
>>>>> the VM ports did not get flooded by hundreds of pps of ARP).
>>>>>
>>>>> 2025-04-01T01:03:51.368Z|00035|ofproto_dpif_xlate(handler12)|WARN|
>>>>> Dropped 131 log messages in last 55 seconds (most recently, 10 seconds
>>>>> ago) due to excessive rate
>>>>> 2025-04-01T01:03:51.368Z|00036|ofproto_dpif_xlate(handler12)|WARN|over
>>>>> 4096 resubmit actions on bridge br-int while processing
>>>>> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
>>>>> 2025-04-01T01:04:35.392Z|00054|ofproto_dpif_upcall(handler60)|WARN|
>>>>> Dropped 133 log messages in last 57 seconds (most recently, 13 seconds
>>>>> ago) due to excessive rate
>>>>> 2025-04-01T01:04:35.392Z|00055|ofproto_dpif_upcall(handler60)|WARN|
>>>>> Flow:
>>>>> udp,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
>>>>>
>>>>> bridge("br-int")
>>>>> ----------------
>>>>>    0. in_port=277, priority 100, cookie 0xa70f8aad
>>>>>       set_field:0x5d/0xffff->reg13
>>>>>       set_field:0x2->reg11
>>>>>       set_field:0x3->reg12
>>>>>       set_field:0x2->metadata
>>>>>       set_field:0x2ac->reg14
>>>>>       set_field:0/0xffff0000->reg13
>>>>>       resubmit(,8)
>>>>>    8. metadata=0x2, priority 50, cookie 0xecab0c71
>>>>>       set_field:0/0x1000->reg10
>>>>>       resubmit(,73)
>>>>>       73. reg14=0x2ac,metadata=0x2, priority 80, cookie 0xa70f8aad
>>>>>               set_field:0x1000/0x1000->reg10
>>>>>       move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
>>>>>        -> NXM_NX_XXREG0[111] is now 0x1
>>>>>       resubmit(,9)
>>>>>    9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
>>>>>       drop
>>>>>
>>>>> Final flow:
>>>>> udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5d,reg14=0x2ac,metadata=0x2,in_port=277,vlan_tci=0x0000,dl_src=fa:16:3e:24:f1:f7,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.148,nw_dst=83.217.210.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=137,tp_dst=137
>>>>> Megaflow:
>>>>> recirc_id=0,eth,ip,in_port=277,dl_src=fa:16:3e:24:f1:f7,nw_frag=no
>>>>> Datapath actions: drop
>>>>> 2025-04-01T01:04:48.434Z|00015|ofproto_dpif_xlate(handler58)|WARN|
>>>>> Dropped 7 log messages in last 55 seconds (most recently, 13 seconds
>>>>> ago) due to excessive rate
>>>>> 2025-04-01T01:04:48.435Z|00016|ofproto_dpif_xlate(handler58)|WARN|over
>>>>> 4096 resubmit actions on bridge br-int while processing
>>>>> arp,in_port=487,vlan_tci=0x0000,dl_src=fa:16:3e:de:2a:ce,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=185.255.178.119,arp_tpa=185.255.178.1,arp_op=1,arp_sha=fa:16:3e:de:2a:ce,arp_tha=00:00:00:00:00:00
>>>>> 2025-04-01T01:05:51.377Z|00017|ofproto_dpif_xlate(handler46)|WARN|
>>>>> Dropped 7 log messages in last 61 seconds (most recently, 29 seconds
>>>>> ago) due to excessive rate
>>>>> 2025-04-01T01:05:51.377Z|00018|ofproto_dpif_xlate(handler46)|WARN|over
>>>>> 4096 resubmit actions on bridge br-int while processing
>>>>> udp,in_port=473,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
>>>>> 2025-04-01T01:05:51.377Z|00019|ofproto_dpif_upcall(handler46)|WARN|
>>>>> Dropped 8 log messages in last 63 seconds (most recently, 29 seconds
>>>>> ago) due to excessive rate
>>>>> 2025-04-01T01:05:51.377Z|00020|ofproto_dpif_upcall(handler46)|WARN|
>>>>> Flow:
>>>>> udp,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
>>>>>
>>>>> bridge("br-int")
>>>>> ----------------
>>>>>    0. in_port=278, priority 100, cookie 0xd191f4de
>>>>>       set_field:0x5e/0xffff->reg13
>>>>>       set_field:0x2->reg11
>>>>>       set_field:0x3->reg12
>>>>>       set_field:0x2->metadata
>>>>>       set_field:0x33d->reg14
>>>>>       set_field:0/0xffff0000->reg13
>>>>>       resubmit(,8)
>>>>>    8. metadata=0x2, priority 50, cookie 0xecab0c71
>>>>>       set_field:0/0x1000->reg10
>>>>>       resubmit(,73)
>>>>>       73. reg14=0x33d,metadata=0x2, priority 80, cookie 0xd191f4de
>>>>>               set_field:0x1000/0x1000->reg10
>>>>>       move:NXM_NX_REG10[12]->NXM_NX_XXREG0[111]
>>>>>        -> NXM_NX_XXREG0[111] is now 0x1
>>>>>       resubmit(,9)
>>>>>    9. reg0=0x8000/0x8000,metadata=0x2, priority 50, cookie 0x1a158a65
>>>>>       drop
>>>>>
>>>>> Final flow:
>>>>> udp,reg0=0x8000,reg10=0x1000,reg11=0x2,reg12=0x3,reg13=0x5e,reg14=0x33d,metadata=0x2,in_port=278,vlan_tci=0x0000,dl_src=fa:16:3e:66:0c:6e,dl_dst=ff:ff:ff:ff:ff:ff,nw_src=83.217.210.206,nw_dst=255.255.255.255,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=5678,tp_dst=5678
>>>>> Megaflow:
>>>>> recirc_id=0,eth,ip,in_port=278,dl_src=fa:16:3e:66:0c:6e,nw_frag=no
>>>>> Datapath actions: drop
>>>>>
>>>>> When I see this issue on the mentioned IPs, I can see that the VM tries
>>>>> to resolve the default gateway MAC address but without success, since
>>>>> the ARP is dropped somewhere because of the resubmits and the drop
>>>>> action in the datapath. I guess this is not related to the OVN
>>>>> controller now. Meanwhile, DHCP works pretty well :)
>>>> Glad to hear about DHCP.
>>>>
>>>>> The broadcast-arps-to-all-routers option is disabled on both
>>>>> provider network Logical Switches.
>>>>>
>>>>> Could there be any solution for this?
>>>> It's not a solution, more of a change to confirm whether this is
>>>> causing issues, but could you please try with this commit?
>>>>
>>>> https://github.com/dceara/ovn/commit/fac23e2f6ef6effe3f9a2e0310e78d085750488b
>>>>
>>>> The commit disables flooding of ARP requests that target OVN-owned
>>>> router IPs to non-router ports.  It's not something that can be
>>>> accepted as is because it breaks some other things, e.g., OVN-generated
>>>> GARP requests will not be forwarded properly.
>>>>
>>>> The patch applies on top of the ovn main branch.  If you want to try it
>>>> out on older branches, the easiest way is to just ignore the test
>>>> changes.
>>>>
>>>> Looking forward to hearing how it went.
>>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>> Regards,
>>>>> Ilia Baikov
>>>>>
>>>>> On 26.03.2025 12:34, Ilia Baikov wrote:
>>>>>> Got my hands on this, back to debugging. It seems the kernel runs
>>>>>> stably.
>>>>>> # uname -r
>>>>>> 6.14.0-061400-generic
>>>>>> Meanwhile, there are no unrecognized(27) related logs.
>>>>>> tail -f /var/log/kolla/openvswitch/ovn-controller.log | grep -i "dhcp"
>>>>>> 2025-03-26T09:23:08.086Z|38050|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
>>>>>> fa:16:3e:9c:f4:45 185.255.178.131
>>>>>> 2025-03-26T09:23:08.086Z|38052|pinctrl(ovn_pinctrl0)|DBG|pinctrl
>>>>>> received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
>>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
>>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
>>>>>> 2025-03-26T09:23:11.084Z|38054|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
>>>>>> fa:16:3e:9c:f4:45 185.255.178.131
>>>>>> 2025-03-26T09:23:11.085Z|38056|pinctrl(ovn_pinctrl0)|DBG|pinctrl
>>>>>> received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
>>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
>>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
>>>>>> 2025-03-26T09:23:26.606Z|38058|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
>>>>>> fa:16:3e:9c:f4:45 185.255.178.131
>>>>>> 2025-03-26T09:23:26.606Z|38060|pinctrl(ovn_pinctrl0)|DBG|pinctrl
>>>>>> received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
>>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
>>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
>>>>>> 2025-03-26T09:23:27.704Z|38062|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
>>>>>> fa:16:3e:9c:f4:45 185.255.178.131
>>>>>> 2025-03-26T09:23:27.704Z|38064|pinctrl(ovn_pinctrl0)|DBG|pinctrl
>>>>>> received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
>>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
>>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
>>>>>> 2025-03-26T09:23:28.383Z|38066|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
>>>>>> fa:16:3e:9c:f4:45 185.255.178.131
>>>>>> 2025-03-26T09:23:28.383Z|38068|pinctrl(ovn_pinctrl0)|DBG|pinctrl
>>>>>> received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
>>>>>> OF_Cookie_ID=0xfb6fb11d| in-port=5| src-mac=fa:16:3e:9c:f4:45, dst-
>>>>>> mac=ff:ff:ff:ff:ff:ff| src-ip=0.0.0.0, dst-ip=255.255.255.255
>>>>>> 2025-03-26T09:23:53.984Z|38070|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
>>>>>> fa:16:3e:50:22:c4 185.255.178.170
>>>>>> 2025-03-26T09:23:53.984Z|38072|pinctrl(ovn_pinctrl0)|DBG|pinctrl
>>>>>> received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
>>>>>> OF_Cookie_ID=0xa020594| in-port=184| src-mac=fa:16:3e:50:22:c4, dst-
>>>>>> mac=30:b6:4f:5f:db:a0| src-ip=185.255.178.170, dst-ip=185.255.178.1
>>>>>> 2025-03-26T09:24:51.866Z|38074|pinctrl(ovn_pinctrl0)|INFO|DHCPACK
>>>>>> fa:16:3e:18:ac:4a 89.169.15.224
>>>>>> 2025-03-26T09:24:51.866Z|38076|pinctrl(ovn_pinctrl0)|DBG|pinctrl
>>>>>> received  packet-in | opcode=PUT_DHCP_OPTS| OF_Table_ID=0|
>>>>>> OF_Cookie_ID=0xa020594| in-port=156| src-mac=fa:16:3e:18:ac:4a, dst-
>>>>>> mac=30:b6:4f:5f:db:a0| src-ip=89.169.15.224, dst-ip=89.169.15.1
>>>>>>
>>>>>> And yes, there are logs about resubmit actions, which are expected as
>>>>>> you said.
>>>>>> ->reg15
>>>>>>    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
>>>>>>    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
>>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x235760): ADD table:8
>>>>>> priority=110,icmp6,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=2,icmp_code=0
>>>>>>  cookie:0xd1588295 
>>>>>> actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
>>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x235761): ADD table:8
>>>>>> priority=110,icmp,reg10=0x10000/0x10000,reg15=0x28a,metadata=0x2,dl_src=fa:16:3e:48:61:ad,icmp_type=3,icmp_code=4
>>>>>>  cookie:0xd1588295 
>>>>>> actions=push:NXM_NX_REG14[],push:NXM_NX_REG15[],pop:NXM_NX_REG14[],pop:NXM_NX_REG15[],resubmit(,9)
>>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x23577b): ADD table:80
>>>>>> priority=100,reg14=0x28a,metadata=0x2 cookie:0xa17d67d7
>>>>>> actions=set_field:0x9->reg11,set_field:0xa->reg12,resubmit(,8)
>>>>>> OFPT_FLOW_MOD (OF1.5) (xid=0x23577c): ADD table:43
>>>>>> priority=100,reg15=0x28a,metadata=0x2 cookie:0xa17d67d7
>>>>>> actions=set_field:0x1->reg15,resubmit(,43
>>>>>>
>>>>>> OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
>>>>>> actions=set_field:0x2->metadata,set_field:0x503->reg14,resubmit(CONTROLLER,8) data_len=42
>>>>>> OFPT_PACKET_OUT (OF1.5) (xid=0x235622): in_port=CONTROLLER
>>>>>> actions=set_field:0x2->metadata,set_field:0x503->reg14,resubmit(CONTROLLER,8) data_len=42
>>>>>>    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
>>>>>>    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
>>>>>>    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
>>>>>>    continuation.actions=unroll_xlate(table=0, cookie=0),resubmit(,32)
>>>>>>
>>>>>> top - 09:29:33 up 16:58,  3 users,  load average: 37.70, 37.39, 36.95
>>>>>> Threads: 114 total,  13 running, 101 sleeping,   0 stopped,   0 zombie
>>>>>> %Cpu(s): 27.5 us, 11.2 sy,  0.0 ni, 60.3 id,  0.5 wa,  0.0 hi, 0.6 si,  0.0 st
>>>>>> MiB Mem : 773901.8 total, 323685.3 free, 423865.0 used,  26351.4 buff/cache
>>>>>> MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345358.7 avail Mem
>>>>>>
>>>>>> #top -H -p $(pidof ovs-vswitchd)
>>>>>>       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>>>      6212 root      20   0 8388104 662016   7940 R  22.3   0.1  32:13.87 revalidator102
>>>>>>      6200 root      20   0 8388104 662016   7940 R  21.6   0.1  73:01.76 revalidator90
>>>>>>      6202 root      20   0 8388104 662016   7940 R  20.6   0.1  32:13.77 revalidator92
>>>>>>      6217 root      20   0 8388104 662016   7940 R  20.3   0.1  26:07.50 revalidator107
>>>>>>      6214 root      20   0 8388104 662016   7940 R  18.9   0.1  39:12.60 revalidator104
>>>>>>      6219 root      20   0 8388104 662016   7940 R  18.6   0.1  25:57.96 revalidator109
>>>>>>      6211 root      20   0 8388104 662016   7940 R  17.6   0.1  39:20.20 revalidator101
>>>>>>      6207 root      20   0 8388104 662016   7940 R  13.6   0.1  17:53.31 revalidator97
>>>>>>      6204 root      20   0 8388104 662016   7940 R  13.0   0.1  32:34.67 revalidator94
>>>>>>      6209 root      20   0 8388104 662016   7940 R  12.6   0.1  35:55.88 revalidator100
>>>>>>      6213 root      20   0 8388104 662016   7940 R  12.6   0.1  18:04.43 revalidator103
>>>>>>      6220 root      20   0 8388104 662016   7940 R  12.3   0.1   5:52.71 revalidator110
>>>>>>      6218 root      20   0 8388104 662016   7940 R  12.0   0.1   9:09.66 revalidator108
>>>>>>      6215 root      20   0 8388104 662016   7940 S   8.6   0.1  22:22.27 revalidator105
>>>>>>      6221 root      20   0 8388104 662016   7940 S   8.6   0.1   4:44.93 revalidator111
>>>>>>      6208 root      20   0 8388104 662016   7940 S   8.0   0.1  22:55.17 revalidator98
>>>>>>      6203 root      20   0 8388104 662016   7940 S   6.0   0.1  38:52.72 revalidator93
>>>>>>      6206 root      20   0 8388104 662016   7940 S   6.0   0.1  22:11.55 revalidator96
>>>>>>      6216 root      20   0 8388104 662016   7940 S   4.3   0.1  33:14.52 revalidator106
>>>>>>
>>>>>> Sometimes the revalidator threads drop to 5-10% CPU, sometimes they
>>>>>> go to 30%. I guess this behaviour is because of the resubmit actions?
>>>>>> So far so good, the controller is doing fine, but there could be some
>>>>>> sort of freezes when sending DHCPREPLY.
>>>>>>
>>>>>> top -H -p $(pidof ovn-controller)
>>>>>> top - 09:30:55 up 16:59,  3 users,  load average: 35.73, 36.70, 36.73
>>>>>> Threads:   5 total,   0 running,   5 sleeping,   0 stopped,   0 zombie
>>>>>> %Cpu(s): 27.7 us, 10.6 sy,  0.0 ni, 60.8 id,  0.3 wa,  0.0 hi, 0.6 si,  0.0 st
>>>>>> MiB Mem : 773901.8 total, 323758.4 free, 423785.6 used,  26357.8 buff/cache
>>>>>> MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 345438.0 avail Mem
>>>>>>
>>>>>>       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>>>      5045 root      20   0  396500  83836   4432 S   0.7   0.0  11:46.59 ovn-controller
>>>>>>      5412 root      20   0  396500  83836   4432 S   0.0   0.0   0:06.23 ovn_pinctrl0
>>>>>>      5413 root      20   0  396500  83836   4432 S   0.0   0.0   0:00.00 urcu1
>>>>>>      5414 root      20   0  396500  83836   4432 S   0.0   0.0   0:00.20 ovn_statctrl2
>>>>>>      6103 root      20   0  396500  83836   4432 S   0.0   0.0   0:04.10 stopwatch3
>>>>>>
>>>>>>
>>>>>> Trying to reproduce in a real-world environment. There are 300
>>>>>> instances running with about 300 Mbps of network traffic in total.
>>>>>> Are there more logs or debug output I can provide?
>>>>>>

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
