Hi all,

While creating ovn interconnect logical flows (Transit switch, LRPs
and LSPs) for a Logical Router in an AZ, there is load spike in all
chassis's ovn-controllers to 100%.
This remains for 1 minute and half and returns to normal after the
logical flows are installed.
Since this behavior only manifests itself while creating ovn
interconnect logical flows, enabling the debug logs shows that the
ovn-controller receives from SB the following change:

2024-12-26T17:38:15.023Z|10562646|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: SB_encap, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.023Z|10562647|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: mff_ovn_geneve, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.023Z|10562648|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: physical_flow_output, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.023Z|10562649|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: flow_output, old_state Unchanged, new_state Stale
2024-12-26T17:38:15.032Z|10562650|jsonrpc|DBG|ssl:1XX.XX.XX.X16:6642:
received notification, method="update3",
params=[["monid","OVN_Southbound"],"9c54ec46-83a4-4deb-877d-3089a99ae482",{"SB_Global
":{"019c4225-e637-4881-b382-36fdfcb9e0c9":{"modify":{"nb_cfg":34448}}},"Port_Binding":{"4230d38a-75cd-4f1c-9fde-62e753958656":{"insert":{"up":false,"tunnel_key":7,"mac":"16:00:01:f4:03:cb
169.254 .30.3/24
fe80:30::3/64","datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-793b8","type":"patch"}},"f4d71615-88b8-4274-87ab-448cf8
c6a6ee":{"insert":{"up":false,"tunnel_key":6,"mac":"16:00:01:75:f8:87
169.254.10.1/24
fe80:10::1/64","datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-
9486-e66f9d7b25bf-lrp-ba453","type":"patch"}},"3eac57dd-386c-43b0-bf5b-8225941b52cb":{"insert":{"up":false,"tunnel_key":5,"mac":"16:00:01:9a:30:fa
169.254.40.4/24 fe80:40::4/64","datapath":["uuid
","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-36ac0","type":"patch"}},"7295d5c6-ff8d-4f4f-b1a5-eb1ca87fcb7d":{"insert":{"up":false,"tunnel_key
":8,"mac":"16:00:01:de:df:b5 169.254.20.2/24
fe80:20::2/64","datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"logical_port":"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-c21c6","type":"patc
h"}}},"Logical_Flow":{"224852f2-9a46-4569-b51d-5d340ed6f4b4":{"modify":{"logical_dp_group":["uuid","bd825723-9a0e-451c-b5ad-b7d6f9cc75e4"]}},"05f5d515-bccf-47c1-9d55-8a4dfce88822":{"insert":{"pip
eline":"ingress","match":"inport ==
\"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-36ac0\" && ip6.dst ==
fe80::/64","priority":194,"logical_datapath":["uuid","e84addb8-efdd-48bc-8454-8dd76a4a5a19"],"
table_id":13,"actions":"ip.ttl--; reg8[0..15] = 0; xxreg0 = ip6.dst;
xxreg1 = fe80::1400:1ff:fe9a:30fa; eth.src = 16:00:01:9a:30:fa;
outport = \"bd7bae47-2482-41cc-9486-e66f9d7b25bf-lrp-36ac0\";
flags.loopback = 1;
next;"}},"c67ec1ee-bfc1-47e0-b35c-95fcc2adc13b":{"modify":{"logical_dp_group":

The addresses from the above change (16:00:01:75:f8:87 169.254.10.1/24
fe80:10::1/64) are configured into the LRP connected to the transit
switch.

After while another incremental processing is started:

2024-12-26T17:38:18.035Z|11190795|coverage|INFO|Skipping details of
duplicate event coverage for hash=db09e857
2024-12-26T17:38:18.035Z|11190796|poll_loop|DBG|wakeup due to [POLLIN]
on fd 3 (1XX.XX.XX.X99:47382<->1XX.XX.XX.X16:6642) at
lib/stream-ssl.c:842 (100% CPU usage)
2024-12-26T17:38:18.035Z|11190797|inc_proc_eng|DBG|Initializing new
run 2024-12-26T17:38:18.035Z|11190798|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: SB_sb_global, old_state Updated, new_state Stale
2024-12-26T17:38:18.035Z|11190799|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: northd_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190800|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: SB_dhcp_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190801|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: SB_dhcpv6_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190802|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: dhcp_options, old_state Unchanged, new_state Stale
2024-12-26T17:38:18.035Z|11190803|inc_proc_eng|DBG|lib/inc-proc-eng.c:379:
node: SB_address_set, old_state Unchanged, new_state Stale

In other chassis we observerd the recompute time increasing to 18secs:

2024-12-26T17:38:46.463Z|40096697|poll_loop|INFO|Dropped 72 log
messages in last 6 seconds (most recently, 0 seconds ago) due to
excessive rate 2024-12-26T17:38:46.463Z|40096698|poll_loop|INFO|wakeup
due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at
lib/stream-ssl.c:842 (99% CPU usage)
2024-12-26T17:38:52.442Z|40096699|poll_loop|INFO|Dropped 72 log
messages in last 6 seconds (most recently, 0 seconds ago) due to
excessive rate 2024-12-26T17:38:52.442Z|40096700|poll_loop|INFO|wakeup
due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at
lib/stream-ssl.c:842 (102% CPU usage)
2024-12-26T17:38:58.409Z|40096701|poll_loop|INFO|Dropped 72 log
messages in last 6 seconds (most recently, 0 seconds ago) due to
excessive rate 2024-12-26T17:38:58.409Z|40096702|poll_loop|INFO|wakeup
due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at
lib/stream-ssl.c:842 (101% CPU usage)
2024-12-26T17:39:04.477Z|40096703|poll_loop|INFO|Dropped 72 log
messages in last 6 seconds (most recently, 0 seconds ago) due to
excessive rate 2024-12-26T17:39:04.477Z|40096704|poll_loop|INFO|wakeup
due to [POLLIN] on fd 26 (1XX.XX.XX.6:56350<->1XX.XX.XX.X16:6642) at
lib/stream-ssl.c:842 (98% CPU usage)
2024-12-26T17:39:27.896Z|40096705|inc_proc_eng|INFO|node:
logical_flow_output, recompute (forced) took 18932ms
2024-12-26T17:39:28.612Z|40096706|inc_proc_eng|INFO|node:
physical_flow_output, recompute (forced) took 670ms
2024-12-26T17:39:30.614Z|40096707|timeval|WARN|Unreasonably long
22134ms poll interval (19780ms user, 2315ms system)
2024-12-26T17:39:30.614Z|40096708|timeval|WARN|faults: 2634558 minor,
0 major 2024-12-26T17:39:30.614Z|40096709|timeval|WARN|disk: 0 reads,
8 writes 2024-12-26T17:39:30.614Z|40096710|timeval|WARN|context
switches: 0 voluntary, 2688 involuntary

This behavior only happens with ovn interconnect logical flows, do you
know any known issue regarding it?

#Versions in use

$ ovn-nbctl -V
ovn-nbctl 23.03.1
Open vSwitch Library 3.1.3
DB Schema 7.0.0

$ ovs-vsctl -V
ovs-vsctl (Open vSwitch) 3.1.3
DB Schema 8.3.1

external_ids: ovn-monitor-all="true",
ovn-ofctrl-wait-before-clear="10000", ovn-openflow-probe-interval="0",
ovn-remote-probe-interval="180000"

Regards,

Tiago Pires

-- 




_‘Esta mensagem é direcionada apenas para os endereços constantes no 
cabeçalho inicial. Se você não está listado nos endereços constantes no 
cabeçalho, pedimos-lhe que desconsidere completamente o conteúdo dessa 
mensagem e cuja cópia, encaminhamento e/ou execução das ações citadas estão 
imediatamente anuladas e proibidas’._


* **‘Apesar do Magazine Luiza tomar 
todas as precauções razoáveis para assegurar que nenhum vírus esteja 
presente nesse e-mail, a empresa não poderá aceitar a responsabilidade por 
quaisquer perdas ou danos causados por esse e-mail ou por seus anexos’.*



_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to