Currently, for a given LRP, the release and claim process in
ovn-controller works as follows:

1. A priority update event arrives via the Southbound, or a GTW
   Chassis with a higher priority comes up
2. The LRP is released from GTW Chassis X; this *assumes* that
   another GTW Chassis will claim it, based on the
   `ovn/binding.c:consider_ha_lport()` method
3. The LRP is claimed by GTW Chassis Y

Between the last two events there is no assurance that the LRP has
actually been claimed by another GTW Chassis: if GTW Chassis Y fails
to claim it, the LRP remains inactive, leading to potential downtime.

We observed this issue when a GTW Chassis is rebooted without a
proper termination, i.e. without `ovn-appctl exit`. In that case the
higher-priority LRPs persist for that GTW Chassis in the Southbound
(they are not rescheduled by the CMS). When the node comes back up,
the lower-priority GTW Chassis releases the LRP, but the restarted
node never reclaims it, causing router outages. Below is a case we
came across in our environment, and at the bottom is a patch to
remedy this issue.

Here GTW06 had priority 5 for an LRP and GTW10 had priority 4. On
shutdown of GTW06 the LRP was claimed by GTW10, which is fine. But
once GTW06 was restarted, we can see that the BFD session to GTW10
is established (at 11:17:35), so GTW10 releases the LRP (at
11:17:42), but GTW06 never claims it. In total we had about 240
active LRPs (priority 5) on GTW06 when it was rebooted, all of which
it had to reclaim but never did.

```
gtw06-logs: 2023-09-20T11:17:19.244Z|00001|vlog|INFO|opened log file /dev/stdout
gtw06-logs: 2023-09-20T11:17:19.245Z|00002|reconnect|INFO|unix:/run/openvswitch/db.sock: connecting...
gtw06-logs: 2023-09-20T11:17:19.245Z|00003|reconnect|INFO|unix:/run/openvswitch/db.sock: connected
gtw06-logs: 2023-09-20T11:17:19.247Z|00004|main|INFO|OVN internal version is : [23.03.1-20.27.0-70.6]
gtw06-logs: 2023-09-20T11:17:19.247Z|00005|main|INFO|OVS IDL reconnected, force recompute.
gtw06-logs: 2023-09-20T11:17:19.248Z|00006|reconnect|INFO|ssl:x.x.x.x:6642: connecting...
gtw06-logs: 2023-09-20T11:17:19.248Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
gtw06-logs: 2023-09-20T11:17:19.258Z|00008|reconnect|INFO|ssl:x.x.x.x:6642: connected
gtw06-logs: 2023-09-20T11:17:29.244Z|00009|memory|INFO|98216 kB peak resident set size after 10.0 seconds
gtw06-logs: 2023-09-20T11:17:29.244Z|00010|memory|INFO|idl-cells-Open_vSwitch:780
gtw06-logs: 2023-09-20T11:17:35.592Z|00011|features|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
gtw06-logs: 2023-09-20T11:17:35.593Z|00012|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
gtw06-logs: 2023-09-20T11:17:35.594Z|00013|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
gtw06-logs: 2023-09-20T11:17:35.594Z|00014|features|INFO|OVS Feature: ct_zero_snat, state: supported
gtw06-logs: 2023-09-20T11:17:35.594Z|00015|features|INFO|OVS Feature: ct_flush, state: supported
gtw06-logs: 2023-09-20T11:17:35.594Z|00016|main|INFO|OVS feature set changed, force recompute.
gtw06-logs: 2023-09-20T11:17:35.594Z|00017|ofctrl|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
gtw06-logs: 2023-09-20T11:17:35.594Z|00018|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
gtw06-logs: 2023-09-20T11:17:35.758Z|00019|timeval|WARN|Unreasonably long 4469ms poll interval (4212ms user, 256ms system)
gtw06-logs: 2023-09-20T11:17:35.758Z|00020|timeval|WARN|faults: 133750 minor, 0 major
gtw06-logs: 2023-09-20T11:17:35.758Z|00021|timeval|WARN|context switches: 0 voluntary, 12 involuntary
gtw06-logs: 2023-09-20T11:17:35.758Z|00044|poll_loop|INFO|wakeup due to [POLLIN] on fd 20 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.758Z|00045|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.758Z|00046|rconn|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
gtw06-logs: 2023-09-20T11:17:35.758Z|00047|main|INFO|OVS OpenFlow connection reconnected, force recompute.
gtw06-logs: 2023-09-20T11:17:35.758Z|00048|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ovn-controller.c:5405 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.758Z|00049|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.758Z|00050|poll_loop|INFO|wakeup due to [POLLIN] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.759Z|00051|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ofctrl.c:695 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.766Z|00052|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (<->/run/openvswitch/db.sock) at lib/stream-fd.c:157 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.768Z|00053|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (<->/run/openvswitch/db.sock) at lib/stream-fd.c:157 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.773Z|00054|main|INFO|OVS feature set changed, force recompute.
gtw06-logs: 2023-09-20T11:17:35.887Z|00136|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw10.-0 <---- BFD session for gtw10 established
gtw06-logs: 2023-09-20T11:17:35.888Z|00243|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw17.-0
gtw06-logs: 2023-09-20T11:17:35.889Z|00304|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw05.-0
gtw06-logs: 2023-09-20T11:17:35.891Z|00390|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw12.-0
gtw06-logs: 2023-09-20T11:17:35.891Z|00394|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw18.-0
gtw06-logs: 2023-09-20T11:17:35.892Z|00485|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw16.-0
gtw06-logs: 2023-09-20T11:17:35.892Z|00491|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw11.-0
gtw06-logs: 2023-09-20T11:17:35.893Z|00539|ovn_bfd|INFO|Enabled BFD on interface ovn-gtw04.-0
gtw06-logs: 2023-09-20T11:17:35.898Z|00587|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (<->/run/openvswitch/db.sock) at lib/stream-fd.c:157 (100% CPU usage)
gtw06-logs: 2023-09-20T11:17:35.898Z|00588|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (x.x.x.x:34220<->x.x.x.x:6642) at lib/stream-ssl.c:842 (100% CPU usage)
*gtw10-logs: 2023-09-20T11:17:42.522Z|120750|binding|INFO|Releasing lport cr-lrp-ae2fe844-7b40-49c1-af1c-128030fc7dc8 from this chassis (sb_readonly=0) <---- LRP released by gtw10
gtw06-logs: 2023-09-20T11:17:42.928Z|00589|inc_proc_eng|INFO|node: logical_flow_output, recompute (forced) took 6915ms
gtw06-logs: 2023-09-20T11:17:43.128Z|00001|pinctrl(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting to switch
gtw06-logs: 2023-09-20T11:17:43.128Z|00002|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connecting...
gtw06-logs: 2023-09-20T11:17:43.129Z|00003|rconn(ovn_pinctrl0)|INFO|unix:/var/run/openvswitch/br-int.mgmt: connected
gtw06-logs: 2023-09-20T11:17:44.372Z|00590|timeval|WARN|Unreasonably long 8474ms poll interval (7465ms user, 994ms system)
gtw06-logs: 2023-09-20T11:17:44.372Z|00591|timeval|WARN|faults: 267615 minor, 0 major
gtw06-logs: 2023-09-20T11:17:44.372Z|00592|timeval|WARN|context switches: 0 voluntary, 1589 involuntary
gtw06-logs: 2023-09-20T11:17:44.372Z|00623|poll_loop|INFO|Dropped 1 log messages in last 9 seconds (most recently, 9 seconds ago) due to excessive rate
gtw06-logs: 2023-09-20T11:17:44.373Z|00624|poll_loop|INFO|wakeup due to [POLLIN][POLLHUP] on fd 26 (/var/run/ovn/ovn-controller.1.ctl<->) at lib/stream-fd.c:157 (99% CPU usage)
gtw06-logs: 2023-09-20T11:17:44.373Z|00625|memory|INFO|peak resident set size grew 2377% in last 15.1 seconds, from 98216 kB to 2432756 kB
gtw06-logs: 2023-09-20T11:17:44.373Z|00626|memory|INFO|idl-cells-OVN_Southbound:4535964 idl-cells-Open_vSwitch:33840 idl-outstanding-txns-Open_vSwitch:1 lflow-cache-entries-cache-expr:118625 lflow-cache-entries-cache-matches:47631 lflow-cache-size-KB:363984 local_datapath_usage-KB:659 ofctrl_desired_flow_usage-KB:174398 ofctrl_installed_flow_usage-KB:129580 ofctrl_rconn_packet_counter-KB:119153 ofctrl_sb_flow_ref_usage-KB:62968 oflow_update_usage-KB:1
gtw06-logs: 2023-09-20T11:18:10.745Z|00627|poll_loop|INFO|Dropped 14 log messages in last 26 seconds (most recently, 25 seconds ago) due to excessive rate
gtw06-logs: 2023-09-20T11:18:10.745Z|00628|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (x.x.x.x:34220<->x.x.x.x:6642) at lib/stream-ssl.c:842 (52% CPU usage)
gtw06-logs: 2023-09-20T11:18:10.855Z|00629|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ovn-controller.c:5405 (52% CPU usage)
gtw06-logs: 2023-09-20T11:18:11.998Z|00630|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (x.x.x.x:34220<->x.x.x.x:6642) at lib/stream-ssl.c:842 (52% CPU usage)
gtw06-logs: 2023-09-20T11:18:12.111Z|00631|poll_loop|INFO|wakeup due to [POLLIN] on fd 19 (x.x.x.x:34220<->x.x.x.x:6642) at lib/stream-ssl.c:842 (52% CPU usage)
gtw06-logs: 2023-09-20T11:18:12.222Z|00632|poll_loop|INFO|wakeup due to 0-ms timeout at controller/ovn-controller.c:5405 (52% CPU usage)
gtw06-logs: 2023-09-20T11:18:39.410Z|00633|memory_trim|INFO|Detected inactivity (last active 30114 ms ago): trimming memory
gtw06-logs: 2023-09-20T11:19:27.882Z|00634|memory_trim|INFO|Detected inactivity (last active 30112 ms ago): trimming memory
gtw06-logs: 2023-09-20T11:20:15.878Z|00635|memory_trim|INFO|Detected inactivity (last active 30115 ms ago): trimming memory
gtw06-logs: 2023-09-20T11:21:02.338Z|00636|memory_trim|INFO|Detected inactivity (last active 30116 ms ago): trimming memory
...

```


To avoid the above issue, the following patch introduces a cache to
keep track of the LRPs claimed by the current chassis, and releases
an LRP *ONLY* if an update event arrives via the Southbound showing
that another GTW Chassis has claimed it.

With this patch, for a short interval the LRP may be claimed by two
GTW Chassis, but that is IMO better than having it claimed by none
at all (as in the current version).

The patch is still a draft (thus the mail to this list rather than
ovs-dev), as I am still fixing test cases, but any feedback on it is
appreciated.

Kind regards,
Ihtisham ul Haq

-----------------------------------------------------------------------

diff --git a/controller/binding.c b/controller/binding.c
index 3613a0112..57dcebe42 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -57,6 +57,59 @@ struct claimed_port {
 static struct shash _claimed_ports = SHASH_INITIALIZER(&_claimed_ports);
 static struct sset _postponed_ports = SSET_INITIALIZER(&_postponed_ports);

+
+void
+add_lrp_claim(struct binding_ctx_out *ctx, const char *lrp_id)
+{
+    if (find_lrp_claim(ctx, lrp_id)) {
+        return;
+    }
+
+    struct lrp_claim *new_claim = xmalloc(sizeof *new_claim);
+    new_claim->lrp_id = xstrdup(lrp_id);
+    new_claim->next = ctx->lrp_claims->claim;
+    ctx->lrp_claims->claim = new_claim;
+
+    ctx->lrp_claims->size++;
+}
+
+void
+remove_lrp_claim(struct binding_ctx_out *ctx, char *lrp_id)
+{
+    struct lrp_claim *prev = NULL;
+    struct lrp_claim *current = ctx->lrp_claims->claim;
+
+    while (current) {
+        if (current->lrp_id && strcmp(current->lrp_id, lrp_id) == 0) {
+            if (prev) {
+                prev->next = current->next;
+            } else {
+                ctx->lrp_claims->claim = current->next;
+            }
+
+            free(current->lrp_id);
+            free(current);
+
+            ctx->lrp_claims->size--;
+
+            return;
+        }
+
+        prev = current;
+        current = current->next;
+    }
+}
+
+bool
+find_lrp_claim(struct binding_ctx_out *ctx, const char *lrp_id)
+{
+    if (!ctx->lrp_claims || !lrp_id) {
+        return false;
+    }
+    struct lrp_claim *current = ctx->lrp_claims->claim;
+    while (current) {
+        if (current->lrp_id && strcmp(current->lrp_id, lrp_id) == 0) {
+            return true;
+        }
+        current = current->next;
+    }
+    return false;
+}
+
 static void
 remove_additional_chassis(const struct sbrec_port_binding *pb,
                           const struct sbrec_chassis *chassis_rec);
@@ -1798,7 +1851,8 @@ static bool
 consider_nonvif_lport_(const struct sbrec_port_binding *pb,
                        bool our_chassis,
                        struct binding_ctx_in *b_ctx_in,
-                       struct binding_ctx_out *b_ctx_out)
+                       struct binding_ctx_out *b_ctx_out,
+                       bool should_release)
 {
     if (our_chassis) {
         update_local_lports(pb->logical_port, b_ctx_out);
@@ -1808,8 +1862,8 @@ consider_nonvif_lport_(const struct sbrec_port_binding *pb,
                            pb->datapath, b_ctx_in->chassis_rec,
                            b_ctx_out->local_datapaths,
                            b_ctx_out->tracked_dp_bindings);
-
         update_related_lport(pb, b_ctx_out);
+        add_lrp_claim(b_ctx_out, pb->logical_port);
         return claim_lport(pb, NULL, b_ctx_in->chassis_rec, NULL,
                            !b_ctx_in->ovnsb_idl_txn, false,
                            b_ctx_out->tracked_dp_bindings,
@@ -1817,8 +1871,8 @@ consider_nonvif_lport_(const struct sbrec_port_binding *pb,
                            b_ctx_out->postponed_ports);
     }

-    if (pb->chassis == b_ctx_in->chassis_rec ||
-            is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
+    if (should_release) {
+        remove_lrp_claim(b_ctx_out, pb->logical_port);
         return release_lport(pb, b_ctx_in->chassis_rec,
                              !b_ctx_in->ovnsb_idl_txn,
                              b_ctx_out->tracked_dp_bindings,
@@ -1837,7 +1891,7 @@ consider_l2gw_lport(const struct sbrec_port_binding *pb,
     bool our_chassis = chassis_id && !strcmp(chassis_id,
                                              b_ctx_in->chassis_rec->name);

-    return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out);
+    return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out, true);
 }

 static bool
@@ -1849,7 +1903,7 @@ consider_l3gw_lport(const struct sbrec_port_binding *pb,
     bool our_chassis = chassis_id && !strcmp(chassis_id,
                                              b_ctx_in->chassis_rec->name);

-    return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out);
+    return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out, true);
 }

 static void
@@ -1882,6 +1936,11 @@ consider_ha_lport(const struct sbrec_port_binding *pb,
                                              b_ctx_in->active_tunnels,
                                              b_ctx_in->chassis_rec);

+    bool should_release = false;
+    if (pb && pb->chassis) {
+        should_release = find_lrp_claim(b_ctx_out, pb->logical_port)
+                         && strcmp(pb->chassis->hostname,
+                                   b_ctx_in->chassis_rec->hostname)
+                         && !our_chassis;
+    }
+
     if (is_ha_chassis && !our_chassis) {
         /* If the chassis_rec is part of ha_chassis_group associated with
          * the port_binding 'pb', we need to add to the local_datapath
@@ -1899,7 +1958,8 @@ consider_ha_lport(const struct sbrec_port_binding *pb,
         update_related_lport(pb, b_ctx_out);
     }

-    return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out);
+    return consider_nonvif_lport_(pb, our_chassis, b_ctx_in, b_ctx_out,
+                                  should_release);
 }

 static bool
diff --git a/controller/binding.h b/controller/binding.h
index 41b311029..f28cc0d5f 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -72,6 +72,16 @@ struct related_lports {
 void related_lports_init(struct related_lports *);
 void related_lports_destroy(struct related_lports *);

+struct lrp_claim {
+    char *lrp_id;
+    struct lrp_claim *next;
+};
+
+struct lrp_claim_list {
+    size_t size;
+    struct lrp_claim *claim;
+};
+
 struct binding_ctx_out {
     struct hmap *local_datapaths;
     struct shash *local_active_ports_ipv6_pd;
@@ -106,8 +116,17 @@ struct binding_ctx_out {
     struct if_status_mgr *if_mgr;

     struct sset *postponed_ports;
+
+    struct lrp_claim_list *lrp_claims;
 };

+
+void add_lrp_claim(struct binding_ctx_out *ctx, const char *lrp_id);
+
+void remove_lrp_claim(struct binding_ctx_out *ctx, char *lrp_id);
+
+bool find_lrp_claim(struct binding_ctx_out *ctx, const char *lrp_id);
+
 /* Local bindings. binding.c module binds the logical port (represented by
  * Port_Binding rows) and sets the 'chassis' column when it sees the
  * OVS interface row (of type "" or "internal") with the
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 9619c698a..cd53d5e77 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1335,6 +1335,50 @@ en_postponed_ports_run(struct engine_node *node, void *data_)
     engine_set_node_state(node, state);
 }

+struct ed_type_lrp_claims {
+    struct lrp_claim_list *lrp_claims;
+};
+
+static void *
+en_lrp_claims_init(struct engine_node *node OVS_UNUSED,
+                   struct engine_arg *arg OVS_UNUSED)
+{
+    struct ed_type_lrp_claims *data = xzalloc(sizeof *data);
+    data->lrp_claims = xzalloc(sizeof *(data->lrp_claims));
+
+    data->lrp_claims->claim = NULL;
+    data->lrp_claims->size = 0;
+
+    return data;
+}
+
+static void
+en_lrp_claims_cleanup(void *data_)
+{
+    struct ed_type_lrp_claims *data = data_;
+    struct lrp_claim *current = data->lrp_claims->claim;
+    while (current) {
+        struct lrp_claim *next = current->next;
+        free(current->lrp_id);
+        free(current);
+        current = next;
+    }
+    data->lrp_claims->claim = NULL;
+    data->lrp_claims->size = 0;
+}
+
+static void
+en_lrp_claims_run(struct engine_node *node, void *data_)
+{
+    struct ed_type_lrp_claims *data = data_;
+    enum engine_node_state state = EN_UNCHANGED;
+
+    if (data->lrp_claims != NULL) {
+        state = EN_UPDATED;
+    }
+
+    engine_set_node_state(node, state);
+}
+
 struct ed_type_runtime_data {
     /* Contains "struct local_datapath" nodes. */
     struct hmap local_datapaths;
@@ -1367,6 +1411,8 @@ struct ed_type_runtime_data {
     struct shash local_active_ports_ras;

     struct sset *postponed_ports;
+
+    struct lrp_claim_list *lrp_claims;
 };

 /* struct ed_type_runtime_data has the below members for tracking the
@@ -1477,6 +1523,17 @@ en_runtime_data_cleanup(void *data)
     shash_destroy(&rt_data->local_active_ports_ipv6_pd);
     shash_destroy(&rt_data->local_active_ports_ras);
     local_binding_data_destroy(&rt_data->lbinding_data);
+    /* rt_data->lrp_claims is owned by the lrp_claims engine node and is
+     * freed in en_lrp_claims_cleanup(), so only drop the reference here.
+     * (The earlier draft freed the nodes and called free() on the size_t
+     * 'size' member, which is a double free and a type error.) */
+    rt_data->lrp_claims = NULL;
 }

 static void
@@ -1565,6 +1622,7 @@ init_binding_ctx(struct engine_node *node,
     b_ctx_out->postponed_ports = rt_data->postponed_ports;
     b_ctx_out->tracked_dp_bindings = NULL;
     b_ctx_out->if_mgr = ctrl_ctx->if_mgr;
+    b_ctx_out->lrp_claims = rt_data->lrp_claims;
 }

 static void
@@ -1603,6 +1661,9 @@ en_runtime_data_run(struct engine_node *node, void *data)
     struct ed_type_postponed_ports *pp_data =
         engine_get_input_data("postponed_ports", node);
     rt_data->postponed_ports = pp_data->postponed_ports;
+    struct ed_type_lrp_claims *lrp_claims_data =
+        engine_get_input_data("lrp_claims", node);
+    rt_data->lrp_claims = lrp_claims_data->lrp_claims;

     struct binding_ctx_in b_ctx_in;
     struct binding_ctx_out b_ctx_out;
@@ -4212,12 +4273,13 @@ main(int argc, char *argv[])
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
                                       "ovs_interface_shadow");
-    ENGINE_NODE_WITH_CLEAR_TRACK_DATA(runtime_data, "runtime_data");
+    ENGINE_NODE(runtime_data, "runtime_data");
     ENGINE_NODE(non_vif_data, "non_vif_data");
     ENGINE_NODE(mff_ovn_geneve, "mff_ovn_geneve");
     ENGINE_NODE(ofctrl_is_connected, "ofctrl_is_connected");
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA(activated_ports, "activated_ports");
     ENGINE_NODE(postponed_ports, "postponed_ports");
+    ENGINE_NODE(lrp_claims, "lrp_claims");
     ENGINE_NODE(pflow_output, "physical_flow_output");
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA(lflow_output, "logical_flow_output");
     ENGINE_NODE(flow_output, "flow_output");
@@ -4377,6 +4439,8 @@ main(int argc, char *argv[])
     /* Reuse the same handler for any previously postponed ports. */
     engine_add_input(&en_runtime_data, &en_postponed_ports,
                      runtime_data_sb_port_binding_handler);
+    engine_add_input(&en_runtime_data, &en_lrp_claims,
+                     runtime_data_sb_port_binding_handler);
     /* Run sb_ro_handler after port_binding_handler in case port get deleted */
     engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler);


-----------------------------------------------------------------------


_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
