@Numan Siddique If we enable conditional monitoring here, will this help? How do transport zones help with something like this? Do they limit the amount of processing? We only have a single VM on this node, so only a single LSP, Logical Switch, etc. is actually needed or used. Would v24.03.0 https://github.com/ovn-org/ovn/commit/1622526ff2102525e1bbf2ca262842c71d6b9b33 help here?

Gav

On Wed, 8 May 2024 at 14:43, Gavin McKee <gavmcke...@googlemail.com> wrote:
>
> Ok so
>
> 1. Customers depend on the internal DNS records, so this is needed
> for production operations.
> 2. I can take a look at the updates. Would using conditional
> monitoring work here? We have ovn-monitor-all=true; would this help
> at all?
> 3 & 4. Is that something the community can help with? Is that a
> viable long-term fix we could maybe get a patch for?
>
> Gav
>
> On Wed, 8 May 2024 at 14:30, Numan Siddique <num...@ovn.org> wrote:
> >
> > On Wed, May 8, 2024 at 3:20 PM Gavin McKee via discuss
> > <ovs-discuss@openvswitch.org> wrote:
> > >
> > > Hi,
> > >
> > > Can someone help me understand why this issue occurs?
> > >
> > > ovn-controller 23.09.1
> > > Open vSwitch Library 3.2.2
> > >
> > > We have an issue with some machines intermittently unable to resolve
> > > DNS for external domains (example: dig +noall +answer
> > > harmonic-openai-canada.openai.azure.com).
> > >
> > > In the OVN controller log I see the following:
> > >
> > > 2024-05-08T12:12:35.596Z|30138|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 7950ms
> > > 2024-05-08T14:50:29.747Z|30312|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8634ms
> > > 2024-05-08T14:50:46.673Z|30329|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8774ms
> > > 2024-05-08T14:54:40.781Z|30353|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8535ms
> > > 2024-05-08T14:58:43.381Z|30433|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8541ms
> > > 2024-05-08T14:58:56.802Z|30488|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8820ms
> > > 2024-05-08T15:02:50.704Z|30512|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8739ms
> > > 2024-05-08T15:03:05.206Z|30529|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8686ms
> > > 2024-05-08T15:08:39.441Z|30569|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 9167ms
> > > 2024-05-08T15:09:09.152Z|30603|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8985ms
> > > 2024-05-08T15:12:14.361Z|30632|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8569ms
> > > 2024-05-08T15:13:52.535Z|30705|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8764ms
> > > 2024-05-08T15:14:53.989Z|30732|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8802ms
> > > 2024-05-08T15:16:30.911Z|30757|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 8776ms
> > > 2024-05-08T15:17:09.371Z|30784|inc_proc_eng|INFO|node: logical_flow_output, recompute (missing handler for input SB_dns) took 9062ms
> > >
> > > Why would this happen and is there something I can do about it? Are
> > > there more logs needed?
> >
> > This indicates that your deployment is creating, updating or deleting a DNS
> > row in the Northbound database and in turn ovn-northd is updating the SB DNS
> > rows. When ovn-controller receives the Southbound DNS updates, it falls back
> > to a full recompute because we are not handling these changes incrementally.
> > Since OVN native DNS is configured in your deployment, each DNS packet is
> > sent to ovn-controller for lookup.
> > Even though a separate pinctrl thread handles packet-ins, DNS
> > handling is blocked until the main ovn-controller thread releases
> > a mutex [1].
> >
> > There are a few ways to resolve this:
> >
> > 1. Disable native OVN DNS if you're not using this feature. To
> > disable, don't create any DNS records in the OVN Northbound db.
> > 2. Investigate why your deployment is updating the NB DNS table and
> > avoid it if it's not required.
> > 3. Implement a handler for SB DNS so that ovn-controller does not fall
> > back to a full recompute.
> > 4. Avoid locking on the mutex for DNS handling in the pinctrl thread [1].
> >
> > (3) or (4) requires code changes.
> >
> > Thanks
> > Numan
> >
> > [1] - https://github.com/ovn-org/ovn/blob/main/controller/pinctrl.c#L3807
> >
> > > Gav
> > > _______________________________________________
> > > discuss mailing list
> > > disc...@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
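
For anyone who wants to quantify how often and how long these recomputes run, here is a minimal Python sketch that summarizes the inc_proc_eng lines from an ovn-controller log. It assumes each log entry sits on a single line (the mail client wrapped them in the quote above), and the function name summarize_recomputes is purely illustrative, not part of OVN. The sample lines are copied from this thread.

```python
import re
from statistics import mean

# Matches the inc_proc_eng "recompute (missing handler ...)" lines quoted in
# this thread, assuming each entry is on one line in the real log file.
RECOMPUTE_RE = re.compile(
    r"^(?P<ts>\S+)\|\d+\|inc_proc_eng\|INFO\|node: (?P<node>\w+), "
    r"recompute \(missing handler for input (?P<input>\w+)\) took (?P<ms>\d+)ms"
)


def summarize_recomputes(lines):
    """Group recompute durations (in ms) by the input that lacked a handler."""
    by_input = {}
    for line in lines:
        m = RECOMPUTE_RE.match(line.strip())
        if m:
            by_input.setdefault(m["input"], []).append(int(m["ms"]))
    return {
        name: {"count": len(ms), "mean_ms": mean(ms), "max_ms": max(ms)}
        for name, ms in by_input.items()
    }


# Three entries taken from the log excerpt in this thread.
sample = [
    "2024-05-08T12:12:35.596Z|30138|inc_proc_eng|INFO|node: logical_flow_output, "
    "recompute (missing handler for input SB_dns) took 7950ms",
    "2024-05-08T14:50:29.747Z|30312|inc_proc_eng|INFO|node: logical_flow_output, "
    "recompute (missing handler for input SB_dns) took 8634ms",
    "2024-05-08T15:08:39.441Z|30569|inc_proc_eng|INFO|node: logical_flow_output, "
    "recompute (missing handler for input SB_dns) took 9167ms",
]

summary = summarize_recomputes(sample)
print(summary)
```

Running it against the full log (for example, passing an open file object instead of the sample list) should show whether SB_dns is the only input forcing recomputes and how the durations line up with the DNS resolution failures.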