Hi Bo Zhao,
Thank you for reporting this issue.

From the way you describe the issue, I suspect that either your affected
controller/chassis does not have proper connectivity with the SBDB, or
something else is answering your DNS requests.

If you look at how ovn-controller implements DNS support, you can see
that it copies all the DNS records as is from the SBDB into its local
DNS cache. So if your VM gets an empty DNS answer, either some other DNS
provider is answering your request, or ovn-controller has a problem
storing the records from the SBDB correctly.

To validate that these requests are answered by the affected ovn-controller
and not by some other DNS provider, can you please check the OpenFlow
table 32/33 counters and see whether they increase when you send a DNS
request? Can you also please enable debug (DBG) logging on the affected
controller and check its logs for any error messages?
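
In case it helps with the counter check, here is a minimal sketch in Python
that compares the packet counters of tables 32 and 33 between two flow
dumps. It assumes the dumps are saved text output of "ovs-ofctl dump-flows"
on the integration bridge; the helper names and the sample lines are
hypothetical, and the table numbers may differ between OVN versions.

```python
import re

# Matches the table id and packet counter on one line of flow-dump text,
# e.g. "... table=32, n_packets=7, n_bytes=420, ...".
FLOW_RE = re.compile(r"table=(\d+),.*?n_packets=(\d+)")


def packets_per_table(dump_text, tables=(32, 33)):
    """Sum n_packets per table for the tables of interest."""
    totals = {t: 0 for t in tables}
    for line in dump_text.splitlines():
        m = FLOW_RE.search(line)
        if m and int(m.group(1)) in totals:
            totals[int(m.group(1))] += int(m.group(2))
    return totals


def counters_increased(before, after):
    """Return the tables whose packet count grew between two dumps."""
    return sorted(t for t in before if after.get(t, 0) > before[t])


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real deployment.
    dump_before = (
        " cookie=0x0, duration=10.0s, table=32, n_packets=5, n_bytes=300, "
        "priority=100,udp,tp_dst=53 actions=drop\n"
        " cookie=0x0, duration=10.0s, table=33, n_packets=2, n_bytes=120, "
        "priority=100 actions=drop\n"
    )
    # Pretend one more packet hit table 32 after the VM sent a DNS request.
    dump_after = dump_before.replace("n_packets=5", "n_packets=6")
    print(counters_increased(packets_per_table(dump_before),
                             packets_per_table(dump_after)))
```

Take one dump before and one after sending a DNS request from the VM, for
example with "ovs-ofctl dump-flows br-int table=32" (assuming br-int is
your integration bridge). If neither table's counter grows, the answer is
probably coming from somewhere else. Debug logging can usually be enabled
at runtime with "ovn-appctl -t ovn-controller vlog/set dbg".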


On Thu, Mar 21, 2024 at 8:59 PM Bo Zhao via discuss <
ovs-discuss@openvswitch.org> wrote:

> Hi team,
>
> Recently, we hit a weird issue with DNS resolution using OVN.
>
> Case description:
> 1. There has been a VM instance or a VM cluster (3 VMs) on our deployment
> for a long time, and we enabled DNS via neutron-ovn.
> 2. DNS resolution worked as expected in our internal network for a while.
> 3. Suddenly, DNS resolution failed on the said VM or on one of the
> cluster nodes.
>
> Notes:
> a. I had confirmed that the SouthBound DB contains the said DNS records
> for a long time. And we didn't change anything before the issue happened.
> b. For the cluster nodes (3 VMs), all VMs are located on different compute
> nodes (chassis). But only 1 VM has a DNS resolution failure in our case.
> c. I traced the logical flow and ran tcpdump on the DNS traffic. The DNS
> response is indeed generated by OVN and contains 0 records, although the
> same lookup worked well before.
>
> How to resolve:
> Trigger the local ovn-controller to refresh the DNS records in its local
> DNS cache. The way we found to do this is live-migrating the failing VM.
>
> Our question is:
> 1. Why did DNS resolution fail on the local ovn-controller of the compute
> node chassis?
>    a. Did the DNS local cache fail to sync with the Southbound DB?
>

There is no straightforward way to dump the local DNS cache from
ovn-controller, but you can check the OpenFlow flows in tables 32/33 and
see whether you hit those flows when the VM sends a DNS request. Hits in
table 33 in particular would at least indicate that your local DNS cache
is not empty.

>    b. The DNS local cache MEM size is not limited, right?
>
Right, the cache memory size is not limited.

>    c. How to trace the DNS local cache on a running ovn-controller? I
> didn't find any CLI interface for it.
>


>    d. Is there any suggestion for avoiding this issue? I failed to find
> any usable config options for ovn-controller. If I missed some config
> options, please leave your kind suggestion.
>
I'm not sure about the root cause of the issue. Can you please attach your
SB/NB DBs and controller logs so that we can figure out why this
happened?


> 2. Are there other scenarios that might cause DNS resolution failure with
> no change? I mean from the scale deployment perspective, while just
> maintaining the existing DNS records.
>

OVN DNS resolution can fail or take longer when the user defines only an
IPv4 record or only an IPv6 record. OVN answers only the request that
matches the record type it has, so if the requester sends two requests,
one for IPv4 and one for IPv6, OVN answers one of them and lets the other
continue through the pipeline, where it either reaches another DNS
provider, if one exists, or is dropped.
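
To make that behaviour concrete, here is a toy model in Python. It is only
a sketch of the behaviour described above, not ovn-controller's actual
code; the function name, record layout and hostname are hypothetical.

```python
def answer_dns(records, name, qtype):
    """Toy model of the lookup described above.

    Returns the list of addresses when a record of the requested type
    exists for the name. Returns None when it does not, which stands for
    "not answered locally": the query continues through the pipeline and
    either reaches another DNS provider or is dropped.
    """
    return records.get(name, {}).get(qtype)


# Hypothetical IPv4-only record for one VM.
records = {"vm1.example.org": {"A": ["10.0.0.5"]}}

ipv4_answer = answer_dns(records, "vm1.example.org", "A")     # answered locally
ipv6_answer = answer_dns(records, "vm1.example.org", "AAAA")  # falls through
```

A resolver that sends both an A and an AAAA query and waits for both
answers would see the delay or failure on the AAAA side in this model.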





> Thanks
>
> Best Regards,
>
> Bo Zhao
> _______________________________________________
> discuss mailing list
> disc...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>