Hi team,

Recently, we face a weird issue on DNS resolv using OVN.

Case describe:
1. There is a VM instance or a VM cluster(3 VMs)on our deployment for a
long time, and we enable the DNS via neutron-ovn.
2. The DNS resolve works as expected in our internal network for a while.
3. Suddentlly, the DNS resolve failed  on the said VM or one of the cluster
nodes.

Notes:
a. I had confirmed that the SouthBound DB contains the said DNS records for
a long time. And we didn't change anything before the issue happened.
b. For the cluster nodes(3 VMs), all VMs locate on different compute
nodes(Chassis). But there is only 1 VM DNS resolution failure in our case.
c. I traced the logic flow, and tcpdump the DNS traffic. That's true the
DNS resp is generating by OVN and get 0 record which worked well before.

How to resolve:
Trigger the whole local ovn-controller to refresh the DNS records on its
DNS local cache. What we found is live-migration of the error VM.

Our question is:
1. Why DNS resolution failed on the local ovn-controller of compute node
Chassis?
   a. Did the DNS local cache fail to sync with SouthboundDB?
   b. The DNS local cache MEM size is not limit, right?
   c. How to trace the DNS local cache on a running ovn-controller? I
didn't find any CLI interface for it.
   d. Is there any suggestion for avoiding this issue? I'm failed to find
any usable config opts for ovn-controller. If I miss some config options,
please leave your kind suggestion.
2. Is there other scenario that might raise DNS resolution failure with no
change? I mean from the scale deployment perspective and just
maintenance the existing DNS.

Thanks

Best Regards,

Bo Zhao
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Reply via email to