Numan, Thanks so much for taking the time to look into this . I will build this and test it tomorrow .
Gav On Wed, 9 Oct 2024 at 17:06, Numan Siddique <num...@ovn.org> wrote: > > On Tue, Oct 8, 2024 at 12:37 AM Gavin McKee via discuss > <ovs-discuss@openvswitch.org> wrote: > > > > Awesome . Thanks for taking a look at this . > > Hi Gavin, > > I've posted a patch to address this - > https://patchwork.ozlabs.org/project/ovn/patch/20241010000239.1057750-1-num...@ovn.org/ > > It would be great if you can test it out and see if it addresses your > issues. The same can be found here too - > https://github.com/numansiddique/ovn/tree/sbdns_ip/v1 > > Thanks > Numan > > > > > Gav > > > > On Mon, Oct 7, 2024 at 18:46 Numan Siddique <num...@ovn.org> wrote: > >> > >> On Mon, Oct 7, 2024 at 1:33 PM Gavin McKee via discuss > >> <ovs-discuss@openvswitch.org> wrote: > >> > > >> > Hi Numan, > >> > > >> > Yes, DNS entries can be added / moved / updated frequently in our > >> > setup . Our usebase can create or delete VMs in bulk using Terraform > >> > so it's likely that we should handle a scenario where these actions > >> > don't block DNS lookups. > >> > > >> > I'm in deeper water than usual here so would appreciate it if you > >> > could make a proposal for the dns_cache_mutex . I'm fully open to any > >> > approach that will ensure that we keep processing DNS requests > >> > (especially the external) as blocking these is a big problem. > >> > >> I've some ideas. I'll see if I can spin up a patch. > >> > >> Thanks > >> Numan > >> > >> > > >> > Gav > >> > > >> > On Mon, 7 Oct 2024 at 10:07, Numan Siddique <num...@ovn.org> wrote: > >> > > > >> > > On Mon, Oct 7, 2024 at 12:13 PM Gavin McKee via discuss > >> > > <ovs-discuss@openvswitch.org> wrote: > >> > > > > >> > > > Numan, > >> > > > > >> > > > Hopefully I don't embarrass myself too much here, but here goes :) . > >> > > > If we implemented a separate thread for performing DNS lookups would > >> > > > that work ? > >> > > > >> > > I see a couple of problems here. > >> > > > >> > > 1. The new thread will be sharing the "swconn" connection with > >> > > the pinctrl thread. Not sure how much of any issue this may cause. > >> > > Looks like "rconn_send" has a mutex. > >> > > > >> > > 2. There is a possibility that when the new thread iterates over the > >> > > dns_cache shash, the main ovn-controller > >> > > thread can modify this from pinctrl_run(). > >> > > > >> > > Before going this way, is it possible to explore adding another > >> > > mutex say - "dns_cache_mutex" (or any other appropriate name). > >> > > > >> > > pinctrl thread before calling pinctrl_handle_dns_lookup() will lock > >> > > this mutex instead of the > >> > > main "pinctrl_mutex" and sync_dns_cache() will lock this mutex when it > >> > > is adding or deleting the dns_cache. > >> > > > >> > > What do you think ? This way dns requests will not be blocked due to > >> > > ovn-controller waking up and calling pinctrl_run(). > >> > > It will be blocked though when SB dns entries are updated. Or is > >> > > that the case in your setup ? Do OVN dns records > >> > > get updated very frequently ? > >> > > > >> > > Thanks > >> > > Numan > >> > > > >> > > > > >> > > > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > >> > > > index c86b4f940..aaa316542 100644 > >> > > > --- a/controller/pinctrl.c > >> > > > +++ b/controller/pinctrl.c > >> > > > @@ -189,6 +189,27 @@ static struct pinctrl pinctrl; > >> > > > > >> > > > static void init_buffered_packets_ctx(void); > >> > > > static void destroy_buffered_packets_ctx(void); > >> > > > + > >> > > > +/* DNS thread */ > >> > > > +static struct latch dns_thread_exit; > >> > > > +static pthread_t dns_thread; > >> > > > + > >> > > > +/* DNS query queue */ > >> > > > +struct dns_query { > >> > > > + struct dp_packet packet; > >> > > > + struct ofputil_packet_in pin; > >> > > > + struct ofpbuf userdata; > >> > > > + struct ofpbuf *continuation; > >> > > > + struct ovs_list node; > >> > > > +}; > >> > > > + > >> > > > +static struct ovs_list dns_query_queue = > >> > > > OVS_LIST_INITIALIZER(&dns_query_queue); > >> > > > +static struct ovs_mutex dns_queue_mutex = OVS_MUTEX_INITIALIZER; > >> > > > +static struct seq *dns_seq; > >> > > > + > >> > > > +static void dns_queue_push(struct dns_query *query); > >> > > > +static struct dns_query *dns_queue_pop(void); > >> > > > +static void *dns_lookup_thread(void *arg OVS_UNUSED); > >> > > > static void > >> > > > run_buffered_binding(const struct sbrec_mac_binding_table > >> > > > *mac_binding_table, > >> > > > struct ovsdb_idl_index > >> > > > *sbrec_port_binding_by_key, > >> > > > @@ -572,6 +593,10 @@ pinctrl_init(void) > >> > > > latch_init(&pinctrl.pinctrl_thread_exit); > >> > > > pinctrl.pinctrl_thread = ovs_thread_create("ovn_pinctrl", > >> > > > pinctrl_handler, > >> > > > &pinctrl); > >> > > > + latch_init(&dns_thread_exit); > >> > > > + dns_seq = seq_create(); > >> > > > + pinctrl.dns_thread = ovs_thread_create("dns_lookup", > >> > > > dns_lookup_thread, NULL); > >> > > > + > >> > > > } > >> > > > > >> > > > static ovs_be32 > >> > > > @@ -3435,6 +3460,51 @@ dns_build_ptr_answer( > >> > > > #define DNS_RCODE_SERVER_REFUSE 0x5 > >> > > > #define DNS_QUERY_TYPE_CLASS_LEN (2 * sizeof(ovs_be16)) > >> > > > > >> > > > +static void * > >> > > > +dns_lookup_thread(void *arg OVS_UNUSED) > >> > > > +{ > >> > > > + while (!latch_is_set(&dns_thread_exit)) { > >> > > > + struct dns_query *query = dns_queue_pop(); > >> > > > + if (query) { > >> > > > + bool processed = pinctrl_handle_dns_lookup(swconn, > >> > > > &query->packet, > >> > > > + &query->pin, > >> > > > &query->userdata, > >> > > > + > >> > > > query->continuation); > >> > > > + if (!processed) { > >> > > > + queue_msg(swconn, ofputil_encode_resume(&query->pin, > >> > > > + > >> > > > query->continuation, > >> > > > + > >> > > > ofputil_protocol_from_ofp_version( > >> > > > + > >> > > > rconn_get_version(swconn)))); > >> > > > + } > >> > > > + ofpbuf_delete(query->continuation); > >> > > > + free(query); > >> > > > + } > >> > > > + poll_block(); > >> > > > + } > >> > > > + return NULL; > >> > > > +} > >> > > > + > >> > > > +static void > >> > > > +dns_queue_push(struct dns_query *query) > >> > > > +{ > >> > > > + ovs_mutex_lock(&dns_queue_mutex); > >> > > > + ovs_list_push_back(&dns_query_queue, &query->node); > >> > > > + seq_change(dns_seq); > >> > > > + ovs_mutex_unlock(&dns_queue_mutex); > >> > > > +} > >> > > > + > >> > > > +static struct dns_query * > >> > > > +dns_queue_pop(void) > >> > > > +{ > >> > > > + struct dns_query *query = NULL; > >> > > > + ovs_mutex_lock(&dns_queue_mutex); > >> > > > + if (!ovs_list_is_empty(&dns_query_queue)) { > >> > > > + query = CONTAINER_OF(ovs_list_pop_front(&dns_query_queue), > >> > > > struct dns_query, node); > >> > > > + } > >> > > > + ovs_mutex_unlock(&dns_queue_mutex); > >> > > > + return query; > >> > > > +} > >> > > > + > >> > > > + > >> > > > /* Called with in the pinctrl_handler thread context. */ > >> > > > static void > >> > > > pinctrl_handle_dns_lookup( > >> > > > @@ -3804,11 +3874,16 @@ process_packet_in(struct rconn *swconn, const > >> > > > struct ofp_header *msg) > >> > > > break; > >> > > > > >> > > > case ACTION_OPCODE_DNS_LOOKUP: > >> > > > - ovs_mutex_lock(&pinctrl_mutex); > >> > > > - pinctrl_handle_dns_lookup(swconn, &packet, &pin, &userdata, > >> > > > - &continuation); > >> > > > - ovs_mutex_unlock(&pinctrl_mutex); > >> > > > - break; > >> > > > + { > >> > > > + struct dns_query *query = xmalloc(sizeof *query); > >> > > > + dp_packet_clone(&query->packet, &packet); > >> > > > + query->pin = pin; > >> > > > + ofpbuf_clone(&query->userdata, &userdata); > >> > > > + query->continuation = continuation ? > >> > > > ofpbuf_clone(continuation) : NULL; > >> > > > + dns_queue_push(query); > >> > > > + poll_immediate_wake(); > >> > > > + } > >> > > > + break; > >> > > > > >> > > > case ACTION_OPCODE_LOG: > >> > > > handle_acl_log(&headers, &userdata, > >> > > > > >> > > > On Fri, 4 Oct 2024 at 15:59, Gavin McKee <gavmcke...@googlemail.com> > >> > > > wrote: > >> > > > > > >> > > > > Hi Numan, > >> > > > > > >> > > > > Yes , the DNS packets are sent by VMs . We have DNS records for > >> > > > > these VMs so the controller is catching all DNS requests and > >> > > > > failing to respond in a timely manner. > >> > > > > > >> > > > > I think this relates to the fact that there is no incremental > >> > > > > handler for SB DNS. > >> > > > > > >> > > > > In short the blocking thread causes us all sorts of problems . > >> > > > > > >> > > > > Hope that makes sense > >> > > > > > >> > > > > Gav > >> > > > > > >> > > > > On Fri, Oct 4, 2024 at 15:34 Numan Siddique <num...@ovn.org> wrote: > >> > > > >> > >> > > > >> On Fri, Oct 4, 2024 at 5:50 PM Gavin McKee via discuss > >> > > > >> <ovs-discuss@openvswitch.org> wrote: > >> > > > >> > > >> > > > >> > Hi, > >> > > > >> > > >> > > > >> > We currently experience DNS related timeouts when the ovn > >> > > > >> > controller > >> > > > >> > is under high load. > >> > > > >> > > >> > > > >> > Will this commit ensure that DNS packets needing to be > >> > > > >> > forwarded to an > >> > > > >> > external resolver will be processed in a timely manner and not > >> > > > >> > lead to > >> > > > >> > timeouts? > >> > > > >> > https://github.com/ovn-org/ovn/commit/762ae66cd70efa149d91d35305fcef0040e9addd > >> > > > >> > > >> > > > >> > >> > > > >> This particular patch is useful when the ovn-controller calls > >> > > > >> dns_resolve() of ovs/lib/dns_resolve.c [1] > >> > > > >> to resolve a dns entry. > >> > > > >> > >> > > > >> In your case, what are these DNS packets? Are these sent by VMs > >> > > > >> and > >> > > > >> ovn-controller pinctrl thread handles > >> > > > >> these requests and its blocking due to pinctrl_mutex ? > >> > > > >> > >> > > > >> If so, I think we need to find a solution for this problem, but > >> > > > >> the > >> > > > >> above commit will definitely not help you. > >> > > > >> Maybe we can maintain a separate mutex to protect the dns_cache > >> > > > >> [2] so > >> > > > >> that even if ovn-controller main thread > >> > > > >> is busy (and not updating the internal dns cache), the DNS packets > >> > > > >> from the VMs can be handled without any delay > >> > > > >> in the pinctrl thread. > >> > > > >> > >> > > > >> [1] - > >> > > > >> https://github.com/openvswitch/ovs/blob/main/lib/dns-resolve.c#L148 > >> > > > >> [2] - > >> > > > >> https://github.com/ovn-org/ovn/blob/main/controller/pinctrl.c#L3288 > >> > > > >> > >> > > > >> > >> > > > >> Thanks > >> > > > >> Numan > >> > > > >> > >> > > > >> > We currently run OVN 23.09.04. > >> > > > >> > > >> > > > >> > Thanks > >> > > > >> > > >> > > > >> > Gav > >> > > > >> > _______________________________________________ > >> > > > >> > discuss mailing list > >> > > > >> > disc...@openvswitch.org > >> > > > >> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > >> > > > >> > > >> > > > _______________________________________________ > >> > > > discuss mailing list > >> > > > disc...@openvswitch.org > >> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > >> > _______________________________________________ > >> > discuss mailing list > >> > disc...@openvswitch.org > >> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss > > > > _______________________________________________ > > discuss mailing list > > disc...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss _______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss