Folks,

I have a stale cache of interface data in the layer above
my VPP API calls and I need to refresh it.  So I wrote
a vpp_intf_refresh_all() function.  It looks roughly like this:

    vpp_intf_refresh_all() {
        if (intf data is not dirty)
            return;
        for each is_ipv6 in {0,1} {
            vpp_ip_dump(is_ipv6);
        }

        sleep(2) // See commentary

        for each is_ipv6 in {0,1} {
            vpp_ip_address_dump_all(is_ipv6); // hits all IFs
        }
        vpp_sw_interface_dump()
        intf data is now clean
    }

My "details handlers" develop a few vectors of information
in almost the exact same way as the code in api_format.c does.
That is to say:

    ip_dump/ip_details_t_handler -- form a vector of ip_details
        with an entry for each if-index that is returned.
        Note that there is no way to know how many interfaces
        will be handled by the ip_details_t_handler function.
        Let me say that differently: We have no way of knowing
        when it is finished and will not be called again on
        behalf of the original IP_DUMP request.

    ip_address_dump/ip_address_details -- Using the vector of
        ip_details formed during the ip_dump pass, iterate over
        each IF and request its ip_address_dump to form another
        vector of addresses on that specific interface.

Here's the thing:
    If I remove the sleep(2), this code fails.
    If I leave the sleep(2), this code works.

On the one hand, if there is enough time for all of the ip_details
to be handled, and the vector of ip_details to be formed, then
the next set of API calls, ip_address_dump, will work correctly.

On the other hand, if the API driving code is allowed to proceed
before the async replies to all the ip_dump requests are done,
then it will not have a proper ip_details vector and thus fail.

I've just described a classic asynchronous failure mode.
Solutions abound in other worlds.  What is the recommended
approach in this world?

So, why does VAT work?  Because it effectively serializes these
steps, with enough time between them that all the async replies
have arrived before the next step starts.  But even beyond that,
it tries to detect this situation and tells the user to do it
differently.  From vl_api_address_details_t_handler():

  if (!details || vam->current_sw_if_index >= vec_len (details)
      || !details[vam->current_sw_if_index].present)
    {
      errmsg ("ip address details arrived but not stored");
      errmsg ("ip_dump should be called first");
      return;
    }

Sending a CONTROL_PING to flush the write-side of the API isn't
good enough.  Placing an arbitrary sleep() in the code is an
incredibly fragile approach.  OK, it's the wrong solution.

Is there some form of API synchronization that I missed somewhere?

Can we introduce an actual WAIT_FOR_COMPLETION event into the
API message handling pipeline?  I'm just thinking of something
that would be issued as an API call where my sleep(2) is,
and would cause the API handling side to stall until the reply
side is drained?
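
To make the WAIT_FOR_COMPLETION idea concrete, here is a minimal
sketch of how I picture it.  Everything in it is hypothetical: the
message types, the queue, and the function names are stand-ins of
my own, not VPP code.  It also assumes that requests are handled
in order and that all replies travel over a single FIFO queue, so
a barrier reply can only arrive after every detail queued before it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reply types: a details record and a barrier reply. */
typedef enum { MSG_IP_DETAILS, MSG_BARRIER_REPLY } msg_type_t;

typedef struct {
    msg_type_t type;
    uint32_t   context;      /* echoes the context of the request */
    uint32_t   sw_if_index;  /* meaningful for MSG_IP_DETAILS only */
} reply_t;

/* Stand-in for the shared-memory reply queue: a small ring buffer. */
#define QUEUE_CAP 128
typedef struct {
    reply_t slots[QUEUE_CAP];
    size_t  head, tail;
} reply_queue_t;

static bool queue_push(reply_queue_t *q, reply_t r)
{
    if (q->tail - q->head == QUEUE_CAP)
        return false;                     /* queue full */
    q->slots[q->tail++ % QUEUE_CAP] = r;
    return true;
}

static bool queue_pop(reply_queue_t *q, reply_t *out)
{
    if (q->head == q->tail)
        return false;                     /* queue empty */
    *out = q->slots[q->head++ % QUEUE_CAP];
    return true;
}

/* Simulated server: the dump handler just queues one reply per IF. */
static void server_ip_dump(reply_queue_t *q, uint32_t context,
                           const uint32_t *ifs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        queue_push(q, (reply_t){ MSG_IP_DETAILS, context, ifs[i] });
}

/* Simulated server: the barrier handler queues a single reply.
 * Because it runs after the dump handler, its reply is queued last. */
static void server_barrier(reply_queue_t *q, uint32_t context)
{
    queue_push(q, (reply_t){ MSG_BARRIER_REPLY, context, 0 });
}

/* Client: collect details until the matching barrier reply shows up.
 * Returns the number of details stored in out, or -1 if the queue
 * ran dry first (a real client would keep waiting instead). */
static int client_drain_until_barrier(reply_queue_t *q, uint32_t barrier_ctx,
                                      uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    reply_t r;
    while (queue_pop(q, &r)) {
        if (r.type == MSG_BARRIER_REPLY && r.context == barrier_ctx)
            return (int)n;
        if (r.type == MSG_IP_DETAILS && n < out_cap)
            out[n++] = r.sw_if_index;
    }
    return -1;
}
```

The point of the sketch is only the ordering argument: if the
barrier reply cannot overtake the details, seeing it means the
ip_details vector is complete and the second pass can start.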

Reading code, e.g., vl_api_ip_dump_t_handler(), I see that it just
iterates and drops messages into the shmem queue.  So, yeah, knowing
when that reply send queue has drained will be hard.

OK, so, what if we added an "is_last_detail" (bool) flag to all
the *_details_t messages?  That way we can know when we are done
waiting for the results to come back?  I could at least write
a spin-until-last-messages-seen-or-timeout sort of watcher.
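
Roughly, the watcher I have in mind would look like the sketch
below.  To be clear, this is all assumption: is_last_detail is the
proposed flag and does not exist today, and the struct layouts,
poll hook, and fake queue are made up so the example stands alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical details message carrying the proposed flag. */
typedef struct {
    uint32_t sw_if_index;
    bool     is_last_detail;   /* proposed: set on the final reply */
} ip_details_msg_t;

#define MAX_INTFS 64

/* State built up by the details handler for one outstanding dump. */
typedef struct {
    uint32_t sw_if_index[MAX_INTFS];
    size_t   count;
    bool     done;             /* set once the flagged reply is seen */
} ip_dump_state_t;

/* Details handler: record the entry, note whether it was the last. */
static void ip_details_handler(ip_dump_state_t *st,
                               const ip_details_msg_t *mp)
{
    if (st->count < MAX_INTFS)
        st->sw_if_index[st->count++] = mp->sw_if_index;
    if (mp->is_last_detail)
        st->done = true;
}

/* Hypothetical hook: handles at most one pending reply, if any. */
typedef void (*poll_fn_t)(ip_dump_state_t *st, void *ctx);

/* Spin until the last detail has been seen or timeout_s seconds pass.
 * Returns true on completion, false on timeout.  time() only has
 * one-second resolution, so the timeout is coarse. */
static bool wait_for_last_detail(ip_dump_state_t *st, poll_fn_t poll_once,
                                 void *ctx, double timeout_s)
{
    time_t start = time(NULL);
    while (!st->done) {
        poll_once(st, ctx);
        if (!st->done && difftime(time(NULL), start) >= timeout_s)
            return false;
    }
    return true;
}

/* Fake reply source for the example: a fixed array of replies. */
typedef struct {
    const ip_details_msg_t *msgs;
    size_t n, next;
} fake_queue_t;

static void fake_poll(ip_dump_state_t *st, void *ctx)
{
    fake_queue_t *q = ctx;
    if (q->next < q->n)
        ip_details_handler(st, &q->msgs[q->next++]);
}
```

One wrinkle: a dump that matches zero interfaces sends no details
at all, so there is no message to carry the flag.  In this sketch
the timeout is the only thing that ends the wait in that case.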

Thoughts?

jdl
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
