In general, most of the “communication” between VPP components
is done by directly calling C functions,
so it makes sense that avf_flag_change is called within the vl_api_clnt_process
process.
It is avf_process_request (called directly by avf_flag_change)
that decides to hand off the request to the avf_process process for async handling,
so it should make sure to resume the API process correctly upon the response.

> just to set a mac address?

In my particular test the async operation switches promiscuous mode on an interface,
but I guess it does not really matter what a particular operation does.
What matters is that there is a synchronous API call (l2_patch_add_del in my test)
which only indirectly causes an asynchronous operation (as the interface uses the AVF driver).

> Do we really need to block the binary api

l2_patch_add_del does block.
Especially in the “del” case, subsequent API calls
need to know whether the interface is already gone or not.

> pass opaques in requests

As usual, there are several ways to make it work,
we just need to pick one (and put an example usage into the docs).

Vratko.

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Florin Coras
Sent: Monday, 2022-September-12 23:11
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] request-response between vlib processes

Hi Vratko,

Do we really need to block the binary API waiting for a reply from another VPP
process just to set a MAC address?

If setting the MAC (or similar) cannot be done synchronously, API handlers
should probably hand over all those requests to another VPP process,
vl_api_async_req_process, that takes care of async execution and generation of
API replies. You could also pass opaques in requests and maybe expect backends,
like avf_process, to bounce those opaques back for demuxing.
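To illustrate the opaque idea, here is a self-contained C model of how such an async request process could park API requests and demux backend responses by opaque. The table, pending_req_t, async_req_submit and async_req_complete are hypothetical names for this sketch only; they are not VPP APIs, and vl_api_async_req_process itself is only a proposal above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model (not a VPP API): API requests parked by an async
 * request process until a backend such as avf_process answers. */
#define MAX_PENDING 8

typedef struct
{
  int in_use;
  uint32_t opaque;       /* handed to the backend, bounced back in the response */
  uint32_t client_index; /* API client that should receive the reply */
} pending_req_t;

static pending_req_t pending[MAX_PENDING];
static uint32_t next_opaque = 1; /* 0 is reserved for "table full" */

/* Park a request; returns the opaque to pass to the backend, or 0. */
static uint32_t
async_req_submit (uint32_t client_index)
{
  for (int i = 0; i < MAX_PENDING; i++)
    if (!pending[i].in_use)
      {
        pending[i].in_use = 1;
        pending[i].opaque = next_opaque++;
        pending[i].client_index = client_index;
        return pending[i].opaque;
      }
  return 0;
}

/* Demux a backend response by its opaque. Returns the client to reply to,
 * or -1 for an unknown (stale or duplicate) opaque. */
static int64_t
async_req_complete (uint32_t opaque)
{
  assert (opaque != 0);
  for (int i = 0; i < MAX_PENDING; i++)
    if (pending[i].in_use && pending[i].opaque == opaque)
      {
        pending[i].in_use = 0;
        return pending[i].client_index;
      }
  return -1;
}
```

Because the opaque travels with the response, replies can arrive in any order and still reach the right client, and a duplicate response is simply reported as unknown.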

Regards,
Florin


On Sep 12, 2022, at 4:49 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES
at Cisco) via lists.fd.io <vrpolak=cisco....@lists.fd.io> wrote:

[resending to the correct vpp-dev e-mail address]

Short version:
Vratko would appreciate something like 
vlib_current_process_wait_for_one_time_event_or_clock.

Medium version:
One instance of request-response interaction between vlib processes had a bug [11].
Vratko contributed a fix [9] for the immediate issue,
but the proper fix was left hinted at in TODOs (and is discussed in the long version).

Long version:

Vlib supports processes and signals; see the corresponding sections in the docs [7].
In actor model vocabulary, a (vlib) process is an actor,
and (vlib) signaling a (vlib) event means sending a message between actors.
There is no vlib name for actor behavior [10].

The typical use of event signaling in VPP is “fire and forget”,
meaning a “request” without any need for a response.
That means a typical process has just one behavior;
the side effects of a process are determined by the event type (and data),
not directly by the sequence of previously received events.

But there is an exception (and in the future there may be more).
When avf_process handles AVF_PROCESS_EVENT_REQ
and detects that it was signaled by some other process,
it signals back a “response” event.
The main reason is that some operations may take an unreasonably long time,
and we prefer VPP to crash there (instead of getting stuck)
so we can see the backtrace.

A typical process that signals AVF_PROCESS_EVENT_REQ is vl_api_clnt_process,
whose loop usually handles SOCKET_READ_EVENT events.
That is, this socket API handling process has no idea about AVF plugin specific needs,
but it can call the avf_process_request function, which (upon detecting it is not called
from the avf_process process) does the signaling and waiting.

But this means vl_api_clnt_process is the first process (that I know of) with two behaviors.
The first one focuses on handling new API messages,
the second one on handling the AVF response (especially the lack thereof in time).
As clib_panic is called when the response does not arrive
(and I hope there are never two requests at the same time),
the first behavior never encounters the AVF response.
But the second behavior can encounter SOCKET_READ_EVENT.
The VPP-2033 [11] bug is what happens in that case.
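One way to keep the second behavior from misinterpreting SOCKET_READ_EVENT is to accept only the response event as the response and to remember any socket read for the first behavior. The C model below is a sketch under that assumption; the mailbox type, event constants and function are hypothetical stand-ins rather than vlib calls, and this is not necessarily what the fix [9] does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for event types; not the vlib definitions. */
enum { EV_RESPONSE = 0, EV_SOCKET_READ = 1 };

typedef struct
{
  int type;
  double time; /* arrival time, in seconds */
} event_t;

typedef struct
{
  const event_t *events;    /* incoming events, ordered by arrival time */
  int n_events;
  int next;                 /* index of the first unconsumed event */
  bool socket_read_pending; /* left for the API-handling behavior */
} mailbox_t;

/* Second behavior: wait for EV_RESPONSE until the deadline. A socket read
 * is never mistaken for the response; it is remembered so the first
 * behavior can process it afterwards. Returns true if the response
 * arrived in time; on false the AVF code would call clib_panic. */
static bool
wait_for_response_or_deadline (mailbox_t *mb, double deadline)
{
  assert (mb != NULL);
  while (mb->next < mb->n_events)
    {
      const event_t *e = &mb->events[mb->next];
      if (e->time > deadline)
        break; /* arrives after the deadline: leave it queued */
      mb->next++;
      if (e->type == EV_RESPONSE)
        return true;
      if (e->type == EV_SOCKET_READ)
        mb->socket_read_pending = true;
    }
  return false;
}
```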

A minor issue is that the “response” event is identified only by its
event type being zero, which would not work in (hypothetical future) scenarios
where a single process waits for two different responses.

Reading through node_funcs.h I found
vlib_current_process_wait_for_one_time_event [12],
which looks suited for waiting for “single response” events,
but it lacks the time awareness of vlib_process_wait_for_event_or_clock.
If we had something like vlib_current_process_wait_for_one_time_event_or_clock
(and an example of its usage in the docs), handling the response would become easier.
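To make the proposal concrete, here is a self-contained C model of what a per-request one-time event with a deadline could offer, including one process waiting for two different responses. All names (proc_t, one_time_event_alloc, one_time_event_signal, wait_for_one_time_event_or_clock) are hypothetical and the semantics are assumptions, not the behavior of vlib_current_process_wait_for_one_time_event. There is no scheduler here, so "waiting" is modelled as checking the state at the deadline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-request one-time events with a deadline.
 * None of these names exist in vlib. */
#define MAX_ONE_TIME 32

typedef struct
{
  uint32_t allocated;               /* bit i: event type i awaits a response */
  uint32_t signaled;                /* bit i: event type i was signaled */
  double signal_time[MAX_ONE_TIME]; /* when each type was signaled */
} proc_t;

/* Requester: reserve a fresh event type for exactly one response. */
static int
one_time_event_alloc (proc_t *p)
{
  for (int i = 0; i < MAX_ONE_TIME; i++)
    if (!(p->allocated & (1u << i)))
      {
        p->allocated |= 1u << i;
        p->signaled &= ~(1u << i);
        return i;
      }
  return -1; /* all types in use */
}

/* Responder: signal the type it was given. Returns false for a stale
 * response whose requester already gave up (type no longer allocated). */
static bool
one_time_event_signal (proc_t *p, int type, double now)
{
  assert (type >= 0 && type < MAX_ONE_TIME);
  if (!(p->allocated & (1u << type)))
    return false;
  p->signaled |= 1u << type;
  p->signal_time[type] = now;
  return true;
}

/* Requester: true only if this particular type was signaled by the
 * deadline; other pending one-time events are untouched. The type is
 * released either way, so a late response is recognized as stale. */
static bool
wait_for_one_time_event_or_clock (proc_t *p, int type, double deadline)
{
  uint32_t bit = 1u << type;
  bool ok = (p->signaled & bit) && p->signal_time[type] <= deadline;
  p->allocated &= ~bit;
  p->signaled &= ~bit;
  return ok;
}
```

Allocating a distinct type per request removes the reliance on event type zero, and releasing the type on timeout lets a late response be detected instead of being mistaken for a newer one.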

Vratko.

[7] https://github.com/FDio/vpp/blob/9ad39c026c8a3c945a7003c4aa4f5cb1d4c80160/docs/developer/corearchitecture/vlib.rst
[9] https://gerrit.fd.io/r/c/vpp/+/37083
[10] https://en.wikipedia.org/wiki/Actor_model#Behaviors
[11] https://jira.fd.io/browse/VPP-2033
[12] https://github.com/FDio/vpp/blob/16052480c377127f9cb7facbab53f46e595b27cf/src/vlib/node_funcs.h#L1186