On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozy...@nvidia.com> wrote:
>
> A new, faster, queue-based flow rules management mechanism is needed for
> applications offloading rules inside the datapath. This asynchronous
> and lockless mechanism frees the CPU for further packet processing and
> reduces the performance impact of the flow rules creation/destruction
> on the datapath. Note that queues are not thread-safe and the queue
> should be accessed from the same thread for all queue operations.
> It is the responsibility of the app to sync the queue functions in case
> of multi-threaded access to the same queue.
>
> The rte_flow_q_flow_create() function enqueues a flow creation to the
> requested queue. It benefits from already configured resources and sets
> unique values on top of item and action templates. A flow rule is enqueued
> on the specified flow queue and offloaded asynchronously to the hardware.
> The function returns immediately to spare CPU for further packet
> processing. The application must invoke the rte_flow_q_pull() function
> to complete the flow rule operation offloading, to clear the queue, and to
> receive the operation status. The rte_flow_q_flow_destroy() function
> enqueues a flow destruction to the requested queue.
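
As a reading aid for the model described in the commit message, here is a minimal, self-contained sketch of the enqueue/pull pattern. It is a plain C mock, not DPDK code: mock_queue, mock_enqueue(), mock_pull() and MOCK_QUEUE_SIZE are hypothetical names, and the "hardware" is simulated by reporting every enqueued operation as successful. It only illustrates that enqueue returns immediately, that results must be pulled to free queue slots, and that user data comes back with each result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_QUEUE_SIZE 8 /* hypothetical depth, not taken from the patch */

enum mock_op_type { MOCK_OP_CREATE, MOCK_OP_DESTROY };

struct mock_op {
	enum mock_op_type type;
	void *user_data; /* handed back unchanged with the result */
};

struct mock_res {
	int status; /* 0 on success */
	void *user_data;
};

/* Single-owner ring: no locks, so one thread must own a given queue. */
struct mock_queue {
	struct mock_op ring[MOCK_QUEUE_SIZE];
	uint32_t head; /* next slot to fill */
	uint32_t tail; /* next result to pull */
};

/* Returns immediately; -1 means results were not pulled in time. */
static int
mock_enqueue(struct mock_queue *q, enum mock_op_type type, void *user_data)
{
	assert(q != NULL);
	if (q->head - q->tail == MOCK_QUEUE_SIZE)
		return -1;
	q->ring[q->head % MOCK_QUEUE_SIZE] = (struct mock_op){ type, user_data };
	q->head++;
	return 0;
}

/* Writes up to n_res results and returns how many were written. */
static int
mock_pull(struct mock_queue *q, struct mock_res res[], uint16_t n_res)
{
	int n = 0;

	assert(q != NULL && res != NULL);
	while (n < n_res && q->tail != q->head) {
		const struct mock_op *op = &q->ring[q->tail % MOCK_QUEUE_SIZE];

		res[n].status = 0; /* the mock "hardware" never rejects a rule */
		res[n].user_data = op->user_data;
		q->tail++;
		n++;
	}
	return n;
}
```

In this mock, an application that never calls mock_pull() sees mock_enqueue() fail after MOCK_QUEUE_SIZE operations, which is the overflow case the patch warns about.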

It is good to see the implementation; specifically, I would like to understand:

1) As I understand it, we are creating queues so that multiple producers can enqueue multiple jobs in parallel. On the consumer side, is it HW or some other cores that consume the jobs? Can the consumer side operate in parallel?
2) Is the queue part of the HW, or just a SW primitive used as a channel to submit the work?

>
> Signed-off-by: Alexander Kozyrev <akozy...@nvidia.com>
> ---
> doc/guides/prog_guide/img/rte_flow_q_init.svg | 71 ++++
> .../prog_guide/img/rte_flow_q_usage.svg | 60 +++
> doc/guides/prog_guide/rte_flow.rst | 159 +++++++-
> doc/guides/rel_notes/release_22_03.rst | 8 +
> lib/ethdev/rte_flow.c | 173 ++++++++-
> lib/ethdev/rte_flow.h | 342 ++++++++++++++++++
> lib/ethdev/rte_flow_driver.h | 55 +++
> lib/ethdev/version.map | 7 +
> 8 files changed, 873 insertions(+), 2 deletions(-)
> create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
> create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
>
> diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg
> b/doc/guides/prog_guide/img/rte_flow_q_init.svg
> new file mode 100644
> index 0000000000..2080bf4c04

Some comments on the diagrams:
# rte_flow_q_create_flow and rte_flow_q_destroy_flow are used instead of rte_flow_q_flow_create/destroy.
# The brackets, i.e. (), of rte_flow_q_pull are not aligned.

> +</svg>
> \ No newline at end of file
> diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> index b7799c5abe..734294e65d 100644
> --- a/doc/guides/prog_guide/rte_flow.rst
> +++ b/doc/guides/prog_guide/rte_flow.rst
> @@ -3607,12 +3607,16 @@ Hints about the expected number of counters or meters
> in an application,
> for example, allow PMD to prepare and optimize NIC memory layout in advance.
> ``rte_flow_configure()`` must be called before any flow rule is created,
> but after an Ethernet device is configured.
> +It also creates flow queues for asynchronous flow rules operations via
> +queue-based API, see `Asynchronous operations`_ section.
>
> .. code-block:: c
>
> int
> rte_flow_configure(uint16_t port_id,
> const struct rte_flow_port_attr *port_attr,
> + uint16_t nb_queue,

# rte_flow_info_get() does not have the number of queues; why not add the number of queues to rte_flow_port_attr?
# And add additional APIs for queue_setup(), like ethdev.

> + const struct rte_flow_queue_attr *queue_attr[],
> struct rte_flow_error *error);
>
> Information about resources that can benefit from pre-allocation can be
> @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
>
> .. code-block:: c
>
> - rte_flow_configure(port, *port_attr, *error);
> + rte_flow_configure(port, *port_attr, nb_queue, *queue_attr, *error);
>
> struct rte_flow_pattern_template *pattern_templates[0] =
> rte_flow_pattern_template_create(port, &itr, &pattern,
> &error);
> @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
> *actions_templates, nb_actions_templates,
> *error);
>
> +Asynchronous operations
> +-----------------------
> +
> +Flow rules management can be done via special lockless flow management
> queues.
> +- Queue operations are asynchronous and not thread-safe.
> +- Operations can thus be invoked by the app's datapath,
> +packet processing can continue while queue operations are processed by NIC.
> +- The queue number is configured at initialization stage.
> +- Available operation types: rule creation, rule destruction,
> +indirect rule creation, indirect rule destruction, indirect rule update.
> +- Operations may be reordered within a queue.
> +- Operations can be postponed and pushed to NIC in batches.
> +- Results pulling must be done on time to avoid queue overflows.
> +- User data is returned as part of the result to identify an operation.
> +- Flow handle is valid once the creation operation is enqueued and must be
> +destroyed even if the operation is not successful and the rule is not
> inserted.

You need a CR between the lines, as the rendered text does not come out with a new line between the items.

> +
> +The asynchronous flow rule insertion logic can be broken into two phases.
> +
> +1. Initialization stage as shown here:
> +
> +.. _figure_rte_flow_q_init:
> +
> +.. figure:: img/rte_flow_q_init.*
> +
> +2. Main loop as presented on a datapath application example:
> +
> +.. _figure_rte_flow_q_usage:
> +
> +.. figure:: img/rte_flow_q_usage.*

It would be better to also describe the sequence of operations as text, to make the flow easier to understand.

> +
> +Enqueue creation operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Enqueueing a flow rule creation operation is similar to simple creation.

If it is an enqueue operation, why not call it rte_flow_q_flow_enqueue()?

> +
> +.. code-block:: c
> +
> + struct rte_flow *
> + rte_flow_q_flow_create(uint16_t port_id,
> + uint32_t queue_id,
> + const struct rte_flow_q_ops_attr *q_ops_attr,
> + struct rte_flow_table *table,
> + const struct rte_flow_item pattern[],
> + uint8_t pattern_template_index,
> + const struct rte_flow_action actions[],

If I understand correctly, the table is a pre-configured object that has N patterns and N actions. Why pass items[] and actions[] again?

> + uint8_t actions_template_index,
> + struct rte_flow_error *error);
> +
> +A valid handle in case of success is returned. It must be destroyed later
> +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by HW.
> +
> +Enqueue destruction operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Queue destruction operation.

> +
> +Enqueueing a flow rule destruction operation is similar to simple
> destruction.
> +
> +.. code-block:: c
> +
> + int
> + rte_flow_q_flow_destroy(uint16_t port_id,
> + uint32_t queue_id,
> + const struct rte_flow_q_ops_attr *q_ops_attr,
> + struct rte_flow *flow,
> + struct rte_flow_error *error);
> +
> +Push enqueued operations
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Pushing all internally stored rules from a queue to the NIC.
> +
> +.. code-block:: c
> +
> + int
> + rte_flow_q_push(uint16_t port_id,
> + uint32_t queue_id,
> + struct rte_flow_error *error);
> +
> +There is the postpone attribute in the queue operation attributes.
> +When it is set, multiple operations can be bulked together and not sent to HW
> +right away to save SW/HW interactions and prioritize throughput over latency.
> +The application must invoke this function to actually push all outstanding
> +operations to HW in this case.
> +
> +Pull enqueued operations
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Pulling asynchronous operations results.
> +
> +The application must invoke this function in order to complete asynchronous
> +flow rule operations and to receive flow rule operations statuses.
> +
> +.. code-block:: c
> +
> + int
> + rte_flow_q_pull(uint16_t port_id,
> + uint32_t queue_id,
> + struct rte_flow_q_op_res res[],
> + uint16_t n_res,
> + struct rte_flow_error *error);
> +
> +Multiple outstanding operation results can be pulled simultaneously.
> +User data may be provided during a flow creation/destruction in order
> +to distinguish between multiple operations. User data is returned as part
> +of the result to provide a method to detect which operation is completed.
> +
> +Enqueue indirect action creation operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Asynchronous version of indirect action creation API.
> +
> +.. code-block:: c
> +
> + struct rte_flow_action_handle *
> + rte_flow_q_action_handle_create(uint16_t port_id,

What is the use case for this? How does the application need to use it? We are already creating a flow_table. Is that not sufficient?

> + uint32_t queue_id,
> + const struct rte_flow_q_ops_attr *q_ops_attr,
> + const struct rte_flow_indir_action_conf
> *indir_action_conf,
> + const struct rte_flow_action *action,
> + struct rte_flow_error *error);
> +
> +A valid handle in case of success is returned. It must be destroyed later by
> +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is rejected.
> +
> +Enqueue indirect action destruction operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Asynchronous version of indirect action destruction API.
> +
> +.. code-block:: c
> +
> + int
> + rte_flow_q_action_handle_destroy(uint16_t port_id,
> + uint32_t queue_id,
> + const struct rte_flow_q_ops_attr *q_ops_attr,
> + struct rte_flow_action_handle *action_handle,
> + struct rte_flow_error *error);
> +
> +Enqueue indirect action update operation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Asynchronous version of indirect action update API.
> +
> +.. code-block:: c
> +
> + int
> + rte_flow_q_action_handle_update(uint16_t port_id,
> + uint32_t queue_id,
> + const struct rte_flow_q_ops_attr *q_ops_attr,
> + struct rte_flow_action_handle *action_handle,
> + const void *update,
> + struct rte_flow_error *error);
> +
> .. _flow_isolated_mode:
>
> Flow isolated mode
> diff --git a/doc/guides/rel_notes/release_22_03.rst
> b/doc/guides/rel_notes/release_22_03.rst
> index d23d1591df..80a85124e6 100644
> --- a/doc/guides/rel_notes/release_22_03.rst
> +++ b/doc/guides/rel_notes/release_22_03.rst
> @@ -67,6 +67,14 @@ New Features
> ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
> and ``rte_flow_actions_template_destroy``.
>
> +* ethdev: Added ``rte_flow_q_flow_create`` and ``rte_flow_q_flow_destroy``
> API
> + to enqueue flow creaion/destruction operations asynchronously as well as
> + ``rte_flow_q_pull`` to poll and retrieve results of these operations and
> + ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
> + Introduced asynchronous API for indirect actions management as well:
> + ``rte_flow_q_action_handle_create``, ``rte_flow_q_action_handle_destroy``
> and
> + ``rte_flow_q_action_handle_update``.
> +
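
On the postpone attribute and rte_flow_q_push(): the sketch below shows how I read the batching semantics from the documentation text. It is a self-contained C mock, not DPDK code. mock_batch_queue, mock_enqueue_op() and mock_push() are hypothetical names, and the doorbells counter is only an illustrative stand-in for SW/HW interactions. One assumption goes beyond the patch text: a non-postponed operation also flushes any operations staged before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mock_batch_queue {
	uint32_t staged;    /* enqueued but not yet visible to the mock NIC */
	uint32_t submitted; /* handed over to the mock NIC */
	uint32_t doorbells; /* number of SW/HW interactions so far */
};

/* Hand every staged operation to the mock NIC with a single doorbell. */
static void
mock_push(struct mock_batch_queue *q)
{
	assert(q != NULL);
	if (q->staged == 0)
		return; /* nothing outstanding, no interaction needed */
	q->submitted += q->staged;
	q->staged = 0;
	q->doorbells++;
}

/*
 * Enqueue one operation. With postpone set it is only staged;
 * otherwise it is pushed right away (assumption: together with
 * anything staged earlier).
 */
static void
mock_enqueue_op(struct mock_batch_queue *q, bool postpone)
{
	assert(q != NULL);
	q->staged++;
	if (!postpone)
		mock_push(q);
}
```

In this mock, four postponed operations followed by one mock_push() cost a single doorbell, while four non-postponed operations cost four, which is the throughput-over-latency trade-off the documentation describes.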