> On Tuesday, February 8, 2022 5:57 Jerin Jacob <jerinjac...@gmail.com> wrote:
> On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozy...@nvidia.com>
> wrote:

Hi Jerin, thank you for reviewing my patch. I appreciate your input.
I'm planning to send v4 with the addressed comments today to be on time for
RC1.
I hope that my answers to the rest of your questions are satisfactory.

> >
> > A new, faster, queue-based flow rules management mechanism is needed
> for
> > applications offloading rules inside the datapath. This asynchronous
> > and lockless mechanism frees the CPU for further packet processing and
> > reduces the performance impact of the flow rules creation/destruction
> > on the datapath. Note that queues are not thread-safe and the queue
> > should be accessed from the same thread for all queue operations.
> > It is the responsibility of the app to sync the queue functions in case
> > of multi-threaded access to the same queue.
> >
> > The rte_flow_q_flow_create() function enqueues a flow creation to the
> > requested queue. It benefits from already configured resources and sets
> > unique values on top of item and action templates. A flow rule is enqueued
> > on the specified flow queue and offloaded asynchronously to the
> hardware.
> > The function returns immediately to spare CPU for further packet
> > processing. The application must invoke the rte_flow_q_pull() function
> > to complete the flow rule operation offloading, to clear the queue, and to
> > receive the operation status. The rte_flow_q_flow_destroy() function
> > enqueues a flow destruction to the requested queue.
> 
> It is good to see the implementation, specifically to understand,

We will send the PMD implementation in the next few days.
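
In the meantime, the expected datapath usage is roughly as follows
(a minimal sketch; the user_data/status fields of the op attributes and
results, and the process_result() helper, are assumptions pending v4):

    struct rte_flow_q_ops_attr ops_attr = {
        .user_data = rule_ctx, /* assumed field: echoed back on pull */
    };
    struct rte_flow_q_op_res res[32];
    struct rte_flow_error error;
    struct rte_flow *flow;
    int ret, i;

    /* Enqueue the rule; the call returns immediately with a handle. */
    flow = rte_flow_q_flow_create(port_id, queue_id, &ops_attr, table,
                                  pattern, 0, actions, 0, &error);

    /* Later, from the same thread, drain completions for this queue. */
    ret = rte_flow_q_pull(port_id, queue_id, res, 32, &error);
    for (i = 0; i < ret; i++)
        process_result(res[i].user_data, res[i].status); /* assumed fields */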

> 1)
> I understand, We are creating queues to make multiple producers to
> enqueue multiple jobs in parallel.
> On the consumer side, Is it HW or some other cores to consume the job?

From the API point of view, there is no restriction on the type of consumer.
It could be a hardware or a software implementation, but in most cases
(and in our driver) it will be the NIC handling the requests.

> Can we operate in consumer in parallel?

Yes, we can have multiple separate hardware queues handling operations
in parallel, independently and without any locking mechanism.
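
For instance, one flow queue per worker lcore removes the need for any
synchronization; the 1:1 mapping and the app_worker_queue() helper below
are an application choice, not part of the API:

    /* Each worker thread owns exactly one queue, so enqueues never contend. */
    uint32_t queue_id = app_worker_queue(rte_lcore_id()); /* app-defined map */

    flow = rte_flow_q_flow_create(port_id, queue_id, &ops_attr, table,
                                  pattern, 0, actions, 0, &error);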

> 2) Is Queue part of HW or just SW primitive to submit the work as a channel.

The queue is a software primitive.

> 
> >
> > Signed-off-by: Alexander Kozyrev <akozy...@nvidia.com>
> > ---
> >  doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
> >  .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
> >  doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
> >  doc/guides/rel_notes/release_22_03.rst        |   8 +
> >  lib/ethdev/rte_flow.c                         | 173 ++++++++-
> >  lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
> >  lib/ethdev/rte_flow_driver.h                  |  55 +++
> >  lib/ethdev/version.map                        |   7 +
> >  8 files changed, 873 insertions(+), 2 deletions(-)
> >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
> >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
> >
> > diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg
> b/doc/guides/prog_guide/img/rte_flow_q_init.svg
> > new file mode 100644
> > index 0000000000..2080bf4c04
> 
> 
> 
> Some comments on the diagrams:
> # rte_flow_q_create_flow and rte_flow_q_destroy_flow used instead of
> rte_flow_q_flow_create/destroy
> # rte_flow_q_pull's brackets(i.e ()) not aligned

Will fix this, thanks for noticing.
 
> 
> > +</svg>
> > \ No newline at end of file
> > diff --git a/doc/guides/prog_guide/rte_flow.rst
> b/doc/guides/prog_guide/rte_flow.rst
> > index b7799c5abe..734294e65d 100644
> > --- a/doc/guides/prog_guide/rte_flow.rst
> > +++ b/doc/guides/prog_guide/rte_flow.rst
> > @@ -3607,12 +3607,16 @@ Hints about the expected number of counters
> or meters in an application,
> >  for example, allow PMD to prepare and optimize NIC memory layout in
> advance.
> >  ``rte_flow_configure()`` must be called before any flow rule is created,
> >  but after an Ethernet device is configured.
> > +It also creates flow queues for asynchronous flow rules operations via
> > +queue-based API, see `Asynchronous operations`_ section.
> >
> >  .. code-block:: c
> >
> >     int
> >     rte_flow_configure(uint16_t port_id,
> >                       const struct rte_flow_port_attr *port_attr,
> > +                     uint16_t nb_queue,
> 
> # rte_flow_info_get() don't have number of queues, why not adding
> number queues in rte_flow_port_attr.

Good suggestion, I'll add it to the capabilities structure.

> # And additional APIs for queue_setup() like ethdev.

ethdev has the start function, which tells the PMD when all configurations
are done. In our case there is no such function: the device is ready to
create flows as soon as rte_flow_configure() returns. In addition, since the
number of queues may affect the resource allocation, it is best to process
all the requested resources at the same time.
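
So the whole setup is a single call, e.g. (the queue_attr contents, such as
a size field, are illustrative only):

    #define NB_QUEUES 4

    struct rte_flow_port_attr port_attr = { 0 }; /* resource hints */
    struct rte_flow_queue_attr queue_attr = { .size = 64 }; /* assumed field */
    const struct rte_flow_queue_attr *queue_attrs[NB_QUEUES] = {
        &queue_attr, &queue_attr, &queue_attr, &queue_attr,
    };
    struct rte_flow_error error;

    /* No separate start step: flows may be created once this returns. */
    rte_flow_configure(port_id, &port_attr, NB_QUEUES, queue_attrs, &error);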

> 
> > +                     const struct rte_flow_queue_attr *queue_attr[],
> >                       struct rte_flow_error *error);
> >
> >  Information about resources that can benefit from pre-allocation can be
> > @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
> >
> >  .. code-block:: c
> >
> > -       rte_flow_configure(port, *port_attr, *error);
> > +       rte_flow_configure(port, *port_attr, nb_queue, *queue_attr,
> *error);
> >
> >         struct rte_flow_pattern_template *pattern_templates[0] =
> >                 rte_flow_pattern_template_create(port, &itr, &pattern, 
> > &error);
> > @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
> >                                 *actions_templates, nb_actions_templates,
> >                                 *error);
> >
> > +Asynchronous operations
> > +-----------------------
> > +
> > +Flow rules management can be done via special lockless flow
> management queues.
> > +- Queue operations are asynchronous and not thread-safe.
> > +- Operations can thus be invoked by the app's datapath,
> > +packet processing can continue while queue operations are processed by
> NIC.
> > +- The queue number is configured at initialization stage.
> > +- Available operation types: rule creation, rule destruction,
> > +indirect rule creation, indirect rule destruction, indirect rule update.
> > +- Operations may be reordered within a queue.
> > +- Operations can be postponed and pushed to NIC in batches.
> > +- Results pulling must be done on time to avoid queue overflows.
> > +- User data is returned as part of the result to identify an operation.
> > +- Flow handle is valid once the creation operation is enqueued and must
> be
> > +destroyed even if the operation is not successful and the rule is not
> inserted.
> 
> You need CR between lines as rendered text does comes as new line in
> between the items.

OK.

> 
> > +
> > +The asynchronous flow rule insertion logic can be broken into two phases.
> > +
> > +1. Initialization stage as shown here:
> > +
> > +.. _figure_rte_flow_q_init:
> > +
> > +.. figure:: img/rte_flow_q_init.*
> > +
> > +2. Main loop as presented on a datapath application example:
> > +
> > +.. _figure_rte_flow_q_usage:
> > +
> > +.. figure:: img/rte_flow_q_usage.*
> 
> it is better to add sequence operations as text to understand the flow.

I prefer keeping the diagram here, it looks cleaner and more concise.
A block of text gives no new information and is harder to follow, IMHO.

> 
> > +
> > +Enqueue creation operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Enqueueing a flow rule creation operation is similar to simple creation.
> 
> If it is enqueue operation, why not call it ad rte_flow_q_flow_enqueue()
> 
> > +
> > +.. code-block:: c
> > +
> > +       struct rte_flow *
> > +       rte_flow_q_flow_create(uint16_t port_id,
> > +                               uint32_t queue_id,
> > +                               const struct rte_flow_q_ops_attr 
> > *q_ops_attr,
> > +                               struct rte_flow_table *table,
> > +                               const struct rte_flow_item pattern[],
> > +                               uint8_t pattern_template_index,
> > +                               const struct rte_flow_action actions[],
> 
> If I understand correctly, table is the pre-configured object that has
> N number of patterns and N number of actions.
> Why giving items[] and actions[] again?

The table only contains templates for the pattern and actions.
We still need to provide the values for those templates when we create a flow,
so we specify the pattern and actions here.
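
For example, if pattern template 0 in the table is eth/ipv4 with only the
IPv4 source address as a per-rule field, creation supplies just the values
(the item/action contents below are illustrative):

    /* Per-rule values only; masks and structure come from template 0. */
    struct rte_flow_item_ipv4 ipv4_spec = {
        .hdr.src_addr = RTE_BE32(RTE_IPV4(10, 0, 0, 1)),
    };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_spec },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_queue queue = { .index = 1 };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    flow = rte_flow_q_flow_create(port_id, queue_id, &ops_attr, table,
                                  pattern, 0, actions, 0, &error);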

> > +                               uint8_t actions_template_index,
> > +                               struct rte_flow_error *error);
> > +
> > +A valid handle in case of success is returned. It must be destroyed later
> > +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by
> HW.
> > +
> > +Enqueue destruction operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Queue destruction operation.

We are not destroying the queue, we are enqueuing the flow destruction operation.
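
The queue itself stays intact; only the rule goes away, e.g.:

    /* Enqueue destruction of a previously created rule; its completion
     * is reported later through rte_flow_q_pull() like any other op. */
    rte_flow_q_flow_destroy(port_id, queue_id, &ops_attr, flow, &error);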

> 
> > +
> > +Enqueueing a flow rule destruction operation is similar to simple
> destruction.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_flow_destroy(uint16_t port_id,
> > +                               uint32_t queue_id,
> > +                               const struct rte_flow_q_ops_attr 
> > *q_ops_attr,
> > +                               struct rte_flow *flow,
> > +                               struct rte_flow_error *error);
> > +
> > +Push enqueued operations
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Pushing all internally stored rules from a queue to the NIC.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_push(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       struct rte_flow_error *error);
> > +
> > +There is the postpone attribute in the queue operation attributes.
> > +When it is set, multiple operations can be bulked together and not sent to
> HW
> > +right away to save SW/HW interactions and prioritize throughput over
> latency.
> > +The application must invoke this function to actually push all outstanding
> > +operations to HW in this case.
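
A minimal sketch of the intended batching pattern, assuming the operation
attributes carry a postpone flag (the exact field name may change in v4;
the per-rule arrays are illustrative):

    struct rte_flow_q_ops_attr ops_attr = { .postpone = 1 }; /* assumed field */

    for (i = 0; i < n_rules; i++)
        flows[i] = rte_flow_q_flow_create(port_id, queue_id, &ops_attr,
                                          table, patterns[i], 0,
                                          rule_actions[i], 0, &error);
    /* One SW/HW interaction for the whole batch. */
    rte_flow_q_push(port_id, queue_id, &error);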
> > +
> > +Pull enqueued operations
> > +~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Pulling asynchronous operations results.
> > +
> > +The application must invoke this function in order to complete
> asynchronous
> > +flow rule operations and to receive flow rule operations statuses.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_pull(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       struct rte_flow_q_op_res res[],
> > +                       uint16_t n_res,
> > +                       struct rte_flow_error *error);
> > +
> > +Multiple outstanding operation results can be pulled simultaneously.
> > +User data may be provided during a flow creation/destruction in order
> > +to distinguish between multiple operations. User data is returned as part
> > +of the result to provide a method to detect which operation is completed.
> > +
> > +Enqueue indirect action creation operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Asynchronous version of indirect action creation API.
> > +
> > +.. code-block:: c
> > +
> > +       struct rte_flow_action_handle *
> > +       rte_flow_q_action_handle_create(uint16_t port_id,
> 
> What is the use case for this?

Indirect action creation may take time, as it may depend on hardware resource
allocation. So we add an asynchronous way of creating it in the same manner.

> How application needs to use this. We already creating flow_table. Is
> that not sufficient?

The indirect action object is used in flow rules via its handle.
This is an extension to the already existing API in order to speed up
the creation of these objects.
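
E.g. a shared counter action could be prepared asynchronously and then
referenced from rules once its creation completes (the conf values below
are illustrative):

    struct rte_flow_indir_action_conf conf = { .ingress = 1 };
    struct rte_flow_action count = { .type = RTE_FLOW_ACTION_TYPE_COUNT };
    struct rte_flow_action_handle *handle;

    handle = rte_flow_q_action_handle_create(port_id, queue_id, &ops_attr,
                                             &conf, &count, &error);
    /* Once rte_flow_q_pull() reports success, the handle can be used
     * in flow rules via RTE_FLOW_ACTION_TYPE_INDIRECT. */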

> 
> > +                       uint32_t queue_id,
> > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                       const struct rte_flow_indir_action_conf 
> > *indir_action_conf,
> > +                       const struct rte_flow_action *action,
> > +                       struct rte_flow_error *error);
> > +
> > +A valid handle in case of success is returned. It must be destroyed later 
> > by
> > +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is
> rejected.
> > +
> > +Enqueue indirect action destruction operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Asynchronous version of indirect action destruction API.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_action_handle_destroy(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                       struct rte_flow_action_handle *action_handle,
> > +                       struct rte_flow_error *error);
> > +
> > +Enqueue indirect action update operation
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +Asynchronous version of indirect action update API.
> > +
> > +.. code-block:: c
> > +
> > +       int
> > +       rte_flow_q_action_handle_update(uint16_t port_id,
> > +                       uint32_t queue_id,
> > +                       const struct rte_flow_q_ops_attr *q_ops_attr,
> > +                       struct rte_flow_action_handle *action_handle,
> > +                       const void *update,
> > +                       struct rte_flow_error *error);
> > +
> >  .. _flow_isolated_mode:
> >
> >  Flow isolated mode
> > diff --git a/doc/guides/rel_notes/release_22_03.rst
> b/doc/guides/rel_notes/release_22_03.rst
> > index d23d1591df..80a85124e6 100644
> > --- a/doc/guides/rel_notes/release_22_03.rst
> > +++ b/doc/guides/rel_notes/release_22_03.rst
> > @@ -67,6 +67,14 @@ New Features
> >    ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
> >    and ``rte_flow_actions_template_destroy``.
> >
> > +* ethdev: Added ``rte_flow_q_flow_create`` and
> ``rte_flow_q_flow_destroy`` API
> > +  to enqueue flow creation/destruction operations asynchronously as well
> as
> > +  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
> > +  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
> > +  Introduced asynchronous API for indirect actions management as well:
> > +  ``rte_flow_q_action_handle_create``,
> ``rte_flow_q_action_handle_destroy`` and
> > +  ``rte_flow_q_action_handle_update``.
> > +
