On Tue, Feb 8, 2022 at 7:42 PM Alexander Kozyrev <akozy...@nvidia.com> wrote:
>
> On Tuesday, February 8, 2022 5:57 Jerin Jacob <jerinjac...@gmail.com> wrote:
> > On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozy...@nvidia.com>
> > wrote:
>
> Hi Jerin, thank you for reviewing my patch. I appreciate your input.
Hi Alex,

> I'm planning to send v4 today with the addressed comments to be on time
> for RC1. I hope that my answers are satisfactory for the rest of the
> questions raised by you.

Comments look good to me. Please remove the version field in the next patch.

> > > A new, faster, queue-based flow rules management mechanism is needed for
> > > applications offloading rules inside the datapath. This asynchronous
> > > and lockless mechanism frees the CPU for further packet processing and
> > > reduces the performance impact of the flow rules creation/destruction
> > > on the datapath. Note that queues are not thread-safe and the queue
> > > should be accessed from the same thread for all queue operations.
> > > It is the responsibility of the app to sync the queue functions in case
> > > of multi-threaded access to the same queue.
> > >
> > > The rte_flow_q_flow_create() function enqueues a flow creation to the
> > > requested queue. It benefits from already configured resources and sets
> > > unique values on top of item and action templates. A flow rule is enqueued
> > > on the specified flow queue and offloaded asynchronously to the hardware.
> > > The function returns immediately to spare CPU for further packet
> > > processing. The application must invoke the rte_flow_q_pull() function
> > > to complete the flow rule operation offloading, to clear the queue, and to
> > > receive the operation status. The rte_flow_q_flow_destroy() function
> > > enqueues a flow destruction to the requested queue.
>
> > It is good to see the implementation, specifically to understand:
>
> We will send the PMD implementation in the next few days.
>
> > 1)
> > I understand we are creating queues to let multiple producers enqueue
> > multiple jobs in parallel.
> > On the consumer side, is it HW or some other cores that consume the job?
>
> From the API point of view there is no restriction on the type of consumer.
> It could be a hardware or a software implementation, but in most cases
> (and in our driver) it will be the NIC handling the requests.
>
> > Can consumers operate in parallel?
>
> Yes, we can have multiple separate hardware queues to handle operations
> in parallel independently and without any locking mechanism needed.
>
> > 2) Is the queue part of HW or just a SW primitive to submit the work as a
> > channel?
>
> The queue is a software primitive.
>
> > > Signed-off-by: Alexander Kozyrev <akozy...@nvidia.com>
> > > ---
> > >  doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
> > >  .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
> > >  doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
> > >  doc/guides/rel_notes/release_22_03.rst        |   8 +
> > >  lib/ethdev/rte_flow.c                         | 173 ++++++++-
> > >  lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
> > >  lib/ethdev/rte_flow_driver.h                  |  55 +++
> > >  lib/ethdev/version.map                        |   7 +
> > >  8 files changed, 873 insertions(+), 2 deletions(-)
> > >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
> > >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
> > >
> > > diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg b/doc/guides/prog_guide/img/rte_flow_q_init.svg
> > > new file mode 100644
> > > index 0000000000..2080bf4c04
>
> > Some comments on the diagrams:
> > # rte_flow_q_create_flow and rte_flow_q_destroy_flow are used instead of
> > rte_flow_q_flow_create/destroy
> > # rte_flow_q_pull's brackets (i.e. ()) are not aligned
>
> Will fix this, thanks for noticing.
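
For other readers of the thread, here is how I read the intended multi-queue
setup; a minimal sketch only, since the contents of struct rte_flow_port_attr
and struct rte_flow_queue_attr are not visible in the quoted hunks (the .size
field below is an assumption):

    #include <rte_flow.h>

    #define NB_QUEUE 4 /* e.g. one flow queue per datapath lcore */

    static int
    setup_flow_queues(uint16_t port_id)
    {
            /* Per-queue depth; the .size field name is assumed. */
            static const struct rte_flow_queue_attr q_attr = {
                    .size = 1024,
            };
            const struct rte_flow_queue_attr *queue_attr[NB_QUEUE];
            /* Resource hints (counters, meters, ...) would go here. */
            struct rte_flow_port_attr port_attr = { 0 };
            struct rte_flow_error error;
            int i;

            for (i = 0; i < NB_QUEUE; i++)
                    queue_attr[i] = &q_attr;

            /* Flows can be created as soon as this returns; no start(). */
            return rte_flow_configure(port_id, &port_attr, NB_QUEUE,
                                      queue_attr, &error);
    }

Each lcore then owns one queue and never touches the others, which matches
the "queues are not thread-safe" contract in the commit message.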
> > > +</svg>
> > > \ No newline at end of file
> > > diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> > > index b7799c5abe..734294e65d 100644
> > > --- a/doc/guides/prog_guide/rte_flow.rst
> > > +++ b/doc/guides/prog_guide/rte_flow.rst
> > > @@ -3607,12 +3607,16 @@ Hints about the expected number of counters or meters in an application,
> > >  for example, allow PMD to prepare and optimize NIC memory layout in advance.
> > >  ``rte_flow_configure()`` must be called before any flow rule is created,
> > >  but after an Ethernet device is configured.
> > > +It also creates flow queues for asynchronous flow rules operations via
> > > +queue-based API, see `Asynchronous operations`_ section.
> > >
> > >  .. code-block:: c
> > >
> > >    int
> > >    rte_flow_configure(uint16_t port_id,
> > >                       const struct rte_flow_port_attr *port_attr,
> > > +                     uint16_t nb_queue,
>
> > # rte_flow_info_get() doesn't have the number of queues; why not add the
> > number of queues to rte_flow_port_attr?
>
> Good suggestion, I'll add it to the capabilities structure.
>
> > # And additional APIs for queue_setup() like ethdev.
>
> ethdev has the start function which tells the PMD when all configurations
> are done. In our case there is no such function and the device is ready to
> create flows as soon as we exit rte_flow_configure(). In addition, the
> number of queues may affect the resource allocation, so it is best to
> process all the requested resources at the same time.
>
> > > +                     const struct rte_flow_queue_attr *queue_attr[],
> > >                       struct rte_flow_error *error);
> > >
> > >  Information about resources that can benefit from pre-allocation can be
> > > @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
> > >
> > >  .. code-block:: c
> > >
> > > -  rte_flow_configure(port, *port_attr, *error);
> > > +  rte_flow_configure(port, *port_attr, nb_queue, *queue_attr, *error);
> > >
> > >    struct rte_flow_pattern_template *pattern_templates[0] =
> > >      rte_flow_pattern_template_create(port, &itr, &pattern, &error);
> > > @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
> > >                             *actions_templates, nb_actions_templates,
> > >                             *error);
> > >
> > > +Asynchronous operations
> > > +-----------------------
> > > +
> > > +Flow rules management can be done via special lockless flow management queues.
> > > +- Queue operations are asynchronous and not thread-safe.
> > > +- Operations can thus be invoked by the app's datapath,
> > > +packet processing can continue while queue operations are processed by NIC.
> > > +- The queue number is configured at initialization stage.
> > > +- Available operation types: rule creation, rule destruction,
> > > +indirect rule creation, indirect rule destruction, indirect rule update.
> > > +- Operations may be reordered within a queue.
> > > +- Operations can be postponed and pushed to NIC in batches.
> > > +- Results pulling must be done on time to avoid queue overflows.
> > > +- User data is returned as part of the result to identify an operation.
> > > +- Flow handle is valid once the creation operation is enqueued and must be
> > > +destroyed even if the operation is not successful and the rule is not inserted.
>
> > You need a CR between the lines, as the rendered text comes out as one
> > line in between the items.
>
> OK.
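
The last bullet is worth a concrete example. A sketch of the cleanup path as
I understand it; struct rte_flow_q_op_res is not shown in the quoted hunks,
so the .status and .user_data fields and the RTE_FLOW_Q_OP_RES_SUCCESS value
are assumptions (the idea being that the application passes its own context
as user data when enqueueing the create and saves the returned handle in it):

    #include <rte_flow.h>

    #define MAX_RES 32

    struct app_rule {
            struct rte_flow *flow; /* saved when the create was enqueued */
    };

    static void
    drain_queue(uint16_t port_id, uint32_t queue_id,
                const struct rte_flow_q_ops_attr *attr)
    {
            struct rte_flow_q_op_res res[MAX_RES];
            struct rte_flow_error error;
            int n, i;

            n = rte_flow_q_pull(port_id, queue_id, res, MAX_RES, &error);
            for (i = 0; i < n; i++) {
                    struct app_rule *rule = res[i].user_data; /* assumed */

                    if (res[i].status == RTE_FLOW_Q_OP_RES_SUCCESS) /* assumed */
                            continue;
                    /* The handle stays valid even though HW rejected the
                     * rule, so it still has to be destroyed. */
                    rte_flow_q_flow_destroy(port_id, queue_id, attr,
                                            rule->flow, &error);
            }
    }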
> > > +
> > > +The asynchronous flow rule insertion logic can be broken into two phases.
> > > +
> > > +1. Initialization stage as shown here:
> > > +
> > > +.. _figure_rte_flow_q_init:
> > > +
> > > +.. figure:: img/rte_flow_q_init.*
> > > +
> > > +2. Main loop as presented on a datapath application example:
> > > +
> > > +.. _figure_rte_flow_q_usage:
> > > +
> > > +.. figure:: img/rte_flow_q_usage.*
>
> > It is better to add the sequence of operations as text to understand the
> > flow.
>
> I prefer keeping the diagram here, it looks more clean and concise.
> A block of text gives no new information and is harder to follow, imho.
>
> > > +
> > > +Enqueue creation operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Enqueueing a flow rule creation operation is similar to simple creation.
>
> > If it is an enqueue operation, why not call it rte_flow_q_flow_enqueue()?
>
> > > +
> > > +.. code-block:: c
> > > +
> > > +  struct rte_flow *
> > > +  rte_flow_q_flow_create(uint16_t port_id,
> > > +                         uint32_t queue_id,
> > > +                         const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                         struct rte_flow_table *table,
> > > +                         const struct rte_flow_item pattern[],
> > > +                         uint8_t pattern_template_index,
> > > +                         const struct rte_flow_action actions[],
>
> > If I understand correctly, the table is the pre-configured object that has
> > N number of patterns and N number of actions.
> > Why give items[] and actions[] again?
>
> The table only contains templates for patterns and actions.
> We still need to provide the values for those templates when we create a
> flow. Thus we specify patterns and actions here.
>
> > > +                         uint8_t actions_template_index,
> > > +                         struct rte_flow_error *error);
> > > +
> > > +A valid handle in case of success is returned. It must be destroyed later
> > > +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by HW.
> > > +
> > > +Enqueue destruction operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> > Queue destruction operation.
>
> We are not destroying the queue, we are enqueueing the flow destruction
> operation.
>
> > > +
> > > +Enqueueing a flow rule destruction operation is similar to simple destruction.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_flow_destroy(uint16_t port_id,
> > > +                          uint32_t queue_id,
> > > +                          const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                          struct rte_flow *flow,
> > > +                          struct rte_flow_error *error);
> > > +
> > > +Push enqueued operations
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Pushing all internally stored rules from a queue to the NIC.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_push(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +There is the postpone attribute in the queue operation attributes.
> > > +When it is set, multiple operations can be bulked together and not sent to HW
> > > +right away to save SW/HW interactions and prioritize throughput over latency.
> > > +The application must invoke this function to actually push all outstanding
> > > +operations to HW in this case.
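
To illustrate the postpone semantics described above, a batching sketch.
Only the existence of a postpone attribute is stated in the doc text, so the
.postpone field name is an assumption:

    #include <rte_flow.h>

    static void
    insert_rule_batch(uint16_t port_id, uint32_t queue_id,
                      struct rte_flow_table *table,
                      const struct rte_flow_item *patterns[],
                      const struct rte_flow_action *actions[],
                      uint32_t n_rules)
    {
            /* Postponed: operations accumulate in SW, no doorbell yet. */
            struct rte_flow_q_ops_attr attr = { .postpone = 1 };
            struct rte_flow_error error;
            uint32_t i;

            for (i = 0; i < n_rules; i++)
                    rte_flow_q_flow_create(port_id, queue_id, &attr, table,
                                           patterns[i], 0, actions[i], 0,
                                           &error);

            /* One SW/HW interaction pushes the whole batch to the NIC. */
            rte_flow_q_push(port_id, queue_id, &error);
    }

This is the throughput-over-latency mode; with postpone cleared, each
operation would be sent to HW right away.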
> > > +
> > > +Pull enqueued operations
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Pulling asynchronous operations results.
> > > +
> > > +The application must invoke this function in order to complete asynchronous
> > > +flow rule operations and to receive flow rule operations statuses.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_pull(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  struct rte_flow_q_op_res res[],
> > > +                  uint16_t n_res,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +Multiple outstanding operation results can be pulled simultaneously.
> > > +User data may be provided during a flow creation/destruction in order
> > > +to distinguish between multiple operations. User data is returned as part
> > > +of the result to provide a method to detect which operation is completed.
> > > +
> > > +Enqueue indirect action creation operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action creation API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  struct rte_flow_action_handle *
> > > +  rte_flow_q_action_handle_create(uint16_t port_id,
>
> > What is the use case for this?
>
> Indirect action creation may take time, as it may depend on hardware
> resource allocation. So we add an asynchronous way of creating it as well.
>
> > How does the application need to use this? We are already creating a
> > flow_table. Is that not sufficient?
>
> The indirect action object is used in flow rules via its handle.
> This is an extension to the already existing API in order to speed up
> the creation of these objects.
>
> > > +                  uint32_t queue_id,
> > > +                  const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                  const struct rte_flow_indir_action_conf *indir_action_conf,
> > > +                  const struct rte_flow_action *action,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +A valid handle in case of success is returned. It must be destroyed later by
> > > +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is rejected.
> > > +
> > > +Enqueue indirect action destruction operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action destruction API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_action_handle_destroy(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                  struct rte_flow_action_handle *action_handle,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +Enqueue indirect action update operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action update API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_action_handle_update(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                  struct rte_flow_action_handle *action_handle,
> > > +                  const void *update,
> > > +                  struct rte_flow_error *error);
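
A sketch of how the three indirect-action calls fit together, using a shared
COUNT action purely as an example; completion of each enqueued step is still
collected with rte_flow_q_pull() as for regular rules:

    #include <rte_flow.h>

    static struct rte_flow_action_handle *
    create_shared_count(uint16_t port_id, uint32_t queue_id,
                        const struct rte_flow_q_ops_attr *attr)
    {
            const struct rte_flow_indir_action_conf conf = {
                    .ingress = 1, /* example: ingress-only shared action */
            };
            const struct rte_flow_action action = {
                    .type = RTE_FLOW_ACTION_TYPE_COUNT,
            };
            struct rte_flow_error error;

            /* The returned handle can be referenced by flow rules once
             * enqueued, but must be destroyed with
             * rte_flow_q_action_handle_destroy() even if HW rejects it. */
            return rte_flow_q_action_handle_create(port_id, queue_id, attr,
                                                   &conf, &action, &error);
    }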
> > > +
> > >  .. _flow_isolated_mode:
> > >
> > >  Flow isolated mode
> > > diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> > > index d23d1591df..80a85124e6 100644
> > > --- a/doc/guides/rel_notes/release_22_03.rst
> > > +++ b/doc/guides/rel_notes/release_22_03.rst
> > > @@ -67,6 +67,14 @@ New Features
> > >    ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
> > >    and ``rte_flow_actions_template_destroy``.
> > >
> > > +* ethdev: Added ``rte_flow_q_flow_create`` and ``rte_flow_q_flow_destroy`` API
> > > +  to enqueue flow creation/destruction operations asynchronously as well as
> > > +  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
> > > +  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
> > > +  Introduced asynchronous API for indirect actions management as well:
> > > +  ``rte_flow_q_action_handle_create``, ``rte_flow_q_action_handle_destroy`` and
> > > +  ``rte_flow_q_action_handle_update``.
> > > +