On Tue, Feb 8, 2022 at 7:42 PM Alexander Kozyrev <akozy...@nvidia.com> wrote:
>
> On Tuesday, February 8, 2022 5:57 Jerin Jacob <jerinjac...@gmail.com> wrote:
> > On Sun, Feb 6, 2022 at 8:57 AM Alexander Kozyrev <akozy...@nvidia.com>
> > wrote:
>
> Hi Jerin, thank you for reviewing my patch. I appreciate your input.
Hi Alex,

> I'm planning to send v4 today with the addressed comments to be on time
> for RC1. I hope that my answers are satisfactory for the rest of the
> questions raised by you.

Comments look good to me. Please remove the version field in the next patch.

> > > A new, faster, queue-based flow rules management mechanism is needed for
> > > applications offloading rules inside the datapath. This asynchronous
> > > and lockless mechanism frees the CPU for further packet processing and
> > > reduces the performance impact of the flow rules creation/destruction
> > > on the datapath. Note that queues are not thread-safe and the queue
> > > should be accessed from the same thread for all queue operations.
> > > It is the responsibility of the app to sync the queue functions in case
> > > of multi-threaded access to the same queue.
> > >
> > > The rte_flow_q_flow_create() function enqueues a flow creation to the
> > > requested queue. It benefits from already configured resources and sets
> > > unique values on top of item and action templates. A flow rule is enqueued
> > > on the specified flow queue and offloaded asynchronously to the hardware.
> > > The function returns immediately to spare CPU for further packet
> > > processing. The application must invoke the rte_flow_q_pull() function
> > > to complete the flow rule operation offloading, to clear the queue, and to
> > > receive the operation status. The rte_flow_q_flow_destroy() function
> > > enqueues a flow destruction to the requested queue.
>
> > It is good to see the implementation, specifically to understand:
>
> We will send the PMD implementation in the next few days.
>
> > 1)
> > I understand we are creating queues to let multiple producers enqueue
> > multiple jobs in parallel.
> > On the consumer side, is it HW or some other cores that consume the job?
>
> From the API point of view there is no restriction on the type of consumer.
> It could be a hardware or a software implementation, but in most cases
> (and in our driver) it will be the NIC handling the requests.
>
> > Can consumers operate in parallel?
>
> Yes, we can have multiple separate hardware queues to handle operations
> in parallel independently and without any locking mechanism needed.
>
> > 2) Is the queue part of HW or just a SW primitive to submit the work as a
> > channel?
>
> The queue is a software primitive.
>
> > > Signed-off-by: Alexander Kozyrev <akozy...@nvidia.com>
> > > ---
> > >  doc/guides/prog_guide/img/rte_flow_q_init.svg |  71 ++++
> > >  .../prog_guide/img/rte_flow_q_usage.svg       |  60 +++
> > >  doc/guides/prog_guide/rte_flow.rst            | 159 +++++++-
> > >  doc/guides/rel_notes/release_22_03.rst        |   8 +
> > >  lib/ethdev/rte_flow.c                         | 173 ++++++++-
> > >  lib/ethdev/rte_flow.h                         | 342 ++++++++++++++++++
> > >  lib/ethdev/rte_flow_driver.h                  |  55 +++
> > >  lib/ethdev/version.map                        |   7 +
> > >  8 files changed, 873 insertions(+), 2 deletions(-)
> > >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_init.svg
> > >  create mode 100644 doc/guides/prog_guide/img/rte_flow_q_usage.svg
> > >
> > > diff --git a/doc/guides/prog_guide/img/rte_flow_q_init.svg b/doc/guides/prog_guide/img/rte_flow_q_init.svg
> > > new file mode 100644
> > > index 0000000000..2080bf4c04
>
> > Some comments on the diagrams:
> > # rte_flow_q_create_flow and rte_flow_q_destroy_flow are used instead of
> > rte_flow_q_flow_create/destroy
> > # rte_flow_q_pull's brackets (i.e. ()) are not aligned
>
> Will fix this, thanks for noticing.
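
For other readers of the thread, here is how I read the intended multi-queue
setup; a minimal sketch only, since the contents of struct rte_flow_port_attr
and struct rte_flow_queue_attr are not visible in the quoted hunks (the .size
field below is an assumption):

    #include <rte_flow.h>

    #define NB_QUEUE 4 /* e.g. one flow queue per datapath lcore */

    static int
    setup_flow_queues(uint16_t port_id)
    {
            /* Per-queue depth; the .size field name is assumed. */
            static const struct rte_flow_queue_attr q_attr = {
                    .size = 1024,
            };
            const struct rte_flow_queue_attr *queue_attr[NB_QUEUE];
            /* Resource hints (counters, meters, ...) would go here. */
            struct rte_flow_port_attr port_attr = { 0 };
            struct rte_flow_error error;
            int i;

            for (i = 0; i < NB_QUEUE; i++)
                    queue_attr[i] = &q_attr;

            /* Flows can be created as soon as this returns; no start(). */
            return rte_flow_configure(port_id, &port_attr, NB_QUEUE,
                                      queue_attr, &error);
    }

Each lcore then owns one queue and never touches the others, which matches
the "queues are not thread-safe" contract in the commit message.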
> > > +</svg>
> > > \ No newline at end of file
> > > diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
> > > index b7799c5abe..734294e65d 100644
> > > --- a/doc/guides/prog_guide/rte_flow.rst
> > > +++ b/doc/guides/prog_guide/rte_flow.rst
> > > @@ -3607,12 +3607,16 @@ Hints about the expected number of counters or meters in an application,
> > >  for example, allow PMD to prepare and optimize NIC memory layout in advance.
> > >  ``rte_flow_configure()`` must be called before any flow rule is created,
> > >  but after an Ethernet device is configured.
> > > +It also creates flow queues for asynchronous flow rules operations via
> > > +queue-based API, see `Asynchronous operations`_ section.
> > >
> > >  .. code-block:: c
> > >
> > >    int
> > >    rte_flow_configure(uint16_t port_id,
> > >                       const struct rte_flow_port_attr *port_attr,
> > > +                     uint16_t nb_queue,
>
> > # rte_flow_info_get() doesn't have the number of queues; why not add the
> > number of queues to rte_flow_port_attr?
>
> Good suggestion, I'll add it to the capabilities structure.
>
> > # And additional APIs for queue_setup() like ethdev.
>
> ethdev has the start function which tells the PMD when all configurations
> are done. In our case there is no such function and the device is ready to
> create flows as soon as we exit rte_flow_configure(). In addition, the
> number of queues may affect the resource allocation, so it is best to
> process all the requested resources at the same time.
>
> > > +                     const struct rte_flow_queue_attr *queue_attr[],
> > >                       struct rte_flow_error *error);
> > >
> > >  Information about resources that can benefit from pre-allocation can be
> > > @@ -3737,7 +3741,7 @@ and pattern and actions templates are created.
> > >
> > >  .. code-block:: c
> > >
> > > -  rte_flow_configure(port, *port_attr, *error);
> > > +  rte_flow_configure(port, *port_attr, nb_queue, *queue_attr, *error);
> > >
> > >    struct rte_flow_pattern_template *pattern_templates[0] =
> > >      rte_flow_pattern_template_create(port, &itr, &pattern, &error);
> > > @@ -3750,6 +3754,159 @@ and pattern and actions templates are created.
> > >                             *actions_templates, nb_actions_templates,
> > >                             *error);
> > >
> > > +Asynchronous operations
> > > +-----------------------
> > > +
> > > +Flow rules management can be done via special lockless flow management queues.
> > > +- Queue operations are asynchronous and not thread-safe.
> > > +- Operations can thus be invoked by the app's datapath,
> > > +packet processing can continue while queue operations are processed by NIC.
> > > +- The queue number is configured at initialization stage.
> > > +- Available operation types: rule creation, rule destruction,
> > > +indirect rule creation, indirect rule destruction, indirect rule update.
> > > +- Operations may be reordered within a queue.
> > > +- Operations can be postponed and pushed to NIC in batches.
> > > +- Results pulling must be done on time to avoid queue overflows.
> > > +- User data is returned as part of the result to identify an operation.
> > > +- Flow handle is valid once the creation operation is enqueued and must be
> > > +destroyed even if the operation is not successful and the rule is not inserted.
>
> > You need a CR between the lines, as the rendered text comes out as one
> > line in between the items.
>
> OK.
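
The last bullet is worth a concrete example. A sketch of the cleanup path as
I understand it; struct rte_flow_q_op_res is not shown in the quoted hunks,
so the .status and .user_data fields and the RTE_FLOW_Q_OP_RES_SUCCESS value
are assumptions (the idea being that the application passes its own context
as user data when enqueueing the create and saves the returned handle in it):

    #include <rte_flow.h>

    #define MAX_RES 32

    struct app_rule {
            struct rte_flow *flow; /* saved when the create was enqueued */
    };

    static void
    drain_queue(uint16_t port_id, uint32_t queue_id,
                const struct rte_flow_q_ops_attr *attr)
    {
            struct rte_flow_q_op_res res[MAX_RES];
            struct rte_flow_error error;
            int n, i;

            n = rte_flow_q_pull(port_id, queue_id, res, MAX_RES, &error);
            for (i = 0; i < n; i++) {
                    struct app_rule *rule = res[i].user_data; /* assumed */

                    if (res[i].status == RTE_FLOW_Q_OP_RES_SUCCESS) /* assumed */
                            continue;
                    /* The handle stays valid even though HW rejected the
                     * rule, so it still has to be destroyed. */
                    rte_flow_q_flow_destroy(port_id, queue_id, attr,
                                            rule->flow, &error);
            }
    }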
> > > +
> > > +The asynchronous flow rule insertion logic can be broken into two phases.
> > > +
> > > +1. Initialization stage as shown here:
> > > +
> > > +.. _figure_rte_flow_q_init:
> > > +
> > > +.. figure:: img/rte_flow_q_init.*
> > > +
> > > +2. Main loop as presented on a datapath application example:
> > > +
> > > +.. _figure_rte_flow_q_usage:
> > > +
> > > +.. figure:: img/rte_flow_q_usage.*
>
> > It is better to add the sequence of operations as text to understand the
> > flow.
>
> I prefer keeping the diagram here, it looks more clean and concise.
> A block of text gives no new information and is harder to follow, imho.
>
> > > +
> > > +Enqueue creation operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Enqueueing a flow rule creation operation is similar to simple creation.
>
> > If it is an enqueue operation, why not call it rte_flow_q_flow_enqueue()?
>
> > > +
> > > +.. code-block:: c
> > > +
> > > +  struct rte_flow *
> > > +  rte_flow_q_flow_create(uint16_t port_id,
> > > +                         uint32_t queue_id,
> > > +                         const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                         struct rte_flow_table *table,
> > > +                         const struct rte_flow_item pattern[],
> > > +                         uint8_t pattern_template_index,
> > > +                         const struct rte_flow_action actions[],
>
> > If I understand correctly, the table is the pre-configured object that has
> > N number of patterns and N number of actions.
> > Why give items[] and actions[] again?
>
> The table only contains templates for patterns and actions.
> We still need to provide the values for those templates when we create a
> flow. Thus we specify patterns and actions here.
>
> > > +                         uint8_t actions_template_index,
> > > +                         struct rte_flow_error *error);
> > > +
> > > +A valid handle in case of success is returned. It must be destroyed later
> > > +by calling ``rte_flow_q_flow_destroy()`` even if the rule is rejected by HW.
> > > +
> > > +Enqueue destruction operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> > Queue destruction operation.
>
> We are not destroying the queue, we are enqueueing the flow destruction
> operation.
>
> > > +
> > > +Enqueueing a flow rule destruction operation is similar to simple destruction.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_flow_destroy(uint16_t port_id,
> > > +                          uint32_t queue_id,
> > > +                          const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                          struct rte_flow *flow,
> > > +                          struct rte_flow_error *error);
> > > +
> > > +Push enqueued operations
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Pushing all internally stored rules from a queue to the NIC.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_push(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +There is the postpone attribute in the queue operation attributes.
> > > +When it is set, multiple operations can be bulked together and not sent to HW
> > > +right away to save SW/HW interactions and prioritize throughput over latency.
> > > +The application must invoke this function to actually push all outstanding
> > > +operations to HW in this case.
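
To illustrate the postpone semantics described above, a batching sketch.
Only the existence of a postpone attribute is stated in the doc text, so the
.postpone field name is an assumption:

    #include <rte_flow.h>

    static void
    insert_rule_batch(uint16_t port_id, uint32_t queue_id,
                      struct rte_flow_table *table,
                      const struct rte_flow_item *patterns[],
                      const struct rte_flow_action *actions[],
                      uint32_t n_rules)
    {
            /* Postponed: operations accumulate in SW, no doorbell yet. */
            struct rte_flow_q_ops_attr attr = { .postpone = 1 };
            struct rte_flow_error error;
            uint32_t i;

            for (i = 0; i < n_rules; i++)
                    rte_flow_q_flow_create(port_id, queue_id, &attr, table,
                                           patterns[i], 0, actions[i], 0,
                                           &error);

            /* One SW/HW interaction pushes the whole batch to the NIC. */
            rte_flow_q_push(port_id, queue_id, &error);
    }

This is the throughput-over-latency mode; with postpone cleared, each
operation would be sent to HW right away.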
> > > +
> > > +Pull enqueued operations
> > > +~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Pulling asynchronous operations results.
> > > +
> > > +The application must invoke this function in order to complete asynchronous
> > > +flow rule operations and to receive flow rule operations statuses.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_pull(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  struct rte_flow_q_op_res res[],
> > > +                  uint16_t n_res,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +Multiple outstanding operation results can be pulled simultaneously.
> > > +User data may be provided during a flow creation/destruction in order
> > > +to distinguish between multiple operations. User data is returned as part
> > > +of the result to provide a method to detect which operation is completed.
> > > +
> > > +Enqueue indirect action creation operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action creation API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  struct rte_flow_action_handle *
> > > +  rte_flow_q_action_handle_create(uint16_t port_id,
>
> > What is the use case for this?
>
> Indirect action creation may take time, as it may depend on hardware
> resource allocation. So we add an asynchronous way of creating it as well.
>
> > How does the application need to use this? We are already creating a
> > flow_table. Is that not sufficient?
>
> The indirect action object is used in flow rules via its handle.
> This is an extension to the already existing API in order to speed up
> the creation of these objects.
>
> > > +                  uint32_t queue_id,
> > > +                  const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                  const struct rte_flow_indir_action_conf *indir_action_conf,
> > > +                  const struct rte_flow_action *action,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +A valid handle in case of success is returned. It must be destroyed later by
> > > +calling ``rte_flow_q_action_handle_destroy()`` even if the rule is rejected.
> > > +
> > > +Enqueue indirect action destruction operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action destruction API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_action_handle_destroy(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                  struct rte_flow_action_handle *action_handle,
> > > +                  struct rte_flow_error *error);
> > > +
> > > +Enqueue indirect action update operation
> > > +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > +
> > > +Asynchronous version of indirect action update API.
> > > +
> > > +.. code-block:: c
> > > +
> > > +  int
> > > +  rte_flow_q_action_handle_update(uint16_t port_id,
> > > +                  uint32_t queue_id,
> > > +                  const struct rte_flow_q_ops_attr *q_ops_attr,
> > > +                  struct rte_flow_action_handle *action_handle,
> > > +                  const void *update,
> > > +                  struct rte_flow_error *error);
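
A sketch of how the three indirect-action calls fit together, using a shared
COUNT action purely as an example; completion of each enqueued step is still
collected with rte_flow_q_pull() as for regular rules:

    #include <rte_flow.h>

    static struct rte_flow_action_handle *
    create_shared_count(uint16_t port_id, uint32_t queue_id,
                        const struct rte_flow_q_ops_attr *attr)
    {
            const struct rte_flow_indir_action_conf conf = {
                    .ingress = 1, /* example: ingress-only shared action */
            };
            const struct rte_flow_action action = {
                    .type = RTE_FLOW_ACTION_TYPE_COUNT,
            };
            struct rte_flow_error error;

            /* The returned handle can be referenced by flow rules once
             * enqueued, but must be destroyed with
             * rte_flow_q_action_handle_destroy() even if HW rejects it. */
            return rte_flow_q_action_handle_create(port_id, queue_id, attr,
                                                   &conf, &action, &error);
    }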
> > > +
> > >  .. _flow_isolated_mode:
> > >
> > >  Flow isolated mode
> > > diff --git a/doc/guides/rel_notes/release_22_03.rst b/doc/guides/rel_notes/release_22_03.rst
> > > index d23d1591df..80a85124e6 100644
> > > --- a/doc/guides/rel_notes/release_22_03.rst
> > > +++ b/doc/guides/rel_notes/release_22_03.rst
> > > @@ -67,6 +67,14 @@ New Features
> > >    ``rte_flow_table_destroy``, ``rte_flow_pattern_template_destroy``
> > >    and ``rte_flow_actions_template_destroy``.
> > >
> > > +* ethdev: Added ``rte_flow_q_flow_create`` and ``rte_flow_q_flow_destroy`` API
> > > +  to enqueue flow creation/destruction operations asynchronously as well as
> > > +  ``rte_flow_q_pull`` to poll and retrieve results of these operations and
> > > +  ``rte_flow_q_push`` to push all the in-flight operations to the NIC.
> > > +  Introduced asynchronous API for indirect actions management as well:
> > > +  ``rte_flow_q_action_handle_create``, ``rte_flow_q_action_handle_destroy`` and
> > > +  ``rte_flow_q_action_handle_update``.
> > > +